skip to main content


Title: dynr.mi: An R Program for Multiple Imputation in Dynamic Modeling
Assessing several individuals intensively over time yields intensive longitudinal data (ILD). Even though ILD provide rich information, they also bring other data analytic challenges. One of these is the increased occurrence of missingness with increased study length, possibly under non-ignorable missingness scenarios. Multiple imputation (MI) handles missing data by creating several imputed data sets, and pooling the estimation results across imputed data sets to yield final estimates for inferential purposes. In this article, we introduce dynr.mi(), a function in the R package, Dynamic Modeling in R (dynr). The package dynr provides a suite of fast and accessible functions for estimating and visualizing the results from fitting linear and nonlinear dynamic systems models in discrete as well as continuous time. By integrating the estimation functions in dynr and the MI procedures available from the R package, Multivariate Imputation by Chained Equations (MICE), the dynr.mi() routine is designed to handle possibly non-ignorable missingness in the dependent variables and/or covariates in a user-specified dynamic systems model via MI, with convergence diagnostic check. We utilized dynr.mi() to examine, in the context of a vector autoregressive model, the relationships among individuals’ ambulatory physiological measures, and self-report affect valence and arousal. The results from MI were compared to those from listwise deletion of entries with missingness in the covariates. When we determined the number of iterations based on the convergence diagnostics available from dynr.mi(), differences in the statistical significance of the covariate parameters were observed between the listwise deletion and MI approaches. These results underscore the importance of considering diagnostic information in the implementation of MI procedures.  more » « less
Award ID(s):
1806874
PAR ID:
10101179
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
World academy of science, engineering and technology
Volume:
13
Issue:
5
ISSN:
1307-6892
Page Range / eLocation ID:
302 - 311
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A common challenge in developmental research is the amount of incomplete and missing data that occurs from respondents failing to complete tasks or questionnaires, as well as from disengaging from the study (i.e., attrition). This missingness can lead to biases in parameter estimates and, hence, in the interpretation of findings. These biases can be addressed through statistical techniques that adjust for missing data, such as multiple imputation. Although multiple imputation is highly effective, it has not been widely adopted by developmental scientists given barriers such as lack of training or misconceptions about imputation methods. Utilizing default methods within statistical software programs like listwise deletion is common but may introduce additional bias. This manuscript is intended to provide practical guidelines for developmental researchers to follow when examining their data for missingness, making decisions about how to handle that missingness and reporting the extent of missing data biases and specific multiple imputation procedures in publications. 
    more » « less
  2. Abstract

    Biobanks that collect deep phenotypic and genomic data across many individuals have emerged as a key resource in human genetics. However, phenotypes in biobanks are often missing across many individuals, limiting their utility. We propose AutoComplete, a deep learning-based imputation method to impute or ‘fill-in’ missing phenotypes in population-scale biobank datasets. When applied to collections of phenotypes measured across ~300,000 individuals from the UK Biobank, AutoComplete substantially improved imputation accuracy over existing methods. On three traits with notable amounts of missingness, we show that AutoComplete yields imputed phenotypes that are genetically similar to the originally observed phenotypes while increasing the effective sample size by about twofold on average. Further, genome-wide association analyses on the resulting imputed phenotypes led to a substantial increase in the number of associated loci. Our results demonstrate the utility of deep learning-based phenotype imputation to increase power for genetic discoveries in existing biobank datasets.

     
    more » « less
  3. This study compares two missing data procedures in the context of ordinal factor analysis models: pairwise deletion (PD; the default setting in Mplus) and multiple imputation (MI). We examine which procedure demonstrates parameter estimates and model fit indices closer to those of complete data. The performance of PD and MI are compared under a wide range of conditions, including number of response categories, sample size, percent of missingness, and degree of model misfit. Results indicate that both PD and MI yield parameter estimates similar to those from analysis of complete data under conditions where the data are missing completely at random (MCAR). When the data are missing at random (MAR), PD parameter estimates are shown to be severely biased across parameter combinations in the study. When the percentage of missingness is less than 50%, MI yields parameter estimates that are similar to results from complete data. However, the fit indices (i.e., χ2, RMSEA, and WRMR) yield estimates that suggested a worse fit than results observed in complete data. We recommend that applied researchers use MI when fitting ordinal factor models with missing data. We further recommend interpreting model fit based on the TLI and CFI incremental fit indices.

     
    more » « less
  4. Abstract

    Missing data is a prevalent problem in bioarchaeological research and imputation could provide a promising solution. This work simulated missingness on a control dataset (481 samples × 41 variables) in order to explore imputation methods for mixed data (qualitative and quantitative data). The tested methods included Random Forest (RF), PCA/MCA, factorial analysis for mixed data (FAMD), hotdeck, predictive mean matching (PMM), random samples from observed values (RSOV), and a multi-method (MM) approach for the three missingness mechanisms (MCAR, MAR, and MNAR) at levels of 5%, 10%, 20%, 30%, and 40% missingness. This study also compared single imputation with an adapted multiple imputation method derived from the R package “mice”. The results showed that the adapted multiple imputation technique always outperformed single imputation for the same method. The best performing methods were most often RF and MM, and other commonly successful methods were PCA/MCA and PMM multiple imputation. Across all criteria, the amount of missingness was the most important parameter for imputation accuracy. While this study found that some imputation methods performed better than others for the control dataset, each imputation method has advantages and disadvantages. Imputation remains a promising solution for datasets containing missingness; however when making a decision it is essential to consider dataset structure and research goals.

     
    more » « less
  5. Abstract Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for use in massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled using traditional hot deck methods because of the simple implementation; however, the univariate hot deck results in large random wealth fluctuations. MI is effective but faced with operational challenges. We use a sequential regression/chained-equation approach, using the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate the imputation quality and validity with internal diagnostics and external benchmarking data. MI produces improvements over the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between the household net worth and sociodemographic factors, and facilitates completed data analyses with general purposes. MI incorporates highly predictive covariates into imputation models and increases efficiency. We recommend the practical implementation of MI and expect greater gains when the fraction of missing information is large. 
    more » « less