- Journal Name:
- World Academy of Science, Engineering and Technology
- Page Range or eLocation-ID:
- 302 - 311
- Sponsoring Org:
- National Science Foundation
More Like this
Abstract: Multiple imputation (MI) is a popular and well-established method for handling missing data in multivariate data sets, but its practicality for massive and complex data sets has been questioned. One such data set is the Panel Study of Income Dynamics (PSID), a longstanding and extensive survey of household income and wealth in the United States. Missing data for this survey are currently handled with traditional hot deck methods because of their simple implementation; however, the univariate hot deck produces large random wealth fluctuations. MI is effective but faces operational challenges. We use a sequential regression/chained-equations approach, implemented in the software IVEware, to multiply impute cross-sectional wealth data in the 2013 PSID, and compare analyses of the resulting imputed data with those from the current hot deck approach. Practical difficulties, such as non-normally distributed variables, skip patterns, categorical variables with many levels, and multicollinearity, are described together with our approaches to overcoming them. We evaluate imputation quality and validity with internal diagnostics and external benchmarking data. MI improves on the existing hot deck approach by helping preserve correlation structures, such as the associations between PSID wealth components and the relationships between the household net worth …
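The sequential regression/chained-equations idea described above can be sketched as iteratively regressing each incompletely observed variable on the others and refilling its missing cells with stochastic draws. This is a minimal NumPy illustration of that idea only — the study itself uses IVEware; the `mice_impute` name, the OLS imputation models, and the iteration count are assumptions for this sketch, not the PSID implementation:

```python
import numpy as np

def mice_impute(X, n_iter=10, rng=None):
    """One chained-equations (sequential regression) imputation pass.

    Each variable with missing entries is regressed by OLS on all other
    variables, and its missing cells are refilled with the fitted
    prediction plus Gaussian residual noise; repeating the stochastic
    draws would produce the multiple imputations.
    """
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X, dtype=float).copy()
    miss = np.isnan(X)
    # Initialise missing cells with column means so every regression
    # sees a complete matrix.
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # Design matrix: intercept plus all other (currently filled) columns.
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ beta
            # Stochastic draw rather than a deterministic fill, as in proper MI.
            X[miss[:, j], j] = A[miss[:, j]] @ beta + rng.normal(
                0.0, resid.std(), miss[:, j].sum()
            )
    return X
```

Real chained-equation software additionally handles categorical variables, skip patterns, and bounds — exactly the practical difficulties the abstract describes.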
Fitting Ordinal Factor Analysis Models With Missing Data: A Comparison Between Pairwise Deletion and Multiple Imputation
This study compares two missing-data procedures in the context of ordinal factor analysis models: pairwise deletion (PD; the default setting in Mplus) and multiple imputation (MI). We examine which procedure yields parameter estimates and model fit indices closer to those obtained from complete data. The performance of PD and MI is compared under a wide range of conditions, including number of response categories, sample size, percentage of missingness, and degree of model misfit. Results indicate that both PD and MI yield parameter estimates similar to those from analysis of complete data when the data are missing completely at random (MCAR). When the data are missing at random (MAR), PD parameter estimates are severely biased across the parameter combinations in the study. When the percentage of missingness is less than 50%, MI yields parameter estimates similar to those from complete data. However, the fit indices (i.e., χ², RMSEA, and WRMR) suggest a worse fit than that observed with complete data. We recommend that applied researchers use MI when fitting ordinal factor models with missing data, and that they interpret model fit based on the TLI and CFI incremental fit indices.
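Pairwise deletion, the PD default compared above, computes each correlation from only the rows where both variables are observed, so different matrix entries can rest on different subsamples. The `pairwise_corr` helper below is a hypothetical illustration of that behaviour, not Mplus's implementation:

```python
import numpy as np

def pairwise_corr(X):
    """Correlation matrix under pairwise deletion.

    Each off-diagonal entry uses only the rows where both variables are
    observed, which is why the resulting matrix can rest on different
    subsamples per entry (and need not even be positive definite).
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            if ok.sum() > 2:  # need enough jointly observed rows
                R[i, j] = R[j, i] = np.corrcoef(X[ok, i], X[ok, j])[0, 1]
    return R
```

Under MCAR each entry is a consistent estimate; under MAR the per-entry subsamples are selected, which is one intuition for the severe PD bias the study reports.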
A Hybrid Approach for the Stratified Mark-Specific Proportional Hazards Model with Missing Covariates and Missing Marks, with Application to Vaccine Efficacy Trials
Deployment of the recently licensed tetravalent dengue vaccine based on a chimeric yellow fever virus, CYD-TDV, requires understanding of how the risk of dengue disease in vaccine recipients depends jointly on a host biomarker measured after vaccination (the neutralization titre of neutralizing antibodies) and on a 'mark' feature of the dengue disease failure event (the amino acid sequence distance of the dengue virus to the dengue sequence represented in the vaccine). The CYD14 phase 3 trial of CYD-TDV measured neutralizing antibodies via case-cohort sampling and the mark in dengue disease failure events, with about a third of the marks missing. We addressed this question by developing inferential procedures for the stratified mark-specific proportional hazards model with missing covariates and missing marks. Two hybrid approaches are investigated that leverage both augmented inverse probability weighting and nearest-neighbourhood hot deck multiple imputation. The two approaches differ in how the imputed marks are pooled in estimation. Our investigation shows that nearest-neighbourhood hot deck imputation can lead to biased estimation without properly selected neighbourhoods. Simulations show that the hybrid methods perform well with unbiased nearest-neighbourhood hot deck imputations from proper neighbourhood selection. Applied to CYD14, the new methods show that the neutralizing antibody level …
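The nearest-neighbourhood hot deck step above replaces each missing mark with a value drawn from observed "donor" cases close in covariate space, and its bias hinges on how that neighbourhood is chosen. A minimal sketch, assuming Euclidean distance and a fixed neighbourhood size k (the `hotdeck_nn` name and defaults are illustrative, not the paper's procedure):

```python
import numpy as np

def hotdeck_nn(covariates, marks, k=5, rng=None):
    """Nearest-neighbourhood hot deck imputation.

    Each missing mark is replaced by a value drawn at random from the k
    observed cases ("donors") closest in covariate space; a poorly
    chosen neighbourhood (k too large, wrong metric) biases the draws.
    """
    rng = rng or np.random.default_rng(0)
    covariates = np.asarray(covariates, dtype=float)
    marks = np.asarray(marks, dtype=float).copy()
    missing = np.isnan(marks)
    donors = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        dist = np.linalg.norm(covariates[donors] - covariates[i], axis=1)
        pool = donors[np.argsort(dist)[:k]]
        marks[i] = marks[rng.choice(pool)]  # random draw from the neighbourhood
    return marks
```

Repeating the draws gives multiple hot deck imputations, which the two hybrid approaches then pool differently in estimation.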
Missing data are inevitable in longitudinal clinical trials. Conventionally, missingness is handled under the missing-at-random assumption, which, however, is empirically unverifiable. Sensitivity analyses are therefore critical for assessing the robustness of study conclusions against untestable assumptions. Toward this end, regulatory agencies and the pharmaceutical industry use sensitivity models such as return-to-baseline, control-based, and washout imputation, following the ICH E9(R1) guidance. Multiple imputation is popular in sensitivity analyses; however, it may be inefficient and yield unsatisfactory interval estimates under Rubin's combining rule. We propose distributional imputation in sensitivity analysis, which imputes each missing value with samples from its target imputation model given the observed data. Drawing on the idea of Monte Carlo integration, the distributional imputation estimator solves the mean estimating equations of the imputed data set. It is fully efficient, with theoretical guarantees. Moreover, we propose a weighted bootstrap to obtain a consistent variance estimator that accounts for the variability due to both model parameter estimation and target parameter estimation. The superiority of the distributional imputation framework is validated in a simulation study and an antidepressant longitudinal clinical trial.
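Rubin's combining rule, the MI pooling step the abstract contrasts with distributional imputation, pools m per-imputation analyses as q̄ = mean of the estimates and total variance T = W + (1 + 1/m)·B, where W is the average within-imputation variance and B the between-imputation variance. A minimal sketch (the `rubin_combine` name is illustrative):

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's rules for pooling m multiple-imputation analyses.

    Point estimate: mean of the per-imputation estimates.
    Total variance: within-imputation variance W plus (1 + 1/m) times
    the between-imputation variance B.
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()           # pooled point estimate
    W = u.mean()              # average within-imputation variance
    B = q.var(ddof=1)         # between-imputation variance
    T = W + (1 + 1 / m) * B   # total variance for interval estimation
    return qbar, T
```

The (1 + 1/m)·B term is what can inflate interval estimates for small m, one motivation for the distributional-imputation alternative.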
Data preprocessing is an integral step prior to analyzing data in psychological science, with implications for the conclusions and policies that analysis may guide. This article reports how psychological researchers address data preprocessing and quality concerns, with a focus on aberrant responses and missing data in self-report measures. A total of 240 articles were sampled from four journals (Psychological Science, Journal of Personality and Social Psychology, Developmental Psychology, and Abnormal Psychology) published from 2012 to 2018. Nearly half of the studies did not report any missing-data treatment (111/240; 46.25%); among those that did, the most common approach was listwise deletion (71/240; 29.6%). Studies that removed data due to missingness discarded, on average, 12% of the sample. Likewise, most studies did not report on aberrant responses (194/240; 80%); those that did classified 4% of the sample as suspect. Most studies are either not transparent enough about their data preprocessing steps or may be using suboptimal procedures. Recommendations are offered to improve transparency and data quality.
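Listwise deletion, the most commonly reported treatment in the survey above, simply drops every row containing any missing value. A minimal sketch that also reports the share of the sample removed, the quantity the 12% figure above refers to (the `listwise_delete` name is illustrative):

```python
import numpy as np

def listwise_delete(X):
    """Listwise (complete-case) deletion.

    Drops every row with at least one missing value and returns the
    retained rows together with the fraction of the sample removed.
    """
    X = np.asarray(X, dtype=float)
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], 1 - keep.mean()
```

Reporting the removed fraction alongside the analysis is a low-cost transparency step consistent with the article's recommendations.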