Traditional methods for handling incomplete data, including Multiple Imputation and Maximum Likelihood, require that the data be Missing At Random (MAR). In most cases, however, missingness in a variable depends on the underlying value of that variable. In this work, we devise model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR). While previous work on MNAR data require variables to be discrete, we extend the analysis to continuous variables drawn from Gaussian distributions. We demonstrate the merits of our techniques by comparing it empirically to state of the art software packages.
- Award ID(s):
- 1829681
- NSF-PAR ID:
- 10125747
- Date Published:
- Journal Name:
- Proceedings of Machine Learning Research
- ISSN:
- 2640-3498
- Page Range / eLocation ID:
- 1762 - 1770
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This study compares two missing data procedures in the context of ordinal factor analysis models: pairwise deletion (PD; the default setting in Mplus) and multiple imputation (MI). We examine which procedure demonstrates parameter estimates and model fit indices closer to those of complete data. The performance of PD and MI are compared under a wide range of conditions, including number of response categories, sample size, percent of missingness, and degree of model misfit. Results indicate that both PD and MI yield parameter estimates similar to those from analysis of complete data under conditions where the data are missing completely at random (MCAR). When the data are missing at random (MAR), PD parameter estimates are shown to be severely biased across parameter combinations in the study. When the percentage of missingness is less than 50%, MI yields parameter estimates that are similar to results from complete data. However, the fit indices (i.e., χ2, RMSEA, and WRMR) yield estimates that suggested a worse fit than results observed in complete data. We recommend that applied researchers use MI when fitting ordinal factor models with missing data. We further recommend interpreting model fit based on the TLI and CFI incremental fit indices.
-
A probabilistic query may not be estimable from observed data corrupted by missing values if the data are not missing at random (MAR). It is therefore of theoretical interest and practical importance to determine in principle whether a probabilistic query is estimable from missing data or not when the data are not MAR. We present algorithms that systematically determine whether the joint probability distribution or a target marginal distribution is estimable from observed data with missing values, assuming that the data-generation model is represented as a Bayesian network, known as m-graphs, that not only encodes the dependencies among the variables but also explicitly portrays the mechanisms responsible for the missingness process. The results significantly advance the existing work.more » « less
-
Implicit feedback is widely used in collaborative filtering methods for recommendation. It is well known that implicit feedback contains a large number of values that are missing not at random (MNAR); and the missing data is a mixture of negative and unknown feedback, making it difficult to learn users’ negative preferences. Recent studies modeled exposure, a latent missingness variable which indicates whether an item is exposed to a user, to give each missing entry a confidence of being negative feedback. However, these studies use static models and ignore the information in temporal dependencies among items, which seems to be an essential underlying factor to subsequent missingness. To model and exploit the dynamics of missingness, we propose a latent variable named “user intent” to govern the temporal changes of item missingness, and a hidden Markov model to represent such a process. The resulting framework captures the dynamic item missingness and incorporate it into matrix factorization (MF) for recommendation. We also explore two types of constraints to achieve a more compact and interpretable representation of user intents. Experiments on real-world datasets demonstrate the superiority of our method against state-of-the-art recommender systems.more » « less
-
Summary It is important to draw causal inference from observational studies, but this becomes challenging if the confounders have missing values. Generally, causal effects are not identifiable if the confounders are missing not at random. In this article we propose a novel framework for nonparametric identification of causal effects with confounders subject to an outcome-independent missingness, which means that the missing data mechanism is independent of the outcome, given the treatment and possibly missing confounders. We then propose a nonparametric two-stage least squares estimator and a parametric estimator for causal effects.more » « less