skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Sequentially additive nonignorable missing data modelling using auxiliary marginal information
Summary We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value of that variable, thereby representing nonignorable missingness mechanisms. These missing data models are identified by making use of auxiliary information on marginal distributions, such as marginal probabilities for multivariate categorical variables or moments for numeric variables. We prove identification results and illustrate the use of these mechanisms in an application.  more » « less
Award ID(s):
1733835
PAR ID:
10332104
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Biometrika
Volume:
106
Issue:
4
ISSN:
0006-3444
Page Range / eLocation ID:
889 to 911
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Often, government agencies and survey organizations know the population counts or percentages for some of the variables in a survey. These may be available from auxiliary sources, for example administrative databases or other high-quality surveys. We present and illustrate a model-based framework for leveraging such auxiliary marginal information when handling unit and item nonresponse. We show how one can use the margins to specify different missingness mechanisms for each type of nonresponse. We use the framework to impute missing values in voter turnout in a subset of data from the US Current Population Survey. In doing so, we examine the sensitivity of results to different assumptions about the unit and item nonresponse. 
    more » « less
  2. A probabilistic query may not be estimable from observed data corrupted by missing values if the data are not missing at random (MAR). It is therefore of theoretical interest and practical importance to determine in principle whether a probabilistic query is estimable from missing data or not when the data are not MAR. We present algorithms that systematically determine whether the joint probability distribution or a target marginal distribution is estimable from observed data with missing values, assuming that the data-generation model is represented as a Bayesian network, known as m-graphs, that not only encodes the dependencies among the variables but also explicitly portrays the mechanisms responsible for the missingness process. The results significantly advance the existing work. 
    more » « less
  3. The manuscript considers multivariate functional data analysis with a known graphical model among the functional variables representing their conditional relationships (e.g., brain region-level fMRI data with a prespecified connectivity graph among brain regions). Functional Gaussian graphical models (GGM) used for analyzing multivariate functional data customarily estimate an unknown graphical model, and cannot preserve knowledge of a given graph. We propose a method for multivariate functional analysis that exactly conforms to a given inter-variable graph. We first show the equivalence between partially separable functional GGM and graphical Gaussian processes (GP), proposed recently for constructing optimal multivariate covariance functions that retain a given graphical model. The theoretical connection helps to design a new algorithm that leverages Dempster’s covariance selection for obtaining the maximum likelihood estimate of the covariance function for multivariate functional data under graphical constraints. We also show that the finite term truncation of functional GGM basis expansion used in practice is equivalent to a low-rank graphical GP, which is known to oversmooth marginal distributions. To remedy this, we extend our algorithm to better preserve marginal distributions while respecting the graph and retaining computational scalability. The benefits of the proposed algorithms are illustrated using empirical experiments and a neuroimaging application. 
    more » « less
  4. Traditional methods for handling incomplete data, including Multiple Imputation and Maximum Likelihood, require that the data be Missing At Random (MAR). In most cases, however, missingness in a variable depends on the underlying value of that variable. In this work, we devise model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR). While previous work on MNAR data require variables to be discrete, we extend the analysis to continuous variables drawn from Gaussian distributions. We demonstrate the merits of our techniques by comparing it empirically to state of the art software packages. 
    more » « less
  5. Assessing several individuals intensively over time yields intensive longitudinal data (ILD). Even though ILD provide rich information, they also bring other data analytic challenges. One of these is the increased occurrence of missingness with increased study length, possibly under non-ignorable missingness scenarios. Multiple imputation (MI) handles missing data by creating several imputed data sets, and pooling the estimation results across imputed data sets to yield final estimates for inferential purposes. In this article, we introduce dynr.mi(), a function in the R package, Dynamic Modeling in R (dynr). The package dynr provides a suite of fast and accessible functions for estimating and visualizing the results from fitting linear and nonlinear dynamic systems models in discrete as well as continuous time. By integrating the estimation functions in dynr and the MI procedures available from the R package, Multivariate Imputation by Chained Equations (MICE), the dynr.mi() routine is designed to handle possibly non-ignorable missingness in the dependent variables and/or covariates in a user-specified dynamic systems model via MI, with convergence diagnostic check. We utilized dynr.mi() to examine, in the context of a vector autoregressive model, the relationships among individuals’ ambulatory physiological measures, and self-report affect valence and arousal. The results from MI were compared to those from listwise deletion of entries with missingness in the covariates. When we determined the number of iterations based on the convergence diagnostics available from dynr.mi(), differences in the statistical significance of the covariate parameters were observed between the listwise deletion and MI approaches. These results underscore the importance of considering diagnostic information in the implementation of MI procedures. 
    more » « less