skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Leveraging Auxiliary Information on Marginal Distributions in Nonignorable Models for Item and Unit Nonresponse
Abstract Often, government agencies and survey organizations know the population counts or percentages for some of the variables in a survey. These may be available from auxiliary sources, for example administrative databases or other high-quality surveys. We present and illustrate a model-based framework for leveraging such auxiliary marginal information when handling unit and item nonresponse. We show how one can use the margins to specify different missingness mechanisms for each type of nonresponse. We use the framework to impute missing values in voter turnout in a subset of data from the US Current Population Survey. In doing so, we examine the sensitivity of results to different assumptions about the unit and item nonresponse.  more » « less
Award ID(s):
1733835
PAR ID:
10398621
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series A: Statistics in Society
Volume:
184
Issue:
2
ISSN:
0964-1998
Format(s):
Medium: X Size: p. 643-662
Size(s):
p. 643-662
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary We study a class of missingness mechanisms, referred to as sequentially additive nonignorable, for modelling multivariate data with item nonresponse. These mechanisms explicitly allow the probability of nonresponse for each variable to depend on the value of that variable, thereby representing nonignorable missingness mechanisms. These missing data models are identified by making use of auxiliary information on marginal distributions, such as marginal probabilities for multivariate categorical variables or moments for numeric variables. We prove identification results and illustrate the use of these mechanisms in an application. 
    more » « less
  2. ABSTRACT Probability surveys are challenged by increasing nonresponse rates, resulting in biased statistical inference. Auxiliary information about populations can be used to reduce bias in estimation. Often continuous auxiliary variables in administrative records are first discretized before releasing to the public to avoid confidentiality breaches. This may weaken the utility of the administrative records in improving survey estimates, particularly when there is a strong relationship between continuous auxiliary information and the survey outcome. In this paper, we propose a two‐step strategy, where the confidential continuous auxiliary data in the population are first utilized to estimate the response propensity score of the survey sample by statistical agencies, which is then included in a modified population data for data users. In the second step, data users who do not have access to confidential continuous auxiliary data conduct predictive survey inference by including discretized continuous variables and the propensity score as predictors using splines in a Bayesian model. We show by simulation that the proposed method performs well, yielding more efficient estimates of population means with 95% credible intervals providing better coverage than alternative approaches. We illustrate the proposed method using the Ohio Army National Guard Mental Health Initiative (OHARNG‐MHI). The methods developed in this work are readily available in the R packageAuxSurvey. 
    more » « less
  3. Abstract Imputation is a popular technique for handling item nonresponse. Parametric imputation is based on a parametric model for imputation and is not robust against the failure of the imputation model. Nonparametric imputation is fully robust but is not applicable when the dimension of covariates is large due to the curse of dimensionality. Semiparametric imputation is another robust imputation based on a flexible model where the number of model parameters can increase with the sample size. In this paper, we propose a new semiparametric imputation based on a more flexible model assumption than the Gaussian mixture model. In the proposed mixture model, we assume a conditional Gaussian model for the study variable given the auxiliary variables, but the marginal distribution of the auxiliary variables is not necessarily Gaussian. The proposed mixture model is more flexible and achieves a better approximation than the Gaussian mixture models. The proposed method is applicable to high‐dimensional covariate problem by including a penalty function in the conditional log‐likelihood function. The proposed method is applied to the 2017 Korean Household Income and Expenditure Survey conducted by Statistics Korea. 
    more » « less
  4. Nearest neighbor imputation has a long tradition for handling item nonresponse in survey sampling. In this article, we study the asymptotic properties of the nearest neighbor imputation estimator for general population parameters, including population means, proportions and quantiles. For variance estimation, we propose novel replication variance estimation, which is asymptotically valid and straightforward to implement. The main idea is to construct replicates of the estimator directly based on its asymptotically linear terms, instead of individual records of variables. The simulation results show that nearest neighbor imputation and the proposed variance estimation provide valid inferences for general population parameters. 
    more » « less
  5. ObjectiveThe objective is to estimate the relative contributions of nonresponse, coverage, and measurement biases in survey estimates of voting. MethodsWe survey 3,000 Boston‐area households sampled from an address‐based frame matched, when possible, to telephone numbers. A two‐phase sampling design was used to follow up nonrespondents from phone interviews with personal interviews. All cases were then linked to voting records. ResultsNonresponse, coverage, and measurement‐biased survey estimates at varying stages of the study design. Coverage error linked to missing telephone numbers biased estimates that excluded nonphone households. Overall estimates including nonphone households and nonrespondent interviews include 25 percent relative bias equally attributable to measurement and nonresponse. ConclusionBias in voting measures is not limited to measurement bias. Researchers should also assess the potential for nonresponse and coverage biases. 
    more » « less