Title: Semiparametric imputation using conditional Gaussian mixture models under item nonresponse
Abstract Imputation is a popular technique for handling item nonresponse. Parametric imputation is based on a parametric imputation model and is not robust against misspecification of that model. Nonparametric imputation is fully robust but is not applicable when the dimension of the covariates is large, due to the curse of dimensionality. Semiparametric imputation is another robust approach, based on a flexible model in which the number of model parameters can increase with the sample size. In this paper, we propose a new semiparametric imputation method based on a more flexible model assumption than the Gaussian mixture model. In the proposed mixture model, we assume a conditional Gaussian model for the study variable given the auxiliary variables, but the marginal distribution of the auxiliary variables is not necessarily Gaussian. The proposed mixture model is more flexible and achieves a better approximation than Gaussian mixture models. The proposed method is applicable to high-dimensional covariate problems by including a penalty function in the conditional log-likelihood function. The method is applied to the 2017 Korean Household Income and Expenditure Survey conducted by Statistics Korea.
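The general idea can be illustrated with a small sketch: fit a mixture of Gaussian linear regressions for the study variable given the covariates by EM on the complete cases, then impute each missing value by drawing from the fitted conditional mixture. This is only a minimal illustration of the conditional-Gaussian-mixture idea, not the authors' implementation; the component count, the EM details, and the omission of the penalty term used for high-dimensional covariates are simplifying assumptions, and all function and variable names are illustrative.

```python
# Minimal sketch: EM for a mixture of Gaussian linear regressions,
# followed by stochastic imputation of missing y from the fitted
# conditional mixture.  Illustrative only; not the authors' method.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def fit_mixture_of_regressions(X, y, K=2, n_iter=200):
    """EM for a K-component mixture of Gaussian linear regressions."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])            # design matrix with intercept
    beta = rng.normal(size=(K, p + 1)) * 0.1         # component regression coefficients
    sigma = np.full(K, y.std() + 1e-6)               # component residual SDs
    pi = np.full(K, 1.0 / K)                         # mixing proportions
    for _ in range(n_iter):
        # E-step: posterior probability that each complete case comes from component k
        dens = np.stack([pi[k] * norm.pdf(y, Xd @ beta[k], sigma[k])
                         for k in range(K)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted least squares and weighted residual variance per component
        for k in range(K):
            w = resp[:, k]
            beta[k] = np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (w * y))
            resid = y - Xd @ beta[k]
            sigma[k] = np.sqrt((w * resid ** 2).sum() / w.sum())
        pi = resp.mean(axis=0)
    return beta, sigma, pi

def impute_missing_y(X_mis, beta, sigma, pi):
    """Draw imputed values of y from the fitted conditional Gaussian mixture."""
    Xd = np.column_stack([np.ones(len(X_mis)), X_mis])
    comp = rng.choice(len(pi), size=len(X_mis), p=pi)   # sample a component per unit
    mu = np.einsum('ij,ij->i', Xd, beta[comp])          # component-specific conditional mean
    return rng.normal(mu, sigma[comp])
```

Drawing from the fitted conditional distribution, rather than plugging in its mean, preserves the conditional variability of the imputed values.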
Award ID(s):
1733572 1931380
PAR ID:
10364514
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
78
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 227-237
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The analysis of time series data with detection limits is challenging due to the high-dimensional integral involved in the likelihood. Existing methods are either computationally demanding or rely on restrictive parametric distributional assumptions. We propose a semiparametric approach, where the temporal dependence is captured by a parametric copula while the marginal distribution is estimated nonparametrically. Utilizing the properties of copulas, we develop a new copula-based sequential sampling algorithm, which provides a convenient way to calculate the censored likelihood. Even without full parametric distributional assumptions, the proposed method still allows us to efficiently compute the conditional quantiles of the censored response at a future time point, and thus construct both point and interval predictions. We establish the asymptotic properties of the proposed pseudo maximum likelihood estimator, and demonstrate through simulation and the analysis of a water quality dataset that the proposed method is more flexible and leads to more accurate predictions than Gaussian-based methods for non-normal data. The Canadian Journal of Statistics, 47: 438–454; 2019. © 2019 Statistical Society of Canada
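As a rough sketch of the copula idea in this abstract (leaving out the censored-likelihood and sequential-sampling machinery that is the paper's actual contribution), one can estimate the margin with the empirical CDF, transform to Gaussian scores, and fit an AR(1) Gaussian copula by pseudo maximum likelihood. The copula family, the AR(1) structure, and all names below are illustrative assumptions.

```python
# Sketch of semiparametric copula estimation: empirical-CDF margin,
# Gaussian scores, AR(1) Gaussian copula fitted by pseudo maximum likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def fit_ar1_gaussian_copula(y):
    """Pseudo-MLE of the AR(1) Gaussian copula parameter for a series y."""
    n = len(y)
    ranks = np.argsort(np.argsort(y)) + 1          # 1..n ranks
    u = ranks / (n + 1)                            # rescaled empirical CDF
    z = norm.ppf(u)                                # Gaussian scores

    def neg_pseudo_loglik(rho):
        # log copula density of consecutive pairs, written on the z-scale
        mu = rho * z[:-1]
        sd = np.sqrt(1 - rho ** 2)
        return -(norm.logpdf(z[1:], mu, sd) - norm.logpdf(z[1:])).sum()

    res = minimize_scalar(neg_pseudo_loglik, bounds=(-0.99, 0.99),
                          method='bounded')
    return res.x

# Toy check: a log-normal series driven by a Gaussian AR(1) with rho = 0.6;
# the estimate should land near 0.6 despite the skewed margin.
rng = np.random.default_rng(1)
e = rng.normal(size=500)
x = np.empty(500)
x[0] = e[0]
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + np.sqrt(1 - 0.6 ** 2) * e[t]
print(fit_ar1_gaussian_copula(np.exp(x)))
```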
  2. Abstract The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions. 
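Both copula abstracts above build point and interval predictions from conditional quantiles. A minimal sketch of that step, assuming an AR(1) Gaussian copula parameter rho_hat has already been estimated (for example by a pseudo-likelihood fit like the one sketched earlier) and inverting the nonparametric margin through sample quantiles:

```python
# Sketch: conditional quantiles of the next observation under a fitted
# AR(1) Gaussian copula with a nonparametric margin.  Names illustrative.
import numpy as np
from scipy.stats import norm

def conditional_quantile_prediction(y, rho_hat, taus=(0.025, 0.5, 0.975)):
    """Conditional quantiles of y_{n+1} given the last observed value."""
    n = len(y)
    u_last = np.sum(y <= y[-1]) / (n + 1)          # empirical CDF at y_n
    z_last = norm.ppf(u_last)
    preds = []
    for tau in taus:
        # tau-th conditional quantile on the Gaussian copula (z) scale
        z_q = rho_hat * z_last + np.sqrt(1 - rho_hat ** 2) * norm.ppf(tau)
        preds.append(np.quantile(y, norm.cdf(z_q)))  # back to the data scale
    return preds                                     # [lower, median, upper]
```

With the default quantile levels, the middle value serves as a point prediction and the outer two as a 95% prediction interval.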
  3. Nonparametric model-assisted estimators have been proposed to improve estimates of finite population parameters. Flexible nonparametric models provide more reliable estimators when a parametric model is misspecified. In this article, we propose an information criterion to select appropriate auxiliary variables to use in an additive model-assisted method. We approximate the additive nonparametric components using polynomial splines and extend the Bayesian Information Criterion (BIC) for finite populations. By removing irrelevant auxiliary variables, our method reduces model complexity and decreases estimator variance. We establish that the proposed BIC is asymptotically consistent in selecting the important explanatory variables when the true model is additive without interactions, a result supported by our numerical study. Our proposed method is easier to implement and better justified theoretically than the existing method proposed in the literature. 
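A minimal sketch of the ingredients described here: approximate each additive component by a polynomial spline basis, fit by least squares, and compare covariate subsets with a BIC-type criterion. The design-weighted, finite-population version studied in the paper is not reproduced; the truncated power basis, knot count, and exhaustive search are illustrative choices.

```python
# Sketch: spline-based additive fits compared by a BIC-type criterion
# to select auxiliary variables.  Illustrative, non-survey-weighted version.
import itertools
import numpy as np

def spline_basis(x, n_knots=5, degree=3):
    """Truncated power basis for one covariate."""
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    cols = [x ** d for d in range(1, degree + 1)]
    cols += [np.clip(x - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

def additive_bic(X, y, subset):
    """BIC of an additive spline fit using the covariates in `subset`."""
    n = len(y)
    B = np.column_stack([np.ones(n)] + [spline_basis(X[:, j]) for j in subset])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ coef) ** 2)
    return n * np.log(rss / n) + B.shape[1] * np.log(n)

def select_variables(X, y):
    """Exhaustive BIC search over covariate subsets (small p only)."""
    p = X.shape[1]
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(p), r) for r in range(p + 1))
    return min(subsets, key=lambda s: additive_bic(X, y, list(s)))
```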
  4. Abstract Data integration combining a probability sample with another nonprobability sample is an emerging area of research in survey sampling. We consider the case when the study variable of interest is measured only in the nonprobability sample, but comparable auxiliary information is available for both data sources. We consider mass imputation for the probability sample using the nonprobability data as the training set for imputation. Parametric mass imputation is sensitive to parametric model assumptions. To develop improved and robust methods, we consider nonparametric mass imputation for data integration. In particular, we consider kernel smoothing for a low-dimensional covariate and generalized additive models for a relatively high-dimensional covariate for imputation. Asymptotic theories and variance estimation are developed. Simulation studies and real applications show the benefits of our proposed methods over parametric counterparts.
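A minimal sketch of kernel-smoothing mass imputation under the setup described above, assuming a single covariate: the outcome regression is estimated on the nonprobability sample with a Nadaraya-Watson smoother, imputed values are produced for every unit of the probability sample, and the design weights give the final estimator. The bandwidth, the simple weighted mean, and all names are illustrative assumptions; the paper's GAM-based version and variance estimation are omitted.

```python
# Sketch of mass imputation for data integration with a 1-D covariate:
# fit E[y | x] on the nonprobability sample, impute into the probability
# sample, then compute a design-weighted mean.  Illustrative only.
import numpy as np

def nadaraya_watson(x_train, y_train, x_new, bandwidth):
    """Gaussian-kernel estimate of E[y | x] at the points x_new."""
    d = (x_new[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ y_train) / w.sum(axis=1)

def mass_imputation_mean(x_np, y_np, x_prob, design_wts, bandwidth=0.5):
    """Design-weighted mean of y mass-imputed into the probability sample."""
    y_imp = nadaraya_watson(x_np, y_np, x_prob, bandwidth)
    return np.sum(design_wts * y_imp) / np.sum(design_wts)
```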