Title: Bayesian hierarchical weighting adjustment and survey inference
We combine weighting and Bayesian prediction in a unified approach to survey inference. The general principles of Bayesian analysis imply that models for survey outcomes should be conditional on all variables that affect the probability of inclusion. We incorporate all the variables used in the weighting adjustment under the framework of multilevel regression and poststratification, generating smoothed, model-based weights as a byproduct. We improve small area estimation by addressing the complex issues that arise in real-life applications, obtaining robust inference at finer levels for subdomains of interest. We investigate deep interactions and introduce structured prior distributions for smoothing and stability of estimates. The computation is done via Stan, is implemented in the open-source R package rstanarm, and is available for public use. We evaluate the design-based properties of the Bayesian procedure. Simulation studies illustrate how model-based prediction and weighting inference can outperform classical weighting. We apply the method to the New York Longitudinal Study of Wellbeing. The new approach generates smoothed weights and increases efficiency for robust finite population inference, especially for subsets of the population.
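As a rough illustration of the workflow described above, the following minimal R sketch fits a multilevel model with rstanarm and poststratifies its predictions. The survey data frame svy, the poststratification table ps, and all variable names are hypothetical, and the intercepts-only model omits the deep interactions and structured priors the article develops.

```r
# Minimal MRP sketch (hypothetical data). Fit a multilevel model for the
# survey outcome conditional on the variables used in weighting, then
# poststratify the cell predictions to known population counts.
library(rstanarm)

# 'svy': hypothetical survey data with outcome y and adjustment factors.
fit <- stan_glmer(
  y ~ (1 | age_cat) + (1 | educ) + (1 | region),
  data = svy, family = gaussian(), chains = 4, iter = 2000
)

# 'ps': hypothetical poststratification table, one row per
# age_cat x educ x region cell, with population count N.
cell_draws <- posterior_epred(fit, newdata = ps)     # draws x cells

# Population estimate: population-share-weighted average of cell means,
# computed once per posterior draw to retain full uncertainty.
pop_draws <- as.vector(cell_draws %*% (ps$N / sum(ps$N)))
c(estimate = mean(pop_draws), se = sd(pop_draws))
```

Subdomain estimates follow the same pattern, restricting the poststratification cells and counts to the subdomain of interest.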
Award ID(s):
1926578
NSF-PAR ID:
10392346
Date Published:
Journal Name:
Survey methodology
Volume:
46
Issue:
2
ISSN:
0714-0045
Page Range / eLocation ID:
181-214
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

This paper deals with making inference on parameters of a two-level model matching the design hierarchy of a two-stage sample. In a pioneering paper, Scott and Smith (Journal of the American Statistical Association, 1969, 64, 830–840) proposed a Bayesian model-based, or prediction, approach to estimating a finite population mean under two-stage cluster sampling. We provide a brief account of their work. We review two methods for the analysis of two-level models based on matching two-stage samples: pseudo maximum likelihood and pseudo composite likelihood, both taking account of design weights. We then propose a new method for the analysis of two-level models based on a normal approximation to the estimated cluster effects, taking account of design weights. This method does not require cluster sizes to be constant or unrelated to cluster effects. We evaluate the relative performance of the three methods in a simulation study. Finally, we apply the methods to real data from the 2011 Nepal Demographic and Health Survey (NDHS).

     
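As a design-based point of reference for the model-based methods above, here is a minimal sketch of a Hájek-type weighted estimator of a finite population mean under two-stage cluster sampling. The data are simulated, not from the NDHS, and the weights are illustrative.

```r
# Hajek-type design-weighted mean under two-stage sampling (simulated).
set.seed(1)
n_clusters <- 20
samp <- do.call(rbind, lapply(seq_len(n_clusters), function(j) {
  m_j <- sample(5:15, 1)                 # sampled units in cluster j
  data.frame(
    cluster = j,
    y  = rnorm(m_j, mean = j / 10),      # outcome with a cluster effect
    w1 = runif(1, 20, 40),               # first-stage (cluster) weight
    w2 = runif(m_j, 2, 5)                # second-stage (unit) weight
  )
}))

w <- samp$w1 * samp$w2                   # overall design weight
sum(w * samp$y) / sum(w)                 # Hajek estimate of the mean
```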
  2. Discovering governing physical laws from noisy data is a grand challenge in many science and engineering research areas. We present a new approach to data-driven discovery of ordinary differential equations (ODEs) and partial differential equations (PDEs), in explicit or implicit form. We demonstrate our approach on a wide range of problems, including the shallow water equations and the Navier–Stokes equations. The key idea is to select candidate terms for the underlying equations using dimensional analysis, and to approximate the weights of the terms, with error bars, using our threshold sparse Bayesian regression. This new algorithm employs Bayesian inference to tune the hyperparameters automatically. Our approach is effective, robust, and able to quantify uncertainties by providing an error bar for each discovered candidate equation. The effectiveness of the algorithm is demonstrated on a collection of classical ODEs and PDEs. Numerical experiments show its robustness to noisy data and its ability to discover various candidate equations with error bars that represent the quantified uncertainties. Detailed comparisons with the sequential threshold least-squares algorithm and the lasso, carried out on noisy time-series measurements, indicate that the proposed method provides more robust and accurate results. In addition, data-driven prediction of dynamics with error bars using the discovered governing physical laws is more accurate and robust than classical polynomial regression.
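The authors' threshold sparse Bayesian regression is not reproduced here, but the sequential threshold least-squares baseline they compare against is short enough to sketch. The toy ODE, candidate library, and threshold below are all illustrative.

```r
# Sequential threshold least squares (the comparison baseline, not the
# authors' Bayesian algorithm): regress, zero out small coefficients,
# refit on the surviving candidate terms, repeat.
stls <- function(Theta, dx, lambda = 0.1, n_iter = 10) {
  xi <- qr.solve(Theta, dx)              # initial least-squares fit
  for (k in seq_len(n_iter)) {
    small <- abs(xi) < lambda
    xi[small] <- 0
    if (any(!small))
      xi[!small] <- qr.solve(Theta[, !small, drop = FALSE], dx)
  }
  xi
}

# Toy problem: recover dx/dt = -2x + 0.5x^3 from noisy measurements.
x     <- seq(-2, 2, length.out = 200)
dx    <- -2 * x + 0.5 * x^3 + rnorm(200, sd = 0.01)
Theta <- cbind(1, x, x^2, x^3)           # library of candidate terms
stls(Theta, dx, lambda = 0.2)            # approximately (0, -2, 0, 0.5)
```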
  3. Abstract

     Propensity score weighting is a tool for causal inference to adjust for measured confounders in observational studies. In practice, data often present complex structures, such as clustering, which make propensity score modeling and estimation challenging. In addition, for clustered data, there may be unmeasured cluster-level covariates that are related to both the treatment assignment and outcome. When such unmeasured cluster-specific confounders exist and are omitted in the propensity score model, the subsequent propensity score adjustment may be biased. In this article, we propose a calibration technique for propensity score estimation under the latent ignorable treatment assignment mechanism, i.e., the treatment-outcome relationship is unconfounded given the observed covariates and the latent cluster-specific confounders. We impose novel balance constraints which imply exact balance of the observed confounders and the unobserved cluster-level confounders between the treatment groups. We show that the proposed calibrated propensity score weighting estimator is doubly robust in that it is consistent for the average treatment effect if either the propensity score model is correctly specified or the outcome follows a linear mixed effects model. Moreover, the proposed weighting method can be combined with sampling weights for an integrated solution to handle confounding and sampling designs for causal inference with clustered survey data. In simulation studies, we show that the proposed estimator is superior to other competitors. We estimate the effect of School Body Mass Index Screening on the prevalence of overweight and obesity for elementary schools in Pennsylvania.
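For orientation, this sketch shows the standard inverse-propensity weighting estimator of the average treatment effect that the article builds on; the calibration constraints and cluster-level confounders of the proposed method are not implemented, and the data are simulated.

```r
# Standard inverse-propensity weighting for the ATE (simulated data).
set.seed(2)
n <- 1000
x <- rnorm(n)                                 # observed confounder
a <- rbinom(n, 1, plogis(0.5 * x))            # treatment assignment
y <- 1 + 2 * a + x + rnorm(n)                 # outcome; true ATE = 2

ps <- fitted(glm(a ~ x, family = binomial))   # estimated propensity score
w  <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))    # IPW weights

# Hajek-type weighted difference in means estimates the ATE.
weighted.mean(y[a == 1], w[a == 1]) - weighted.mean(y[a == 0], w[a == 0])
```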
  4. Abstract

    Probability samples are the preferred method for providing inferences that are generalizable to a larger population. However, in many cases, this approach is unlikely to yield a sample size large enough to produce precise inferences. Our goal here is to improve the efficiency of inferences from a probability sample by combining (or blending) it with a nonprobability sample, which is (by itself) potentially fraught with selection biases that would compromise the generalizability of results. We develop novel methods of statistical weighting that may be used for this purpose. Specifically, we make a distinction between weights that can be used to make the two samples representative of the population individually (disjoint blending) and those that make only the combined sample representative (simultaneous blending). Our focus is on weights constructed using propensity scores, but consideration is also given to calibration weighting. We include simulation studies that, among other illustrations, show that the gain in precision provided by the convenience sample is lower in circumstances where the outcome is strongly related to the auxiliary variables used to align the samples. Motivating the exposition is a survey of military caregivers; our interest is focused on unpaid caregivers of wounded, ill, or injured US servicemembers and veterans who served following September 11, 2001. Our work serves not only to illustrate the proper execution of blending but also to caution the reader with respect to its dangers, as invoking a nonprobability sample may not yield substantial improvements in precision when assumptions are valid and may induce biases in the event that they are not.

     
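One common building block for blending is to convert estimated propensities of convenience-sample membership into pseudo-weights. The sketch below shows only that step, on simulated data; it is a simplification and corresponds exactly to neither the disjoint nor the simultaneous scheme described above.

```r
# Propensity-based pseudo-weights for a convenience sample (simulated).
set.seed(3)
prob_samp <- data.frame(x = rnorm(300), w = 50, src = 0)   # design wt 50
conv_samp <- data.frame(x = rnorm(700, mean = 0.5),        # selection bias
                        w = 1, src = 1)
stacked <- rbind(prob_samp, conv_samp)

# Weighted logistic regression for membership in the convenience sample
# (quasibinomial avoids warnings about non-integer weighted counts).
fit <- glm(src ~ x, family = quasibinomial, data = stacked, weights = w)
p   <- predict(fit, newdata = conv_samp, type = "response")

# Pseudo-weights: odds of belonging to the weighted probability sample;
# these can then be combined with the design weights for blended estimates.
conv_samp$pw <- (1 - p) / p
```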
  5. Abstract

    Linear quantile regression is a powerful tool to investigate how predictors may affect a response heterogeneously across different quantile levels. Unfortunately, existing approaches find it extremely difficult to adjust for any dependency between observation units, largely because such methods are not based upon a fully generative model of the data. For analysing spatially indexed data, we address this difficulty by generalizing the joint quantile regression model of Yang and Tokdar (Journal of the American Statistical Association, 2017, 112(519), 1107–1120) and characterizing spatial dependence via a Gaussian or t-copula process on the underlying quantile levels of the observation units. A Bayesian semiparametric approach is introduced to perform inference of model parameters and carry out spatial quantile smoothing. An effective model comparison criterion is provided, particularly for selecting between different model specifications of tail heaviness and tail dependence. Extensive simulation studies and two real applications to particulate matter concentration and wildfire risk are presented to illustrate substantial gains in inference quality, prediction accuracy and uncertainty quantification over existing alternatives.

     
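The generative structure described above can be illustrated directly: draw a spatially correlated Gaussian field, transform it to uniform quantile levels (the copula step), and push the levels through a monotone quantile function. Everything in the sketch below (the correlation function, its range, and the linear quantile function) is illustrative rather than the authors' fitted model.

```r
# Generative sketch: Gaussian-copula quantile levels over space.
set.seed(4)
n      <- 100
coords <- cbind(runif(n), runif(n))               # spatial locations
Sigma  <- exp(-as.matrix(dist(coords)) / 0.3)     # exponential correlation

# Latent Gaussian field -> correlated uniform quantile levels (copula).
z   <- drop(t(chol(Sigma + 1e-8 * diag(n))) %*% rnorm(n))
tau <- pnorm(z)

# Hypothetical linear quantile function, monotone in tau for x >= 0:
# Q(tau | x) = (1 + tau) + 2 * tau * x.
x <- runif(n)
y <- (1 + tau) + 2 * tau * x                      # generated responses
```

Nearby locations receive similar quantile levels tau, so their responses sit in similar parts of the conditional distribution, which is exactly the spatial dependence the copula process is meant to capture.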