Title: Conjugate Modeling Approaches for Small Area Estimation with Heteroscedastic Structure
Abstract: Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information and spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.
Award ID(s):
1853096
PAR ID:
10483993
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of Survey Statistics and Methodology
ISSN:
2325-0984
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A small area typically refers to a subpopulation or domain of interest for which a reliable direct estimate, based only on the domain-specific sample, cannot be produced due to small sample size in the domain. While traditional small area methods and models are widely used nowadays, there has also been much work on, and interest in, robust statistical inference for small area estimation (SAE). We survey this work and provide a comprehensive review here. We begin with a brief review of the traditional SAE methods. We then discuss SAE methods that are developed under weaker assumptions and SAE methods that are robust in certain ways, such as in terms of outliers or model failure. Our discussion also includes topics such as nonparametric SAE methods, Bayesian approaches, model selection and diagnostics, and missing data. A brief review of software packages available for implementing robust SAE methods is also given.
  2. Optimal designs minimize the number of experimental runs (samples) needed to accurately estimate model parameters, resulting in algorithms that, for instance, efficiently minimize parameter estimate variance. Governed by knowledge of past observations, adaptive approaches adjust sampling constraints online as model parameter estimates are refined, continually maximizing expected information gained or variance reduced. We apply adaptive Bayesian inference to estimate transition rates of Markov chains, a common class of models for stochastic processes in nature. Unlike most previous studies, our sequential Bayesian optimal design is updated with each observation and can be simply extended beyond two-state models to birth–death processes and multistate models. By iteratively finding the best time to obtain each sample, our adaptive algorithm maximally reduces variance, resulting in lower overall error in ground truth parameter estimates across a wide range of Markov chain parameterizations and conformations. 
  3. Abstract Data integration combining a probability sample with another nonprobability sample is an emerging area of research in survey sampling. We consider the case when the study variable of interest is measured only in the nonprobability sample, but comparable auxiliary information is available for both data sources. We consider mass imputation for the probability sample using the nonprobability data as the training set for imputation. The parametric mass imputation is sensitive to parametric model assumptions. To develop improved and robust methods, we consider nonparametric mass imputation for data integration. In particular, we consider kernel smoothing for a low-dimensional covariate and generalized additive models for a relatively high-dimensional covariate for imputation. Asymptotic theories and variance estimation are developed. Simulation studies and real applications show the benefits of our proposed methods over parametric counterparts. 
  4. Powell, Roger (Ed.)
    Abstract Island biogeography theory (IBT) explains and estimates large-scale ecological patterns among islands and isolated habitat patches. Specifically, IBT predicts that the number of species per habitat patch differs as a function of area and isolation as a result of local colonization and extinction. Accurate estimates of species richness are essential for testing predictions of IBT, but differences in detectability of species can lead to bias in empirical data. Hierarchical community models correct for imperfect detection by leveraging information from across the community to estimate species-specific occupancy and detection probabilities. Using the fragmented Ozark glades as our model system, we constructed a hierarchical community model to 1) estimate site-level and regional species richness of small mammals while correcting for detection error, and 2) determine environmental covariates driving occupancy. We sampled 16 glades in southwestern Missouri in summer 2016–2017 and quantified mammal community structure within the glade network. The detected species pool included eight species, and the model yielded a regional species estimate of 8.6 species, with a mean of 3.47 species per glade. Species richness increased with patch area but not isolation, and effects of patch shape varied between species in the community. 
  5. Abstract Many large‐scale surveys collect both discrete and continuous variables. Small‐area estimates may be desired for means of continuous variables, proportions in each level of a categorical variable, or for domain means defined as the mean of the continuous variable for each level of the categorical variable. In this paper, we introduce a conditionally specified bivariate mixed‐effects model for small‐area estimation, and provide a necessary and sufficient condition under which the conditional distributions render a valid joint distribution. The conditional specification allows better model interpretation. We use the valid joint distribution to calculate empirical Bayes predictors and use the parametric bootstrap to estimate the mean squared error. Simulation studies demonstrate the superior performance of the bivariate mixed‐effects model relative to univariate model estimators. We apply the bivariate mixed‐effects model to construct estimates for small watersheds using data from the Conservation Effects Assessment Project, a survey developed to quantify the environmental impacts of conservation efforts. We construct predictors of mean sediment loss, the proportion of land where the soil loss tolerance is exceeded, and the average sediment loss on land where the soil loss tolerance is exceeded. In the data analysis, the bivariate mixed‐effects model leads to more scientifically interpretable estimates of domain means than those based on two independent univariate models. 