
Title: Models for Small Area Estimation for Census Tracts

This study examines issues of Small Area Estimation that are raised by reliance on the American Community Survey (ACS), which reports tract‐level data based on much smaller samples than the decennial census long‐form that it replaced. We demonstrate the problem using a 100% transcription of microdata from the 1940 census. By drawing many samples from two major cities, we confirm a known pattern: random samples yield unbiased point estimates of means or proportions, but estimates based on smaller samples have larger average errors in measurement and greater risk of large error. Sampling variability also inflates estimates of measures of variation across areas (reflecting segregation or spatial inequality). This variation is at the heart of much contemporary spatial analysis. We then evaluate possible solutions. For point estimates, we examine three Bayesian models, all of which reduce sampling variation, and we encourage use of such models to correct ACS small area estimates. However, the corrected estimates cannot be used to calculate estimates of variation, because smoothing toward local or grand means artificially reduces variation. We note that there are potential Bayesian approaches to this problem, and we demonstrate an efficacious alternative that uses the original sample data.
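The pattern described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not use the 1940 census data: the tract proportions are synthetic, and the shrinkage step is a simple moment-based, empirical-Bayes-style pull toward the grand mean, chosen only as a stand-in for the Bayesian models the study evaluates. The function name `simulate` and all parameter values are assumptions made for illustration.

```python
import random
import statistics


def simulate(n_tracts=200, sample_size=20, seed=0):
    """Compare raw and shrunken tract estimates against synthetic true values."""
    rng = random.Random(seed)

    # Hypothetical "true" tract-level proportions (synthetic, not census data).
    true_p = [rng.uniform(0.2, 0.6) for _ in range(n_tracts)]

    # Raw estimate: proportion observed in a small random sample per tract.
    raw = [
        sum(rng.random() < p for _ in range(sample_size)) / sample_size
        for p in true_p
    ]

    # Moment-based shrinkage toward the grand mean (an assumed, simplified
    # stand-in for a Bayesian model).
    grand_mean = statistics.fmean(raw)
    total_var = statistics.pvariance(raw)
    sampling_var = statistics.fmean([e * (1 - e) / sample_size for e in raw])
    between_var = max(total_var - sampling_var, 0.0)
    denom = between_var + sampling_var
    weight = between_var / denom if denom > 0 else 0.0
    shrunk = [grand_mean + weight * (e - grand_mean) for e in raw]

    def rmse(estimates):
        # Root mean squared error against the synthetic true proportions.
        return statistics.fmean(
            [(e - t) ** 2 for e, t in zip(estimates, true_p)]
        ) ** 0.5

    return {
        "var_true": statistics.pvariance(true_p),
        "var_raw": statistics.pvariance(raw),
        "var_shrunk": statistics.pvariance(shrunk),
        "rmse_raw": rmse(raw),
        "rmse_shrunk": rmse(shrunk),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions, the raw estimates should show more variation across tracts than the true values, while the shrunken estimates should have lower error per tract but less variation than the true values, which is why smoothed estimates are a poor basis for measures of variation across areas.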

Journal Name:
Geographical Analysis
Page Range / eLocation ID:
p. 325-350
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We develop a Bayesian model–based approach to finite population estimation accounting for spatial dependence. Our innovation here is a framework that achieves inference for finite population quantities in spatial process settings. A key distinction from the small area estimation setting is that we analyze finite populations referenced by their geographic coordinates. Specifically, we consider a two‐stage sampling design in which the primary units are geographic regions, the secondary units are point‐referenced locations, and the measured values are assumed to be a partial realization of a spatial process. Estimation of finite population quantities from geostatistical models does not account for sampling designs, which can impair inferential performance, whereas design‐based estimates ignore the spatial dependence in the finite population. We demonstrate by using simulation experiments that process‐based finite population sampling models improve model fit and inference over models that fail to account for spatial correlation. Furthermore, the process‐based models offer richer inference with spatially interpolated maps over the entire region. We reinforce these improvements and demonstrate scalable inference for groundwater nitrate levels in the population of California Central Valley wells by offering estimates of mean nitrate levels and their spatially interpolated maps.

  2. Empirical Bayesian analysis is a well‐known approach that incorporates an estimator into a Bayesian analysis. In this article, we offer another approach, which has several useful properties. Our solution is based on the framework introduced by Yekutieli (2012) to account for the variability introduced by selecting parameters. Specifically, we assume that the unknown parameter is contained within a ball centered at an estimator, and the radius is given by a prior distribution. We refer to our method as the auxiliary parameter constrained Bayesian hierarchical model (C‐BHM). This general framework is particularly exciting as traditional empirical Bayesian analysis and parametric Bayesian analysis can be written as special cases. Hence, this C‐BHM represents a unifying framework within the area of Bayesian statistics. Several technical results are provided. Furthermore, we show analytically that one can outperform both empirical and fully Bayesian analysis through the Bayes factor. We illustrate the C‐BHM to extend the Fay–Herriot model, which is often used in the survey sampling setting. To demonstrate the usefulness of our method we provide simulations and an illustration to data obtained from the U.S. Census Bureau's Small Area Income and Poverty Estimates (SAIPE) program.

  3. Abstract

    Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information as well as spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.

  4. Abstract

    Combining information from active and passive sampling of mobile animals is challenging because active‐sampling data are affected by limited detection of rare or sparse taxa, while passive‐sampling data reflect both density and movement. We propose that a model‐based analysis allows information to be combined between these methods to interpret variation in the relationship between active estimates of density and passive measurements of catch per unit effort to yield novel information on activity rates (distance/time). We illustrate where discrepancies arise between active and passive methods and demonstrate the model‐based approach with seasonal surveys of fish assemblages in the Florida Everglades, where data are derived from concurrent sampling with throw traps, an enclosure‐type sampler producing point estimates of density, and drift fences with unbaited minnow traps that measure catch per unit effort (CPUE). We compared incidence patterns generated by active and passive sampling, used hierarchical Bayesian modeling to quantify the detection ability of each method, characterized interspecific and seasonal variation in the relationship between density and passively measured CPUE, and used a predator encounter‐rate model to convert variable CPUE–density relationships into ecological information on activity rates. Activity rate information was used to compare interspecific responses to seasonal hydrology and to quantify spatial variation in non‐native fish activity. Drift fences had higher detection probabilities for rare and sparse species than throw traps, causing discrepancies in the estimated spatial distribution of non‐native species from passively measured CPUE and actively measured density. Detection probability of the passive sampler, but not the active sampler, varied seasonally with changes in water depth. The relationship between CPUE and density was sensitive to fluctuating depth, with most species not having a proportional relationship between CPUE and density until seasonal declines in depth. Activity rate estimates revealed interspecific differences in response to declining depths and identified locations and species with high rates of activity. We propose that variation in catchability from methods that passively measure CPUE can be sources of ecological information on activity. We also suggest that model‐based combining of data types could be a productive approach for analyzing correspondence of incidence and abundance patterns in other applications.

  5. Kelp beds provide significant ecosystem services and socioeconomic benefits globally, and prominently in coastal zones of the California Current. Their distributions and abundance, however, vary greatly over space and time. Here, we describe long-term patterns of Giant Kelp (Macrocystis pyrifera) sea surface canopy area off the coast of San Diego County from 1983 through 2019 along with recent patterns of water column nitrate (NO₃⁻) exposure inferred from in situ temperature data in 2014 and 2015 at sites spanning 30 km of the coastline near San Diego California, USA. Site-specific patterns of kelp persistence and resilience were associated with ocean and climate dynamics, with total sea surface kelp canopy area varying approximately 33-fold over the almost 4 decades (min 0.34 km² in 1984; max 11.25 km² in 2008, median 4.79 km²). Site-normalized canopy areas showed that recent kelp persistence since 2014 was greater at Point Loma and La Jolla, the largest kelp beds off California, than at the much smaller kelp bed off Cardiff. NO₃⁻ exposure was estimated from an 11-month time series of in situ water column temperature collected in 2014 and 2015 at 4 kelp beds, using a relationship between temperature and NO₃⁻ concentration previously established for the region. The vertical position of the 14.5°C isotherm, an indicator of the main thermocline and nutricline, varied across the entire water column at semidiurnal to seasonal frequencies. We use a novel means of quantifying estimated water column NO₃⁻ exposure integrated through time (mol-days m⁻²) adapted from degree days approaches commonly used to characterize thermal exposures. Water column integrated NO₃⁻ exposure binned by quarters of the time series showed strong seasonal differences with highest exposure in Mar - May 2015, lowest exposure in Sep - Dec 2014, with consistently highest exposure off Point Loma. The water column integrated NO₃⁻ signal was filtered to provide estimates of the contribution to total nitrate exposure from high frequency variability (ƒ ≥ 1 cycle 30 hr⁻¹) associated predominantly with internal waves, and low frequency variability driven predominantly by seasonal upwelling. While seasonal upwelling accounted for > 90% of NO₃⁻ exposure across the full year, during warm periods when seasonal upwelling was reduced or absent and NO₃⁻ exposure was low overall, the proportion due to internal waves increased markedly to 84 to 100% of the site-specific total exposure. The high frequency variability associated with internal waves may supply critical nutrient availability during anomalously warm periods. Overall, these analyses support a hypothesis that differences in NO₃⁻ exposure among sites due to seasonal upwelling and higher frequency internal wave forcing contribute to spatial patterns in Giant Kelp persistence in southern California. The study period includes anomalously warm surface conditions and the marine heatwave associated with the “Pacific Warm Blob” superimposed on the seasonal thermal signal and corresponding to the onset of a multi-year decline in kelp canopy area and marked differences in kelp persistence among sites. Our analysis suggests that, particularly during periods of warm surface conditions, variation in NO₃⁻ exposure associated with processes occurring at higher frequencies, including internal waves can be a significant source of NO₃⁻ exposure to kelp beds in this region. The patterns described here also offer a view of the potential roles of seasonal and higher frequency nutrient dynamics for Giant Kelp persistence in southern California under continuing ocean surface warming and increasing frequency and intensity of marine heatwaves.
