Title: Nonlinear Fay-Herriot Models for Small Area Estimation Using Random Weight Neural Networks
Small area estimation models are critical for dissemination and understanding of important population characteristics within sub-domains that often have limited sample size. The classic Fay-Herriot model is perhaps the most widely used approach to generate such estimates. However, a limiting assumption of this approach is that the latent true population quantity has a linear relationship with the given covariates. Through the use of random weight neural networks, we develop a Bayesian hierarchical extension of this framework that allows for estimation of nonlinear relationships between the true population quantity and the covariates. We illustrate our approach through an empirical simulation study as well as an analysis of median household income for census tracts in the state of California.
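As a rough illustration of the idea in the abstract, the sketch below passes covariates through a hidden layer whose weights are drawn once at random and never trained, then applies the standard Fay-Herriot shrinkage of each direct estimate toward a regression fit on those hidden features. It is not the paper's Bayesian hierarchical model: the function names, the tanh activation, the ridge penalty (a stand-in for a prior on the output weights), the simulated data, and the treatment of the random-effect variance `sigma2_v` as known are all assumptions made for this example.

```python
import numpy as np

def random_weight_features(X, n_hidden, rng):
    """Hidden layer with weights drawn once at random and left untrained."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # input-to-hidden weights (fixed)
    b = rng.normal(size=n_hidden)                 # hidden biases (fixed)
    hidden = np.tanh(X @ W + b)
    # Prepend an intercept column so the output layer can fit a constant.
    return np.column_stack([np.ones(X.shape[0]), hidden])

def fay_herriot_shrinkage(y, D, H, sigma2_v, ridge=1.0):
    """Shrink direct estimates y toward a regression fit on features H.

    y        : direct survey estimates, one per area
    D        : known sampling variances of the direct estimates
    sigma2_v : random-effect variance (assumed known here for simplicity)
    ridge    : penalty on all output weights (assumption, stands in for a prior)
    """
    w = 1.0 / (sigma2_v + D)                      # weighted least squares weights
    A = H.T @ (H * w[:, None]) + ridge * np.eye(H.shape[1])
    beta = np.linalg.solve(A, H.T @ (w * y))      # only the output weights are estimated
    gamma = sigma2_v / (sigma2_v + D)             # shrinkage factor per area
    return gamma * y + (1.0 - gamma) * (H @ beta)

# Simulated example (all values hypothetical).
rng = np.random.default_rng(0)
m = 200
X = rng.uniform(-1.0, 1.0, size=(m, 1))
sigma2_v = 0.04
theta = np.sin(3.0 * X[:, 0]) + rng.normal(scale=np.sqrt(sigma2_v), size=m)  # nonlinear truth
D = rng.uniform(0.1, 0.5, size=m)
y = theta + rng.normal(scale=np.sqrt(D), size=m)  # noisy direct estimates

H = random_weight_features(X, n_hidden=20, rng=rng)
theta_hat = fay_herriot_shrinkage(y, D, H, sigma2_v)
print("MSE of direct estimates:", np.mean((y - theta) ** 2))
print("MSE of shrinkage estimates:", np.mean((theta_hat - theta) ** 2))
```

Because the hidden weights stay fixed, the model remains linear in the output weights, so the usual Fay-Herriot machinery applies unchanged to the transformed covariates. Areas with large sampling variance `D` are pulled more strongly toward the nonlinear fit.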
Award ID(s):
2215169
PAR ID:
10536888
Author(s) / Creator(s):
Publisher / Repository:
Journal of Official Statistics
Date Published:
Journal Name:
Journal of Official Statistics
Volume:
40
Issue:
2
ISSN:
0282-423X
Page Range / eLocation ID:
317 to 332
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In many applications of zero-inflated models, score tests are often used to evaluate whether the population heterogeneity as implied by these models is consistent with the data. The most frequently cited justification for using score tests is that they only require estimation under the null hypothesis. Because this estimation involves specifying a plausible model consistent with the null hypothesis, the testing procedure could lead to unreliable inferences under model misspecification. In this paper, we propose a score test of homogeneity for zero-inflated models that is robust against certain model misspecifications. Due to the true model being unknown in practical settings, our proposal is developed under a general framework of mixture models for which a layer of randomness is imposed on the model to account for uncertainty in the model specification. We exemplify this approach on the class of zero-inflated Poisson models, where a random term is imposed on the Poisson mean to adjust for relevant covariates missing from the mean model or a misspecified functional form. For this example, we show through simulations that the resulting score test of zero inflation maintains its empirical size at all levels, albeit with a loss of power for the well-specified non-random mean model under the null. Frequencies of health promotion activities among young Girl Scouts and dental caries indices among inner-city children are used to illustrate the robustness of the proposed testing procedure.
  2. Randomized experiments have been the gold standard for drawing causal inference. The conventional model-based approach has been one of the most popular methods of analysing treatment effects from randomized experiments, which is often carried out through inference for certain model parameters. In this paper, we provide a systematic investigation of model-based analyses for treatment effects under the randomization-based inference framework. This framework does not impose any distributional assumptions on the outcomes, covariates and their dependence, and utilizes only randomization as the reasoned basis. We first derive the asymptotic theory for $Z$-estimation in completely randomized experiments, and propose sandwich-type conservative covariance estimation. We then apply the developed theory to analyse both average and individual treatment effects in randomized experiments. For the average treatment effect, we consider model-based, model-imputed and model-assisted estimation strategies, where the first two strategies can be sensitive to model misspecification or require specific methods for parameter estimation. The model-assisted approach is robust to arbitrary model misspecification and always provides consistent average treatment effect estimation. We propose optimal ways to conduct model-assisted estimation using generally nonlinear least squares for parameter estimation. For the individual treatment effects, we propose directly modelling the relationship between individual effects and covariates, and discuss the model’s identifiability, inference and interpretation allowing for model misspecification.
  3. Understanding network influence and its determinants are key challenges in political science and network analysis. Traditional latent variable models position actors within a social space based on network dependencies but often do not elucidate the underlying factors driving these interactions. To overcome this limitation, we propose the social influence regression (SIR) model, an extension of vector autoregression tailored for relational data that incorporates exogenous covariates into the estimation of influence patterns. The SIR model captures influence dynamics via a pair of $n \times n$ matrices that quantify how the actions of one actor affect the future actions of another. This framework not only provides a statistical mechanism for explaining actor influence based on observable traits but also improves computational efficiency through an iterative block coordinate descent method. We showcase the SIR model’s capabilities by applying it to monthly conflict events between countries, using data from the Integrated Crisis Early Warning System (ICEWS). Our findings demonstrate the SIR model’s ability to elucidate complex influence patterns within networks by linking them to specific covariates. This paper’s main contributions are: (1) introducing a model that explains third-order dependencies through exogenous covariates and (2) offering an efficient estimation approach that scales effectively with large, complex networks.
  4. Many data sources, from tracking social behavior to election polling to testing studies for understanding disease spread, are subject to sampling bias whose implications are not yet fully understood. In this paper we study estimation of a given feature (such as disease, or behavior at social media platforms) from biased samples, treating non-respondent individuals as missing data. Prevalence of the feature among sampled individuals has an upward bias under the assumption of individuals’ willingness to be sampled. This can be viewed as a regression model with symptoms as covariates and the feature as outcome. It is assumed that the outcome is unknown at the time of sampling, and therefore the missingness mechanism only depends on the covariates. We show that data, in spite of this, is missing at random only when the sizes of symptom classes in the population are known; otherwise data is missing not at random. With an information theoretic viewpoint, we show that sampling bias corresponds to external information due to individuals in the population knowing their covariates, and we quantify this external information by active information. The reduction in prevalence, when sampling bias is adjusted for, similarly translates into active information due to bias correction, with opposite sign to active information due to testing bias. We develop unified results that show that prevalence and active information estimates are asymptotically normal under all missing data mechanisms, when testing errors are absent and present respectively. The asymptotic behavior of the estimators is illustrated through simulations.
  5. A valuable metric in understanding local infectious disease dynamics is the local time-varying reproduction number, i.e. the expected number of secondary local cases caused by each infected individual. Accurate estimation of this quantity requires distinguishing cases arising from local transmission from those imported from elsewhere. Realistically, we can expect identification of cases as local or imported to be imperfect. We study the propagation of such errors in estimation of the local time-varying reproduction number. In addition, we propose a Bayesian framework for estimation of the true local time-varying reproduction number when identification errors exist. And we illustrate the practical performance of our estimator through simulation studies and with outbreaks of COVID-19 in Hong Kong and Victoria, Australia. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’. 