Title: Noniterative adjustment to regression estimators with population‐based auxiliary information for semiparametric models
Abstract

Disease registries, surveillance data, and other datasets with extremely large sample sizes are becoming increasingly available, providing population‐based information on disease incidence, survival probability, or other important public health characteristics. Such information can be leveraged in studies that collect detailed measurements but have smaller sample sizes. In contrast to recent proposals that formulate additional information as constraints in optimization problems, we develop a general framework to construct simple estimators that update the usual regression estimators with some functionals of data that incorporate the additional information. We consider general settings that incorporate nuisance parameters in the auxiliary information, non‐i.i.d. data such as those from case‐control studies, and semiparametric models with infinite‐dimensional parameters common in survival analysis. Details of several important data and sampling settings are provided with numerical examples.
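
The idea of updating a usual regression estimator with a functional of the data, rather than re-solving a constrained optimization problem, can be illustrated with a minimal sketch. The abstract does not state the authors' estimator, so what follows is a generic control-variate style one-step update and not necessarily their method. It assumes i.i.d. data, ordinary least squares, no nuisance parameters, and auxiliary information in the form of a known population mean of the outcome. The function names `ols_with_influence` and `adjust_with_auxiliary` are hypothetical.

```python
import numpy as np


def ols_with_influence(X, y):
    """Fit OLS and return (beta_hat, psi), where psi is the n x p array of
    estimated influence functions, one row per subject."""
    n = X.shape[0]
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X / n)
    psi = (X * resid[:, None]) @ bread.T
    return beta, psi


def adjust_with_auxiliary(beta, psi, g):
    """Noniterative update beta - C V^{-1} g_bar (illustrative assumption).

    g is an n x q array of auxiliary functions whose population mean is
    known to be zero, e.g. y_i minus a registry-based population mean.
    """
    n = g.shape[0]
    g_bar = g.mean(axis=0)
    g_centered = g - g_bar
    C = psi.T @ g_centered / n          # p x q covariance of psi and g
    V = g_centered.T @ g_centered / n   # q x q variance of g
    return beta - C @ np.linalg.solve(V, g_bar)


# Simulated example: the population mean of y is assumed known to be 1.0.
rng = np.random.default_rng(2023)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_hat, psi = ols_with_influence(X, y)
g = (y - 1.0)[:, None]                  # auxiliary function with mean zero
beta_adj = adjust_with_auxiliary(beta_hat, psi, g)
```

In this simulated setting the update needs only one extra linear solve after the usual fit. When the sample mean of `g` is exactly zero, the adjusted and unadjusted estimates coincide.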

 
Award ID(s):
1711952
NSF-PAR ID:
10364218
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
79
Issue:
1
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 140-150
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Structured population models are among the most widely used tools in ecology and evolution. Integral projection models (IPMs) use continuous representations of how survival, reproduction and growth change as functions of state variables such as size, requiring fewer parameters to be estimated than projection matrix models (PPMs). Yet, almost all published IPMs make an important assumption that size‐dependent growth transitions are or can be transformed to be normally distributed. In fact, many organisms exhibit highly skewed size transitions. Small individuals can grow more than they can shrink, and large individuals may often shrink more dramatically than they can grow. Yet, the implications of such skew for inference from IPMs have not been explored, nor have general methods been developed to incorporate skewed size transitions into IPMs, or deal with other aspects of real growth rates, including bounds on possible growth or shrinkage.

    Here, we develop a flexible approach to modelling skewed growth data using a modified beta regression model. We propose that sizes first be converted to a (0,1) interval by estimating size‐dependent minimum and maximum sizes through quantile regression. Transformed data can then be modelled using beta regression with widely available statistical tools. We demonstrate the utility of this approach using demographic data for a long‐lived plant, gorgonians and an epiphytic lichen. Specifically, we compare inferences of population parameters from discrete PPMs to those from IPMs that either assume normality or incorporate skew using beta regression or, alternatively, a skewed normal model.

    The beta and skewed normal distributions accurately capture the mean, variance and skew of real growth distributions. Incorporating skewed growth into IPMs decreases population growth and estimated life span relative to IPMs that assume normally distributed growth, and more closely approximates the parameters of PPMs that do not assume a particular growth distribution. A bounded distribution, such as the beta, also avoids the eviction problem caused by predicting some growth outside the modelled size range.

    Incorporating biologically relevant skew in growth data has important consequences for inference from IPMs. The approaches we outline here are flexible and easy to implement with existing statistical tools.

     
  2. Abstract

    Small area estimation (SAE) has become an important tool in official statistics, used to construct estimates of population quantities for domains with small sample sizes. Typical area-level models function as a type of heteroscedastic regression, where the variance for each domain is assumed to be known and plugged in following a design-based estimate. Recent work has considered hierarchical models for the variance, where the design-based estimates are used as an additional data point to model the latent true variance in each domain. These hierarchical models may incorporate covariate information but can be difficult to sample from in high-dimensional settings. Utilizing recent distribution theory, we explore a class of Bayesian hierarchical models for SAE that smooth both the design-based estimate of the mean and the variance. In addition, we develop a class of unit-level models for heteroscedastic Gaussian response data. Importantly, we incorporate both covariate information as well as spatial dependence, while retaining a conjugate model structure that allows for efficient sampling. We illustrate our methodology through an empirical simulation study as well as an application using data from the American Community Survey.

     
  3. Summary

    Spatial data have become increasingly common in epidemiology and public health research thanks to advances in GIS (Geographic Information Systems) technology. In health research, for example, it is common for epidemiologists to incorporate geographically indexed data into their studies. In practice, however, the spatially defined covariates are often measured with error. Naive estimators of regression coefficients are attenuated if measurement error is ignored. Moreover, the classical measurement error theory is inapplicable in the context of spatial modeling because of the presence of spatial correlation among the observations. We propose a semiparametric regression approach to obtain bias-corrected estimates of regression parameters and derive their large sample properties. We evaluate the performance of the proposed method through simulation studies and illustrate using data on Ischemic Heart Disease (IHD). Both simulation and practical application demonstrate that the proposed method can be effective in practice.

     
  4. Kittayapong, Pattamaporn (Ed.)
    Background

    Sugar feeding is an important behavior which may determine vector potential of female mosquitoes. Sugar meals can reduce blood feeding frequency, enhance survival, and decrease fecundity, as well as provide energetic reserves to fuel energy intensive behaviors such as mating and host seeking. Sugar feeding behavior can be harnessed for vector control (e.g. attractive toxic sugar baits). Few studies have addressed sugar feeding of Aedes albopictus, a vector of arboviruses of public health importance, including dengue and Zika viruses. To address this knowledge gap, we assessed sugar feeding patterns of Ae. albopictus for the first time in its invasive northeastern USA range.

    Methodology/Principal findings

    Using the cold anthrone fructose assay with robust sample sizes, we demonstrated that a large percentage of both male (49.6%) and female (41.8%) Ae. albopictus fed on plant or homopteran derived sugar sources within 24 hrs prior to capture. Our results suggest that sugar feeding behavior increases when environmental conditions are dry (high saturation deficit) and may vary by behavioral status (host seeking vs. resting). Furthermore, mosquitoes collected on properties with flowers (>3 blooms) had higher fructose concentrations compared to those collected from properties with few to no flowers (0–3).

    Conclusions/Significance

    Our results provide the first evidence of Ae. albopictus sugar feeding behavior in the Northeastern US and reveal relatively high rates of sugar feeding. These results suggest the potential success for regional deployment of toxic sugar baits. In addition, we demonstrate the impact of several environmental and mosquito parameters (saturation deficit, presence of flowers, host seeking status, and sex) on sugar feeding. Placing sugar feeding behavior in the context of these environmental and mosquito parameters provides further insight into spatiotemporal dynamics of feeding behavior for Ae. albopictus, and in turn, provides information for evidence-based control decisions.

     
  5. Abstract

    Modeling and drawing inference on the joint associations between single‐nucleotide polymorphisms and a disease have sparked interest in genome‐wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single‐nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the debiased lasso approach (van de Geer et al., 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this “large n, diverging p” scenario, we propose an alternative debiased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original debiased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposed refined debiased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large‐scale hospital‐based epidemiology cohort study investigating the joint effects of genetic variants on lung cancer risks.

     