Title: Bivariate small‐area estimation for binary and gaussian variables based on a conditionally specified model
Abstract: Many large‐scale surveys collect both discrete and continuous variables. Small‐area estimates may be desired for means of continuous variables, proportions in each level of a categorical variable, or for domain means defined as the mean of the continuous variable for each level of the categorical variable. In this paper, we introduce a conditionally specified bivariate mixed‐effects model for small‐area estimation, and provide a necessary and sufficient condition under which the conditional distributions render a valid joint distribution. The conditional specification allows better model interpretation. We use the valid joint distribution to calculate empirical Bayes predictors and use the parametric bootstrap to estimate the mean squared error. Simulation studies demonstrate the superior performance of the bivariate mixed‐effects model relative to univariate model estimators. We apply the bivariate mixed‐effects model to construct estimates for small watersheds using data from the Conservation Effects Assessment Project, a survey developed to quantify the environmental impacts of conservation efforts. We construct predictors of mean sediment loss, the proportion of land where the soil loss tolerance is exceeded, and the average sediment loss on land where the soil loss tolerance is exceeded. In the data analysis, the bivariate mixed‐effects model leads to more scientifically interpretable estimates of domain means than those based on two independent univariate models.
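As a rough illustration of the three targets named in the abstract (an area mean, a proportion, and a domain mean), the sketch below computes direct, unweighted sample versions for a single area. It is not the bivariate mixed‐effects model or the empirical Bayes predictor from the paper. The function name `direct_estimates` and the toy values are hypothetical, and survey weights are ignored.

```python
from statistics import mean

def direct_estimates(y, exceeds):
    """Unweighted direct estimates for one small area.

    y       : continuous responses (e.g. sediment loss), one per sampled unit
    exceeds : 0/1 indicators (e.g. soil loss tolerance exceeded), same length
    """
    if len(y) != len(exceeds) or not y:
        raise ValueError("y and exceeds must be non-empty and of equal length")
    area_mean = mean(y)          # mean of the continuous variable
    proportion = mean(exceeds)   # share of units with indicator equal to 1
    in_domain = [yi for yi, zi in zip(y, exceeds) if zi == 1]
    # The domain mean is undefined when no sampled unit falls in the domain.
    domain_mean = mean(in_domain) if in_domain else None
    return area_mean, proportion, domain_mean

# Hypothetical toy sample for one watershed (values invented for illustration).
y = [0.5, 2.0, 3.5, 1.0]
exceeds = [0, 1, 1, 0]
area_mean, proportion, domain_mean = direct_estimates(y, exceeds)
```

With only a handful of sampled units per area, the domain mean rests on even fewer observations (two in this toy sample), which is the kind of situation where borrowing strength through a model becomes attractive.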
Award ID(s):
1733572
PAR ID:
10364339
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
78
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X Size: p. 1555-1565
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Many variables of interest in agricultural or economical surveys have skewed distributions and can equal zero. Our data are measures of sheet and rill erosion called Revised Universal Soil Loss Equation‐2 (RUSLE2). Small area estimates of mean RUSLE2 erosion are of interest. We use a zero‐inflated lognormal mixed effects model for small area estimation. The model combines a unit‐level lognormal model for the positive RUSLE2 responses with a unit‐level logistic mixed effects model for the binary indicator that the response is nonzero. In the Conservation Effects Assessment Project (CEAP) data, counties with a higher probability of nonzero responses also tend to have a higher mean among the positive RUSLE2 values. We capture this property of the data through an assumption that the pair of random effects for a county are correlated. We develop empirical Bayes (EB) small area predictors and a bootstrap estimator of the mean squared error (MSE). In simulations, the proposed predictor is superior to simpler alternatives. We then apply the method to construct EB predictors of mean RUSLE2 erosion for South Dakota counties. To obtain auxiliary variables for the population of cropland in South Dakota, we integrate a satellite‐derived land cover map with a geographic database of soil properties. We provide an R Shiny application called viscover (available at https://lyux.shinyapps.io/viscover/) to visualize the overlay operations required to construct the covariates. On the basis of bootstrap estimates of the mean square error, we conclude that the EB predictors of mean RUSLE2 erosion are superior to direct estimators.
  2. Abstract In small area estimation, different data sources are integrated in order to produce reliable estimates of target parameters (e.g., a mean or a proportion) for a collection of small subsets (areas) of a finite population. Regression models such as the linear mixed effects model or M-quantile regression are often used to improve the precision of survey sample estimates by leveraging auxiliary information for which means or totals are known at the area level. In many applications, the unit-level linkage of records from different sources is probabilistic and potentially error-prone. In this article, we present adjustments of the small area predictors that are based on either the linear mixed effects model or M-quantile regression to account for the presence of linkage error. These adjustments are developed from a two-component mixture model that hinges on the assumption of independence of the target and auxiliary variable given incorrect linkage. Estimation and inference is based on composite likelihoods and machinery revolving around the Expectation-Maximization Algorithm. For each of the two regression methods, we propose modified small area predictors and approximations for their mean squared errors. The empirical performance of the proposed approaches is studied in both design-based and model-based simulations that include comparisons to a variety of baselines. 
  3. The problem of classifying multiple categorical responses is fundamental in modern machine learning and statistics, with diverse applications in fields such as bioinformatics and imaging. This manuscript investigates linear discriminant analysis (LDA) with high-dimensional predictors and multiple multi-class responses. Specifically, we first examine two different classification scenarios under the bivariate LDA model: joint classification of the two responses and conditional classification of one response while observing the other. To achieve optimal classification rules for both scenarios, we introduce two novel tensor formulations of the discriminant coefficients and corresponding regularization strategies. For joint classification, we propose an overlapping group lasso penalty and a blockwise coordinate descent algorithm to efficiently compute the joint discriminant coefficient tensors. For conditional classification, we utilize an alternating direction method of multipliers (ADMM) algorithm to compute the discriminant coefficient tensors under new constraints. We then extend our method and algorithms to general multivariate responses. Finally, we validate the effectiveness of our approach through simulation studies and applications to benchmark datasets. 
  4. Abstract Private lands provide key habitat for imperiled species and are core components of functional protected area networks; yet, their incorporation into national and regional conservation planning has been challenging. Identifying locations where private landowners are likely to participate in conservation initiatives can help avoid conflict and clarify trade‐offs between ecological benefits and sociopolitical costs. Empirical, spatially explicit assessment of the factors associated with conservation on private land is an emerging tool for identifying future conservation opportunities. However, most data on private land conservation are voluntarily reported and incomplete, which complicates these assessments. We used a novel application of occupancy models to analyze the occurrence of conservation easements on private land. We compared multiple formulations of occupancy models with a logistic regression model to predict the locations of conservation easements based on a spatially explicit social–ecological systems framework. We combined a simulation experiment with a case study of easement data in Idaho and Montana (United States) to illustrate the utility of the occupancy framework for modeling conservation on private land. Occupancy models that explicitly accounted for variation in reporting produced estimates of predictors that were substantially less biased than estimates produced by logistic regression under all simulated conditions. Occupancy models produced estimates for the 6 predictors we evaluated in our case study that were larger in magnitude, but less certain than those produced by logistic regression. These results suggest that occupancy models result in qualitatively different inferences regarding the effects of predictors on conservation easement occurrence than logistic regression and highlight the importance of integrating variable and incomplete reporting of participation in empirical analysis of conservation initiatives. Failure to do so can lead to emphasizing the wrong social, institutional, and environmental factors that enable conservation and underestimating conservation opportunities in landscapes where social norms or institutional constraints inhibit reporting.
  5. In a wireless network with dynamic spectrum sharing, tracking temporal spectrum holes across a wide spectrum band is a challenging task. We consider a scenario in which the spectrum is divided into a large number of bands or channels, each of which has the potential to provide dynamic spectrum access opportunities. The occupancy times of each band by primary users are generally non-exponentially distributed. We develop an approach to determine and parameterize a small selected subset of the bands with good spectrum access opportunities, using limited computational resources under noisy measurements. We model the noisy measurements of the received signal in each band as a bivariate Markov modulated Gaussian process, which can be viewed as a continuous-time bivariate Markov chain observed through Gaussian noise. The underlying bivariate Markov process allows for the characterization of non-exponentially distributed state sojourn times. The proposed scheme combines an online expectation-maximization algorithm for parameter estimation with a computing budget allocation algorithm. Observation time is allocated across the bands to determine the subset of G out of G frequency bands with the largest mean idle times for dynamic spectrum access and at the same time to obtain accurate parameter estimates for this subset of bands. Our simulation results show that when channel holding times are non-exponential, the proposed scheme achieves a substantial improvement in the probability of correct selection of the best subset of bands compared to an approach based on a (univariate) Markov modulated Gaussian process model.
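
For the zero‐inflated lognormal model described in item 1 above, a small numerical sketch may help. If a response is nonzero with probability p and, when nonzero, is lognormal with log-scale mean mu and log-scale standard deviation sigma, then its overall mean is p * exp(mu + sigma^2 / 2), provided the zero indicator and the positive part are independent. The code below evaluates this standard formula and checks it by simulation. The function names and parameter values are hypothetical, and the correlated random effects and covariates of the actual model are deliberately left out.

```python
import math
import random

def zero_inflated_lognormal_mean(p, mu, sigma):
    """Mean of Y = Z * exp(W) with Z ~ Bernoulli(p), W ~ Normal(mu, sigma^2), independent."""
    return p * math.exp(mu + 0.5 * sigma ** 2)

def simulate_mean(p, mu, sigma, n, seed=0):
    """Monte Carlo estimate of the same mean from n simulated units."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # A unit is nonzero with probability p; otherwise it contributes 0.
        if rng.random() < p:
            total += rng.lognormvariate(mu, sigma)
    return total / n

# Hypothetical parameter values chosen only for illustration.
exact = zero_inflated_lognormal_mean(p=0.6, mu=0.0, sigma=0.5)
approx = simulate_mean(p=0.6, mu=0.0, sigma=0.5, n=200_000)
```

In the model summarized in item 1, p and mu would vary by unit and county through covariates and random effects, so an area mean would average expressions of this form over the units in the area.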