Title: Bivariate small‐area estimation for binary and gaussian variables based on a conditionally specified model
Abstract

Many large‐scale surveys collect both discrete and continuous variables. Small‐area estimates may be desired for means of continuous variables, proportions in each level of a categorical variable, or for domain means defined as the mean of the continuous variable for each level of the categorical variable. In this paper, we introduce a conditionally specified bivariate mixed‐effects model for small‐area estimation, and provide a necessary and sufficient condition under which the conditional distributions render a valid joint distribution. The conditional specification allows better model interpretation. We use the valid joint distribution to calculate empirical Bayes predictors and use the parametric bootstrap to estimate the mean squared error. Simulation studies demonstrate the superior performance of the bivariate mixed‐effects model relative to univariate model estimators. We apply the bivariate mixed‐effects model to construct estimates for small watersheds using data from the Conservation Effects Assessment Project, a survey developed to quantify the environmental impacts of conservation efforts. We construct predictors of mean sediment loss, the proportion of land where the soil loss tolerance is exceeded, and the average sediment loss on land where the soil loss tolerance is exceeded. In the data analysis, the bivariate mixed‐effects model leads to more scientifically interpretable estimates of domain means than those based on two independent univariate models.
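
The parametric bootstrap step mentioned in the abstract can be illustrated on a much simpler model than the paper's. The sketch below is not the authors' conditionally specified bivariate model. It assumes a univariate Gaussian area-level model, y_i = mu + u_i + e_i, with a known sampling variance d, and the function names (fit, eb_predict, bootstrap_mse) are hypothetical. It shows only the general pattern: fit the model, simulate new areas from the fitted model, re-fit and re-predict on each simulated data set, and average the squared prediction errors.

```python
import random
import statistics


def fit(y, d):
    """Moment estimates of the overall mean and the random-effect variance.

    Assumes y_i = mu + u_i + e_i with Var(e_i) = d known (an assumption of
    this sketch, not of the paper).
    """
    mu = statistics.fmean(y)
    sigma2_u = max(0.0, statistics.variance(y) - d)
    return mu, sigma2_u


def eb_predict(y, d, mu, sigma2_u):
    """Shrink each direct estimate toward the overall mean."""
    gamma = sigma2_u / (sigma2_u + d)
    return [mu + gamma * (yi - mu) for yi in y]


def bootstrap_mse(y, d, n_boot=500, seed=0):
    """Parametric bootstrap estimate of the MSE of each area's predictor."""
    rng = random.Random(seed)
    mu_hat, s2_hat = fit(y, d)
    sq_err = [0.0] * len(y)
    for _ in range(n_boot):
        # Simulate true area means and observations from the fitted model.
        theta_star = [mu_hat + rng.gauss(0.0, s2_hat ** 0.5) for _ in y]
        y_star = [t + rng.gauss(0.0, d ** 0.5) for t in theta_star]
        # Re-estimate the parameters and re-predict on the bootstrap sample.
        mu_b, s2_b = fit(y_star, d)
        pred_star = eb_predict(y_star, d, mu_b, s2_b)
        for i, (p, t) in enumerate(zip(pred_star, theta_star)):
            sq_err[i] += (p - t) ** 2
    return [s / n_boot for s in sq_err]
```

Calling `bootstrap_mse(y, d)` on a list of direct estimates returns one MSE estimate per area. Because the parameters are re-estimated inside the loop, the estimate also reflects the uncertainty from parameter estimation, which is the usual reason for preferring a bootstrap here.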

 
Award ID(s):
1733572
PAR ID:
10364339
Publisher / Repository:
Oxford University Press
Journal Name:
Biometrics
Volume:
78
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 1555-1565
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Many surveys collect information on discrete characteristics and continuous variables, that is, mixed-type variables. Small-area statistics of interest include means or proportions of the response variables as well as their domain means, which are the mean values at each level of a different categorical variable. However, item nonresponse in survey data increases the complexity of small-area estimation. To address this issue, we propose a multivariate mixed-effects model for mixed-type response variables subject to item nonresponse. We apply this method to two data structures where the data are missing completely at random by design. We use empirical data from two separate studies: a survey of pet owners and a dataset from the National Resources Inventory. In these applications, our proposed method leads to improvements relative to a direct estimator and a predictor based on a univariate model.

     
  2. Abstract

    Many variables of interest in agricultural or economic surveys have skewed distributions and can equal zero. Our data are measures of sheet and rill erosion called Revised Universal Soil Loss Equation 2 (RUSLE2). Small area estimates of mean RUSLE2 erosion are of interest. We use a zero‐inflated lognormal mixed effects model for small area estimation. The model combines a unit‐level lognormal model for the positive RUSLE2 responses with a unit‐level logistic mixed effects model for the binary indicator that the response is nonzero. In the Conservation Effects Assessment Project (CEAP) data, counties with a higher probability of nonzero responses also tend to have a higher mean among the positive RUSLE2 values. We capture this property of the data through an assumption that the pair of random effects for a county are correlated. We develop empirical Bayes (EB) small area predictors and a bootstrap estimator of the mean squared error (MSE). In simulations, the proposed predictor is superior to simpler alternatives. We then apply the method to construct EB predictors of mean RUSLE2 erosion for South Dakota counties. To obtain auxiliary variables for the population of cropland in South Dakota, we integrate a satellite‐derived land cover map with a geographic database of soil properties. We provide an R Shiny application called viscover (available at https://lyux.shinyapps.io/viscover/) to visualize the overlay operations required to construct the covariates. On the basis of bootstrap estimates of the MSE, we conclude that the EB predictors of mean RUSLE2 erosion are superior to direct estimators.

     
  3. Abstract

    Using the existing measures for training numerical (non-categorical) prediction models can cause misclassification of droughts. Thus, developing a drought category-based measure is critical. Moreover, the existing fixed drought category thresholds need to be improved. The objective of this research is to develop a category-based scoring support vector regression (CBS-SVR) model based on an improved drought categorization method to overcome misclassification in drought prediction. To derive variable threshold levels for drought categorization, K-means (KM) and Gaussian mixture (GM) clustering are compared with the traditional drought categorization. For drought prediction, CBS-SVR is performed by using the best categorization method. The new drought model was applied to the Red River of the North Basin (RRB) in the USA. In the model training and testing, precipitation, temperature, and actual evapotranspiration were selected as the predictors, and the target variables consisted of multivariate drought indices, as well as bivariate and univariate standardized drought indices. Results indicated that the drought categorization method, variable threshold levels, and the type of drought index were the major factors that influenced the accuracy of drought prediction. The CBS-SVR outperformed the support vector classification and traditional SVR by avoiding overfitting and miscategorization in drought prediction.

     
  4. Abstract

    Private lands provide key habitat for imperiled species and are core components of functional protected area networks; yet, their incorporation into national and regional conservation planning has been challenging. Identifying locations where private landowners are likely to participate in conservation initiatives can help avoid conflict and clarify trade‐offs between ecological benefits and sociopolitical costs. Empirical, spatially explicit assessment of the factors associated with conservation on private land is an emerging tool for identifying future conservation opportunities. However, most data on private land conservation are voluntarily reported and incomplete, which complicates these assessments. We used a novel application of occupancy models to analyze the occurrence of conservation easements on private land. We compared multiple formulations of occupancy models with a logistic regression model to predict the locations of conservation easements based on a spatially explicit social–ecological systems framework. We combined a simulation experiment with a case study of easement data in Idaho and Montana (United States) to illustrate the utility of the occupancy framework for modeling conservation on private land. Occupancy models that explicitly accounted for variation in reporting produced estimates of predictors that were substantially less biased than estimates produced by logistic regression under all simulated conditions. Occupancy models produced estimates for the 6 predictors we evaluated in our case study that were larger in magnitude, but less certain than those produced by logistic regression. These results suggest that occupancy models result in qualitatively different inferences regarding the effects of predictors on conservation easement occurrence than logistic regression and highlight the importance of integrating variable and incomplete reporting of participation in empirical analysis of conservation initiatives. Failure to do so can lead to emphasizing the wrong social, institutional, and environmental factors that enable conservation and underestimating conservation opportunities in landscapes where social norms or institutional constraints inhibit reporting.

     
  5. Summary

    The inferential model (IM) framework provides valid prior-free probabilistic inference by focusing on predicting unobserved auxiliary variables. But, efficient IM-based inference can be challenging when the auxiliary variable is of higher dimension than the parameter. Here we show that features of the auxiliary variable are often fully observed and, in such cases, a simultaneous dimension reduction and information aggregation can be achieved by conditioning. This proposed conditioning strategy leads to efficient IM inference and casts new light on Fisher's notions of sufficiency, conditioning and also Bayesian inference. A differential-equation-driven selection of a conditional association is developed, and validity of the conditional IM is proved under some conditions. For problems that do not admit a conditional IM of the standard form, we propose a more flexible class of conditional IMs based on localization. Examples of local conditional IMs in a bivariate normal model and a normal variance components model are also given.
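
Item 2 above combines a logistic model for whether a response is nonzero with a lognormal model for the positive values. A minimal sketch of the mean implied by such a two-part model for a single unit, conditional on the two linear predictors, follows from the lognormal mean formula exp(mu + sigma^2 / 2). The function and argument names are hypothetical, and this is not the empirical Bayes predictor from that paper, which would additionally average over the correlated county random effects.

```python
import math


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def zi_lognormal_mean(eta_binary, eta_positive, sigma2):
    """Mean of a zero-inflated lognormal response for one unit.

    eta_binary   -- linear predictor (fixed plus random effect) on the logit
                    scale for P(response > 0)
    eta_positive -- linear predictor (fixed plus random effect) for the mean
                    of log(response) among positive responses
    sigma2       -- residual variance on the log scale
    """
    p_nonzero = expit(eta_binary)
    mean_positive = math.exp(eta_positive + 0.5 * sigma2)
    return p_nonzero * mean_positive
```

A county-level mean under these assumptions would average this quantity over the county's population units. A positive correlation between the two random effects means that counties with a larger `eta_binary` also tend to have a larger `eta_positive`.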

     