Abstract Many large‐scale surveys collect both discrete and continuous variables. Small‐area estimates may be desired for means of continuous variables, proportions in each level of a categorical variable, or for domain means defined as the mean of the continuous variable for each level of the categorical variable. In this paper, we introduce a conditionally specified bivariate mixed‐effects model for small‐area estimation, and provide a necessary and sufficient condition under which the conditional distributions render a valid joint distribution. The conditional specification allows better model interpretation. We use the valid joint distribution to calculate empirical Bayes predictors and use the parametric bootstrap to estimate the mean squared error. Simulation studies demonstrate the superior performance of the bivariate mixed‐effects model relative to univariate model estimators. We apply the bivariate mixed‐effects model to construct estimates for small watersheds using data from the Conservation Effects Assessment Project, a survey developed to quantify the environmental impacts of conservation efforts. We construct predictors of mean sediment loss, the proportion of land where the soil loss tolerance is exceeded, and the average sediment loss on land where the soil loss tolerance is exceeded. In the data analysis, the bivariate mixed‐effects model leads to more scientifically interpretable estimates of domain means than those based on two independent univariate models.
Empirical Bayes small area prediction under a zero‐inflated lognormal model with correlated random area effects
Abstract Many variables of interest in agricultural or economical surveys have skewed distributions and can equal zero. Our data are measures of sheet and rill erosion called Revised Universal Soil Loss Equation‐2 (RUSLE2). Small area estimates of mean RUSLE2 erosion are of interest. We use a zero‐inflated lognormal mixed effects model for small area estimation. The model combines a unit‐level lognormal model for the positive RUSLE2 responses with a unit‐level logistic mixed effects model for the binary indicator that the response is nonzero. In the Conservation Effects Assessment Project (CEAP) data, counties with a higher probability of nonzero responses also tend to have a higher mean among the positive RUSLE2 values. We capture this property of the data through an assumption that the pair of random effects for a county are correlated. We develop empirical Bayes (EB) small area predictors and a bootstrap estimator of the mean squared error (MSE). In simulations, the proposed predictor is superior to simpler alternatives. We then apply the method to construct EB predictors of mean RUSLE2 erosion for South Dakota counties. To obtain auxiliary variables for the population of cropland in South Dakota, we integrate a satellite‐derived land cover map with a geographic database of soil properties. We provide an R Shiny application called viscover (available at https://lyux.shinyapps.io/viscover/) to visualize the overlay operations required to construct the covariates. On the basis of bootstrap estimates of the mean square error, we conclude that the EB predictors of mean RUSLE2 erosion are superior to direct estimators.
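The data-generating structure described in this abstract can be illustrated with a minimal simulation sketch. It assumes an intercept-only version of the model (no covariates) and draws the two area-level random effects from a bivariate normal distribution with correlation `rho`. The function name `simulate_area` and all parameter names are hypothetical, chosen for illustration; this is not the authors' code and it does not implement the EB predictor or the bootstrap MSE estimator.

```python
import math
import random


def simulate_area(n_units, beta_pos, beta_bin, sigma_u, sigma_b, rho, sigma_e, rng):
    """Simulate unit responses for one area under a zero-inflated lognormal model.

    Assumption: intercept-only linear predictors; the paper's model uses covariates.
    """
    # Correlated area effects: u enters the lognormal part, b the logistic part.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    u = sigma_u * z1
    b = sigma_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)

    # Probability that a unit in this area has a nonzero response.
    p_nonzero = 1.0 / (1.0 + math.exp(-(beta_bin + b)))

    responses = []
    for _ in range(n_units):
        if rng.random() < p_nonzero:
            # Positive part: lognormal with area effect u and unit-level error.
            responses.append(math.exp(beta_pos + u + rng.gauss(0.0, sigma_e)))
        else:
            responses.append(0.0)
    return responses


if __name__ == "__main__":
    rng = random.Random(1)
    y = simulate_area(20, 1.0, 0.5, 0.4, 0.6, 0.7, 0.3, rng)
    print(sum(1 for v in y if v > 0.0), "nonzero responses out of", len(y))
```

With a positive `rho`, areas that draw a large `b` (higher chance of a nonzero response) also tend to draw a large `u` (higher mean among positive values), which is the property of the CEAP data that the correlated random effects are meant to capture.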
- Award ID(s): 1733572
- PAR ID: 10455695
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Biometrical Journal
- Volume: 62
- Issue: 8
- ISSN: 0323-3847
- Page Range / eLocation ID: p. 1859-1878
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Microbial networks offer critical insights into community structure, ecological interactions and host–microbe dynamics. However, constructing reliable microbiome networks remains challenging due to variability among existing inference methods, limited overlap between inferred networks and the absence of a gold standard (a universally accepted reference for benchmarking) for validation. We developed CMiNet, an R package and interactive Shiny App (https://cminet.wid.wisc.edu) that enables consensus microbiome network construction by integrating up to 10 widely used inference algorithms. CMiNet supports both correlation‐based and conditional dependence‐based methods and provides users with flexible options to construct individual or consensus networks across different approaches. CMiNet integrates results from multiple inference methods through a voting strategy that retains edges supported by a user‐defined number of methods. To assess robustness, we complement this with a bootstrap analysis that quantifies edge stability under resampling. By jointly reporting method support and bootstrap confidence, CMiNet provides a reproducible framework that explicitly communicates both agreement across methods and stability under perturbation. We applied CMiNet to gut and soil microbiome datasets, constructing consensus networks that retained edges supported by multiple methods and confirmed by bootstrap reproducibility values. To identify disease‐associated taxa, we developed an integrative strategy that compared results across machine learning, differential abundance and network‐based approaches, ensuring that selected taxa were consistently recovered across methods. In the soil dataset, this analysis highlighted key taxa such as Ktedonobacteria, Acidobacteriae, Vicinamibacteria, MB‐A2‐108, Ignavibacteria and Anaerolineae, all of which were confirmed by multiple independent strategies.
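The voting strategy described in this abstract can be sketched in a few lines. The sketch assumes each inference method returns a list of undirected edges as pairs of taxon names and that an edge is kept when at least `min_votes` methods report it. The function `consensus_edges` is a hypothetical illustration written in Python; it is not part of the CMiNet R package and does not reproduce its API or its bootstrap analysis.

```python
from collections import Counter


def consensus_edges(edge_sets, min_votes):
    """Return edges reported by at least `min_votes` methods, with vote counts.

    edge_sets: one iterable of (taxon_a, taxon_b) pairs per inference method.
    """
    votes = Counter()
    for edges in edge_sets:
        # Undirected edges: (a, b) and (b, a) are the same edge.
        # Building a set first means each method votes at most once per edge.
        unique = {frozenset(edge) for edge in edges}
        # Ignore self-loops, which collapse to a single-element set.
        votes.update(e for e in unique if len(e) == 2)
    return {tuple(sorted(e)): n for e, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    method_results = [
        [("A", "B"), ("B", "C")],
        [("B", "A")],
        [("B", "C"), ("C", "D")],
    ]
    print(consensus_edges(method_results, min_votes=2))
```

Raising `min_votes` trades network density for agreement across methods: a threshold of 1 returns the union of all inferred edges, while a threshold equal to the number of methods returns only their intersection.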
-
Abstract In small area estimation, different data sources are integrated in order to produce reliable estimates of target parameters (e.g., a mean or a proportion) for a collection of small subsets (areas) of a finite population. Regression models such as the linear mixed effects model or M-quantile regression are often used to improve the precision of survey sample estimates by leveraging auxiliary information for which means or totals are known at the area level. In many applications, the unit-level linkage of records from different sources is probabilistic and potentially error-prone. In this article, we present adjustments of the small area predictors that are based on either the linear mixed effects model or M-quantile regression to account for the presence of linkage error. These adjustments are developed from a two-component mixture model that hinges on the assumption of independence of the target and auxiliary variable given incorrect linkage. Estimation and inference are based on composite likelihoods and machinery revolving around the Expectation-Maximization Algorithm. For each of the two regression methods, we propose modified small area predictors and approximations for their mean squared errors. The empirical performance of the proposed approaches is studied in both design-based and model-based simulations that include comparisons to a variety of baselines.
-
Jianguo (Ed.) Yellow sweetclover (Melilotus officinalis; YSC) is an invasive biennial legume that bloomed across the Northern Great Plains in 2018–2019 in response to above-average precipitation. YSC can increase nitrogen (N) levels and potentially cause substantial changes in the composition of native plant species communities. There is little knowledge of the spatiotemporal variability and conditions causing substantial widespread blooms of YSC across western South Dakota (SD). We aimed to develop a generalized prediction model to predict the relative abundance of YSC in suitable habitats across rangelands of western South Dakota for 2019. Our research questions are: (1) What is the spatial extent of YSC across western South Dakota? (2) Which model can accurately predict the habitat and percent cover of YSC? and (3) What significant biophysical drivers affect its presence across western South Dakota? We trained machine learning models with in situ data (2016–2021), Sentinel 2A-derived surface reflectance and indices (10 m, 20 m) and site-specific variables of climate, topography, and edaphic factors to optimize model performance. We identified moisture proxies (Shortwave Infrared reflectance and variability in Tasseled Cap Wetness) as the important predictors to explain the YSC presence. Land Surface Water Index and variability in summer temperature were the top predictors in explaining the YSC abundance. We demonstrated how machine learning algorithms could help generate valuable information on the spatial distribution of this invasive plant. We delineated major YSC hotspots in Butte, Pennington, and Corson Counties of South Dakota. The floodplains of major rivers, including White and Bad Rivers, and areas around Badlands National Park also showed a higher occurrence probability and cover percentage. These prediction maps could aid land managers in devising management strategies for the regions that are prone to YSC outbreaks. The management workflow can also serve as a prototype for mapping other invasive plant species in similar regions.
-
Abstract Accurate specification of ionization production by energetic electron precipitation is critical for atmospheric chemistry models to assess the resultant atmospheric effects. Recent model‐observation comparison studies have increasingly highlighted the importance of considering precipitation fluxes in the full range of electron energy and pitch angle. However, previous parameterization methods were mostly proposed for isotropically precipitating electrons with energies up to 1 MeV, and the pitch angle dependence has not yet been parameterized. In this paper, we first characterize and tabulate the atmospheric ionization response to monoenergetic electrons with different pitch angles and energies between ∼3 keV and ∼33 MeV. A generalized method that fully accounts for the dependence of ionization production on background atmospheric conditions, electron energy, and pitch angle has been developed based on the parameterization method of Fang et al. (2010, https://doi.org/10.1029/2010GL045406). Moreover, we validate this method using 100 random atmospheric profiles and precipitation fluxes with monoenergetic and exponential energy distributions, and isotropic and sine pitch angle distributions. In a suite of 6,100 validation tests, the error in peak ionization altitude is found to be within 1 km in 91% of all the tests with a mean error of 2.7% in peak ionization rate and 1.9% in total ionization. This method therefore provides a reliable means to convert space‐measured precipitation energy and pitch angle distributions into ionization inputs for atmospheric chemistry models.
