Title: Empirical Bayes small area prediction under a zero‐inflated lognormal model with correlated random area effects

Many variables of interest in agricultural or economic surveys have skewed distributions and can equal zero. Our data are measures of sheet and rill erosion called Revised Universal Soil Loss Equation 2 (RUSLE2). Small area estimates of mean RUSLE2 erosion are of interest. We use a zero‐inflated lognormal mixed effects model for small area estimation. The model combines a unit‐level lognormal model for the positive RUSLE2 responses with a unit‐level logistic mixed effects model for the binary indicator that the response is nonzero. In the Conservation Effects Assessment Project (CEAP) data, counties with a higher probability of nonzero responses also tend to have a higher mean among the positive RUSLE2 values. We capture this property of the data through an assumption that the two random effects for a county are correlated. We develop empirical Bayes (EB) small area predictors and a bootstrap estimator of the mean squared error (MSE). In simulations, the proposed predictor is superior to simpler alternatives. We then apply the method to construct EB predictors of mean RUSLE2 erosion for South Dakota counties. To obtain auxiliary variables for the population of cropland in South Dakota, we integrate a satellite‐derived land cover map with a geographic database of soil properties. We provide an R Shiny application called viscover to visualize the overlay operations required to construct the covariates. On the basis of bootstrap estimates of the MSE, we conclude that the EB predictors of mean RUSLE2 erosion are superior to direct estimators.
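The two-part structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' EB predictor: it assumes an intercept-only model, and every name and parameter value (`beta0`, `alpha0`, `sigma_u`, `sigma_b`, `rho`, `sigma_e`) is hypothetical. It draws a correlated pair of county effects, generates zero-inflated lognormal responses, and compares the sample mean with the conditional mean given the effects, which for a lognormal positive part is p · exp(beta0 + u + sigma_e²/2).

```python
import math
import random


def inverse_logit(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def draw_area_effects(sigma_u, sigma_b, rho, rng):
    """Draw one (u, b) pair from a bivariate normal with correlation rho.

    u shifts the log-scale mean of positive responses; b shifts the
    logit of the probability that a response is nonzero.
    """
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    u = sigma_u * z1
    b = sigma_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return u, b


def conditional_mean(beta0, alpha0, u, b, sigma_e):
    """E[y | u, b] for the intercept-only model (an assumption of this sketch)."""
    p_nonzero = inverse_logit(alpha0 + b)
    return p_nonzero * math.exp(beta0 + u + 0.5 * sigma_e ** 2)


def simulate_units(n_units, beta0, alpha0, u, b, sigma_e, rng):
    """Simulate unit-level responses for one area given its effects."""
    p_nonzero = inverse_logit(alpha0 + b)
    responses = []
    for _ in range(n_units):
        if rng.random() < p_nonzero:
            # Positive part: lognormal with log-mean beta0 + u.
            responses.append(math.exp(beta0 + u + rng.gauss(0.0, sigma_e)))
        else:
            # Zero part of the mixture.
            responses.append(0.0)
    return responses


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical parameter values chosen only for illustration.
    u, b = draw_area_effects(sigma_u=0.5, sigma_b=0.8, rho=0.6, rng=rng)
    ys = simulate_units(20000, beta0=1.0, alpha0=0.3, u=u, b=b,
                        sigma_e=0.5, rng=rng)
    print("sample mean:     ", sum(ys) / len(ys))
    print("conditional mean:", conditional_mean(1.0, 0.3, u, b, 0.5))
```

A positive `rho` reproduces the pattern noted for the CEAP data: areas with a higher probability of nonzero responses also tend to have larger positive values. An actual EB predictor would average this conditional mean over the distribution of (u, b) given the observed data, which this sketch does not attempt.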

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Biometrical Journal
Page Range / eLocation ID:
p. 1859-1878
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Many large‐scale surveys collect both discrete and continuous variables. Small‐area estimates may be desired for means of continuous variables, proportions in each level of a categorical variable, or for domain means defined as the mean of the continuous variable for each level of the categorical variable. In this paper, we introduce a conditionally specified bivariate mixed‐effects model for small‐area estimation, and provide a necessary and sufficient condition under which the conditional distributions render a valid joint distribution. The conditional specification allows better model interpretation. We use the valid joint distribution to calculate empirical Bayes predictors and use the parametric bootstrap to estimate the mean squared error. Simulation studies demonstrate the superior performance of the bivariate mixed‐effects model relative to univariate model estimators. We apply the bivariate mixed‐effects model to construct estimates for small watersheds using data from the Conservation Effects Assessment Project, a survey developed to quantify the environmental impacts of conservation efforts. We construct predictors of mean sediment loss, the proportion of land where the soil loss tolerance is exceeded, and the average sediment loss on land where the soil loss tolerance is exceeded. In the data analysis, the bivariate mixed‐effects model leads to more scientifically interpretable estimates of domain means than those based on two independent univariate models.

  2. Abstract

    In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data by using variance component models with more than two variance components. Current testing strategies only apply to models with exactly two variance components and when sample sizes are large. Therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects from related clusters. Our approach combines the exact tests for null hypothesis with a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared to existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV‐infected patients and reveal two interesting genera, Prevotella and Veillonella, associated with forced vital capacity. Our findings shed light on the impact of the lung microbiome on HIV complexities. The method is implemented in the open‐source, high‐performance computing language Julia and is freely available.

  3. Abstract

    Accurate specification of ionization production by energetic electron precipitation is critical for atmospheric chemistry models to assess the resultant atmospheric effects. Recent model‐observation comparison studies have increasingly highlighted the importance of considering precipitation fluxes in the full range of electron energy and pitch angle. However, previous parameterization methods were mostly proposed for isotropically precipitating electrons with energies up to 1 MeV, and the pitch angle dependence has not yet been parameterized. In this paper, we first characterize and tabulate the atmospheric ionization response to monoenergetic electrons with different pitch angles and energies between 3 keV and 33 MeV. A generalized method that fully accounts for the dependence of ionization production on background atmospheric conditions, electron energy, and pitch angle has been developed based on the parameterization method of Fang et al. (2010). Moreover, we validate this method using 100 random atmospheric profiles and precipitation fluxes with monoenergetic and exponential energy distributions, and isotropic and sine pitch angle distributions. In a suite of 6,100 validation tests, the error in peak ionization altitude is found to be within 1 km in 91% of all the tests, with a mean error of 2.7% in peak ionization rate and 1.9% in total ionization. This method therefore provides a reliable means to convert space‐measured precipitation energy and pitch angle distributions into ionization inputs for atmospheric chemistry models.

  4. Abstract

    To investigate whether the source of the globally occurring garnet signature in mid‐ocean ridge basalt (MORB) is “true” (imparted by melting a garnet‐bearing source) or “apparent” (produced without the involvement of garnet), we present the basalt petrogenesis model Petrogen. Petrogen is built on Kinzler and Grove (1992a) and Behn and Grove (2015) and calculates the major element, trace element, and isotopic compositions of primary lherzolite‐saturated mantle melts and their subsequent evolution to eruption. This model is experimentally calibrated for melting in the plagioclase, spinel, and garnet fields with and without small amounts of water. Petrogen predicts that garnet‐lherzolite melting requires small mantle porosity (ϕ = 0.1%) and enriched (NaK# > 0.1), hot (>1400–1450°C), and damp mantle (>350–700 ppm water) sources. When garnet‐lherzolite melting does occur, the fraction of melting that takes place in the garnet field is not sufficient to impart a recognizable true garnet signature in the major and trace element composition of pooled melts except at ultraslow‐spreading rates (half rate  0.5 cm/yr). Therefore, for ambiguous garnet trace element signatures in MORB to be “true” would require that they are near‐fractional garnet‐lherzolite melts or garnet‐pyroxenite melts diluted by mixing in random proportions. While we do not test the process of random mixing, we find that near‐fractional and pooled melts of variable major and trace element mantle peridotite compositions at different mantle potential temperatures can explain the range of garnet signatures in MORB. For spinel‐peridotite melting to support garnet signatures requires mantle porosities ϕ = 0.1%; however, plagioclase‐peridotite melting, under specific circumstances, can explain MORB garnet signatures with ϕ = 1%.

  5. We propose an empirical Bayes approach using a three‐component mixture model, the L2N model, that may be applied to detect both differential (mean) expression and variation. It consists of two log‐normal components (L2) for differentially expressed (dispersed) features: one component for underexpressed (dispersed) features and the other for overexpressed (dispersed) features, and a single normal component (N) for nondifferentially expressed (dispersed) features. Simulation results show that L2N can capture asymmetries in the numbers of overexpressed and underexpressed (dispersed) features (e.g., genes) when they exist and can provide a better fit to data in which the mixture component distributions are not well separated while also performing well under symmetry and separation. The L2N model is implemented in an R‐driven, user‐friendly, graphical interface called DVX, for differential variation and expression analysis, which does not require the user to have R programming knowledge. The interface also includes an implementation of differential expression analysis via the limma package, and a differential variation and expression analysis using a three‐way normal mixture model. It offers a set of diagnostic plots, data transformation tools, and report generation in Microsoft Excel‐ and Word‐compatible formats. The interface is available on the web.
