Title: Describing posterior distributions of variance components: Problems and the use of null distributions to aid interpretation
Abstract

Assessing the biological relevance of variance components estimated using Markov chain Monte Carlo (MCMC)‐based mixed‐effects models is not straightforward. Variance estimates are constrained to be greater than zero and their posterior distributions are often asymmetric. Different measures of central tendency for these distributions can therefore vary widely, and credible intervals cannot overlap zero, making it difficult to assess the size and statistical support for among‐group variance. Statistical support is often assessed through visual inspection of the whole posterior distribution and so relies on subjective decisions for interpretation.

We use simulations to demonstrate the difficulties of summarizing the posterior distributions of variance estimates from MCMC‐based models. We then describe different methods for generating the expected null distribution (i.e. a distribution of effect sizes that would be obtained if there was no among‐group variance) that can be used to aid in the interpretation of variance estimates.

By comparing commonly used summary statistics of posterior distributions of variance components, we show that the posterior median is predominantly the least biased. We further show how null distributions can be used to derive a p‐value that provides complementary information to the commonly presented measures of central tendency and uncertainty. Finally, we show how these p‐values facilitate the implementation of power analyses within an MCMC framework.
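
A minimal sketch of the null-distribution idea follows (our construction, not the authors' code; the design and all numbers are hypothetical). For speed it uses REML point estimates from lme4 in place of full MCMC posteriors, but the logic carries over directly to posterior medians from an MCMC fit.

```r
# Null-distribution sketch: refit the model to data simulated with zero
# among-group variance, collect the variance estimates, then compare an
# observed estimate against that null distribution.
set.seed(1)

n_groups <- 20                                  # hypothetical design
n_per    <- 10
group    <- factor(rep(seq_len(n_groups), each = n_per))

fit_var <- function(y) {
  # Among-group variance estimate (REML point estimate standing in for
  # the posterior median of an MCMC fit)
  m <- lme4::lmer(y ~ 1 + (1 | group))
  as.numeric(lme4::VarCorr(m)$group[1, 1])
}

# Null distribution: residual noise only, true among-group variance = 0
null_vars <- replicate(500, fit_var(rnorm(n_groups * n_per)))

# "Observed" data simulated with true among-group variance 0.25
y_obs   <- rnorm(n_groups, sd = 0.5)[group] + rnorm(n_groups * n_per)
obs_var <- fit_var(y_obs)

# One-sided p-value: share of null estimates at least as large as observed
p_value <- mean(null_vars >= obs_var)
```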

The use of null distributions for variance components can aid study design and the interpretation of results from MCMC‐based models. We hope that this manuscript will make empiricists using mixed models think more carefully about their results, what descriptive statistics they present and what inference they can make.
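
The power analysis mentioned above is then a thin extension of the same simulation (again our construction, reusing group, n_groups, n_per, fit_var and null_vars from the previous sketch): simulate datasets under an assumed true among-group variance and take power as the proportion of their null-distribution p-values falling below 0.05.

```r
# Power sketch: proportion of simulated datasets whose null-distribution
# p-value falls below alpha, for a given true among-group variance.
power_sim <- function(true_var, n_sims = 200, alpha = 0.05) {
  p_vals <- replicate(n_sims, {
    y <- rnorm(n_groups, sd = sqrt(true_var))[group] + rnorm(n_groups * n_per)
    mean(null_vars >= fit_var(y))
  })
  mean(p_vals < alpha)
}

power_sim(0.25)  # estimated power to detect an among-group variance of 0.25
```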

NSF-PAR ID:
10467623
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Methods in Ecology and Evolution
Volume:
14
Issue:
10
ISSN:
2041-210X
Format(s):
Medium: X
Size(s):
p. 2557-2574
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Since the very first detection of gravitational waves from the coalescence of two black holes in 2015, Bayesian statistical methods have been routinely applied by LIGO and Virgo to extract the signal from noisy interferometric measurements, obtain point estimates of the physical parameters responsible for producing the signal, and rigorously quantify their uncertainties. Different computational techniques have been devised depending on the source of the gravitational radiation and the gravitational waveform model used. The most prominent sources are binary black hole and neutron star mergers, the only objects detected to date, but gravitational waves from core‐collapse supernovae, rapidly rotating neutron stars, and the stochastic gravitational‐wave background also lie within the sensitivity band of ground‐based interferometers and are expected to become observable in future observing runs. Because the nonlinearity of the complex waveforms and the high dimensionality of the parameter spaces preclude analytic evaluation of the posterior distribution, posterior inference for all of these sources relies on computer‐intensive simulation techniques such as Markov chain Monte Carlo methods. This article reviews state‐of‐the‐art Bayesian statistical parameter estimation methods for researchers in this cross‐disciplinary area of gravitational‐wave data analysis.
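
    As a toy illustration of the core computation (ours, not a LIGO/Virgo pipeline; every number is invented), the sketch below uses a random-walk Metropolis sampler to recover the amplitude and frequency of a sinusoid buried in white Gaussian noise, the simplest analogue of gravitational-wave parameter estimation.

    ```r
    # Random-walk Metropolis sampling of the posterior for a sinusoid in white
    # Gaussian noise. Real analyses need far better proposals (or nested
    # sampling / parallel tempering); this only shows the mechanics.
    set.seed(42)
    t <- seq(0, 1, length.out = 512)
    y <- 1.5 * sin(2 * pi * 8 * t) + rnorm(length(t))   # true amp 1.5, freq 8

    log_post <- function(th) {            # flat prior on a plausible box
      if (th[1] <= 0 || th[2] <= 0 || th[2] > 50) return(-Inf)
      sum(dnorm(y, th[1] * sin(2 * pi * th[2] * t), 1, log = TRUE))
    }

    n_iter <- 20000
    chain  <- matrix(NA_real_, n_iter, 2,
                     dimnames = list(NULL, c("amp", "freq")))
    th <- c(1, 8.2)                   # start near the truth; the oscillatory
    lp <- log_post(th)                # likelihood in frequency traps naive chains
    for (i in seq_len(n_iter)) {
      prop    <- th + rnorm(2, sd = 0.05)        # random-walk proposal
      lp_prop <- log_post(prop)
      if (log(runif(1)) < lp_prop - lp) { th <- prop; lp <- lp_prop }
      chain[i, ] <- th
    }

    post <- chain[-(1:5000), ]                    # drop burn-in
    colMeans(post)                                # posterior means
    apply(post, 2, quantile, c(0.025, 0.975))     # 95% credible intervals
    ```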

    This article is categorized under:

    Applications of Computational Statistics > Signal and Image Processing and Coding

    Statistical and Graphical Methods of Data Analysis > Markov Chain Monte Carlo (MCMC)

    Statistical Models > Time Series Models

     
  2. Abstract

    Linear mixed‐effects models are powerful tools for analysing complex datasets with repeated or clustered observations, a common data structure in ecology and evolution. Mixed‐effects models involve complex fitting procedures and make several assumptions, in particular about the distribution of residual and random effects. Violations of these assumptions are common in real datasets, yet it is not always clear how much these violations matter to accurate and unbiased estimation.

    Here we address the consequences of violations of distributional assumptions and the impact of missing random effect components on model estimates. In particular, we evaluate the effects of skewed, bimodal and heteroscedastic random effect and residual variances, of missing random effect terms and of correlated fixed effect predictors. We focus on bias and prediction error in estimates of fixed and random effects.

    Model estimates were usually robust to violations of assumptions, with the exception of slight upward bias in estimates of random effect variance when the generating distribution was bimodal but was modelled with a Gaussian distribution. Estimates for (random effect) components that violated distributional assumptions became less precise but remained unbiased, and this did not affect the other parameters of the model. The same pattern held for strongly correlated fixed effects, which yielded imprecise but unbiased estimates, with uncertainty estimates reflecting the imprecision.

    Unmodelled sources of random effect variance had predictable effects on variance component estimates. The pattern is best viewed as a cascade of hierarchical grouping factors. Variances trickle down the hierarchy such that missing higher‐level random effect variances pool at lower levels and missing lower‐level and crossed random effect variances manifest as residual variance.
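
    A quick simulation (our construction, not the authors' code) makes the pooling behaviour concrete: groups nested in regions, unit variance at every level, with the region term then omitted from the fitted model.

    ```r
    # Groups nested in regions, each level contributing variance 1. Omitting
    # the region term should pool its variance into the group component.
    set.seed(7)
    n_region <- 20; n_group <- 10; n_obs <- 5
    region <- factor(rep(seq_len(n_region), each = n_group * n_obs))
    group  <- factor(rep(seq_len(n_region * n_group), each = n_obs))

    y <- rnorm(n_region)[region] +            # region variance   = 1
         rnorm(n_region * n_group)[group] +   # group variance    = 1
         rnorm(length(group))                 # residual variance = 1

    full    <- lme4::lmer(y ~ 1 + (1 | region) + (1 | group))
    reduced <- lme4::lmer(y ~ 1 + (1 | group))      # region omitted

    lme4::VarCorr(full)     # recovers roughly 1 / 1 / 1
    lme4::VarCorr(reduced)  # group variance ~ 2: it absorbed the region term
    ```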

    Overall, our results show that mixed‐effects models are remarkably robust, which should allow researchers to use them even when distributional assumptions are objectively violated. However, this does not free researchers from careful evaluation of the model. Estimates based on data that clearly violate key assumptions should be treated with caution, because individual datasets might give highly imprecise estimates even if those estimates are unbiased on average across datasets.

     
  3. Abstract

    Estimating phenotypic distributions of populations and communities is central to many questions in ecology and evolution. These distributions can be characterized by their moments (mean, variance, skewness and kurtosis) or diversity metrics (e.g. functional richness). Typically, such moments and metrics are calculated using community‐weighted approaches (e.g. abundance‐weighted mean). We propose an alternative bootstrapping approach that allows flexibility in trait sampling and explicit incorporation of intraspecific variation, and show that this approach significantly improves estimation while allowing us to quantify uncertainty.

    We assess the performance of different approaches for estimating the moments of trait distributions across various sampling scenarios, taxa and datasets by comparing estimates derived from simulated samples with the true values calculated from full datasets. Simulations differ in sampling intensity (individuals per species), sampling biases (abundance, size), trait data source (local vs. global) and estimation method (two types of community‐weighting, two types of bootstrapping).

    We introduce the traitstrap R package, which contains a modular and extensible set of bootstrapping and weighted‐averaging functions that use community composition and trait data to estimate the moments of community trait distributions with their uncertainty. Importantly, the first function in the workflow, trait_fill, allows the user to specify hierarchical structures (e.g. plot within site, experiment vs. control, species within genus) to assign trait values to each taxon in each community sample.

    Across all taxa, simulations and metrics, bootstrapping approaches were more accurate and less biased than community‐weighted approaches. With bootstrapping, a sample size of nine or more measurements per species per trait was generally sufficient for the 95% CI to include the true mean, and bootstrapping reduced average percent error by 26%–74% relative to community‐weighting. Random sampling across all species outperformed both size‐ and abundance‐biased sampling.

    Our results suggest that randomly sampling ~9 individuals per species per sampling unit, covering all species in the community and analysing the data using nonparametric bootstrapping generally enables reliable inference on community trait distributions, including their central moments. By providing better estimates of community trait distributions, bootstrapping approaches can improve our ability to link traits both to the processes that generate them and to their effects on ecosystems.
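
    The core resampling idea fits in a few lines (a minimal sketch of our own, not the traitstrap implementation; species names, abundances and sample sizes are invented): resample individual trait measurements within species in proportion to abundance, then summarize each resampled community distribution.

    ```r
    # Nonparametric bootstrap of a community trait mean: draw individuals per
    # species in proportion to abundance, with replacement, then summarize.
    set.seed(3)
    traits <- data.frame(                      # 9 measurements per species
      species = rep(c("sp1", "sp2", "sp3"), each = 9),
      value   = c(rnorm(9, 10, 2), rnorm(9, 15, 3), rnorm(9, 12, 1))
    )
    abundance <- c(sp1 = 50, sp2 = 30, sp3 = 20)

    boot_moment <- function(stat, n_boot = 500, n_draw = 200) {
      replicate(n_boot, {
        draws <- unlist(lapply(names(abundance), function(sp) {
          v <- traits$value[traits$species == sp]
          sample(v, round(n_draw * abundance[[sp]] / sum(abundance)),
                 replace = TRUE)
        }))
        stat(draws)
      })
    }

    boot_means <- boot_moment(mean)
    mean(boot_means)                        # bootstrap estimate of the mean
    quantile(boot_means, c(0.025, 0.975))   # and its 95% CI
    ```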

     
  4. Abstract

    Temporal fluctuations in growth rates can arise from both variation in age‐specific vital rates and temporal fluctuations in age structure (i.e. the relative abundance of individuals in each age‐class). However, empirical assessments of temporal fluctuations in age structure and their effects on population growth rate are relatively rare. Most research has focused on understanding the contribution of changing vital rates to population growth rates and these analyses routinely assume that: (a) populations have stable age distributions, (b) environmental influences on vital rates and age structure are stationary (i.e. the mean and/or variance of these processes does not change over time), and (c) dynamics are independent of density.

    Here we quantified fluctuations in age structure and assessed whether they were stationary for four populations of free‐ranging vertebrates: moose (observed for 48 years), elk (15 years), tawny owls (15 years) and grey wolves (17 years). We also assessed the extent that fluctuations in age structure were useful for predicting annual population growth rates using models which account for density dependence.

    Fluctuations in age structure were of a similar magnitude to fluctuations in abundance. For three populations (moose, elk, owls), the mean and the skew of the age distribution fluctuated without stabilizing over the observed time periods. More precisely, the sample variance (interannual variance) of age structure indices increased with the length of the study period, which suggests that fluctuations in age structure were non‐stationary for these populations – at least over the 15‐ to 48‐year periods analysed.
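
    The expanding-window diagnostic described above reads naturally as a few lines of code (our sketch; a simulated random walk stands in for the real age-structure series): if the sample variance keeps climbing as years are added, stationarity is doubtful.

    ```r
    # Sample variance of a mean-age index over the first k years, growing k.
    # A stationary series would level off; a random walk keeps climbing.
    set.seed(11)
    years    <- 48
    mean_age <- 5 + cumsum(rnorm(years, sd = 0.2))  # toy non-stationary index

    expanding_var <- sapply(3:years, function(k) var(mean_age[1:k]))
    plot(3:years, expanding_var, type = "l",
         xlab = "Years included", ylab = "Sample variance of mean age")
    ```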

    Fluctuations in age structure were associated with population growth rate for two populations. In particular, population growth varied from positive to negative for moose and from near zero to negative for elk as the average age of adults increased over its observed range.

    Non‐stationarity in age structure may represent an important mechanism by which abundance becomes non‐stationary – and therefore difficult to forecast – over time‐scales of concern to wildlife managers. Overall, our results emphasize the need for vertebrate populations to be modelled using approaches that consider transient dynamics and density dependence and that do not rely on the assumption that environmental processes are stationary.

    A free Plain Language Summary can be found within the Supporting Information of this article.

     
  5. Abstract

    Biodiversity studies rely heavily on estimates of species' distributions, often obtained through ecological niche modelling. Numerous software packages allow users to model ecological niches with machine learning and statistical methods. However, no existing package with a graphical user interface allows users to perform model calibration and selection based on convex forms such as ellipsoids, which may better match fundamental ecological niche shapes, while also providing tools for exploring, modelling and evaluating niches and distributions that are intuitive for both novice and proficient users.

    Here we describe an R package, NicheToolBox (ntbox), that allows users to conduct all processing steps involved in ecological niche modelling: downloading and curating occurrence data, obtaining and transforming environmental data layers, selecting environmental variables, exploring relationships between geographic and environmental spaces, calibrating and selecting ellipsoid models, evaluating models using binomial and partial ROC tests, assessing extrapolation risk, and performing geographic information system operations via a graphical user interface. A summary of the entire workflow is produced for use as a stand‐alone algorithm or as part of research reports.

    The method is explained in detail and tested via modelling the threatened feline species Leopardus wiedii. Georeferenced occurrence data for this species are queried to display both point occurrences and the IUCN extent of occurrence polygon (IUCN, 2007). This information is used to illustrate tools available for accessing, processing and exploring biodiversity data (e.g. number of occurrences and chronology of collecting) and transforming environmental data (e.g. a summary PCA for 19 bioclimatic layers). Visualizations of three‐dimensional ecological niches modelled as minimum volume ellipsoids are developed with ancillary statistics. This niche model is then projected to geographic space to represent a corresponding potential suitability map.
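
    Conceptually, an ellipsoidal niche model reduces to a centroid and a covariance matrix in environmental space; the sketch below (our construction, not the ntbox API; variable names are invented) scores suitability by Mahalanobis distance to the centroid, with a chi-squared cutoff defining the ellipsoid boundary.

    ```r
    # Ellipsoid niche sketch: centroid + covariance of occurrence records in
    # environmental space; suitability scored by Mahalanobis distance.
    set.seed(5)
    occ_env <- matrix(rnorm(200 * 3), ncol = 3,   # hypothetical occurrences
                      dimnames = list(NULL, c("temp", "precip", "elev")))

    centroid <- colMeans(occ_env)
    sigma    <- cov(occ_env)

    # Score candidate sites: smaller distance = more suitable; the 95%
    # chi-squared quantile defines membership in the ellipsoid.
    new_env <- matrix(rnorm(5 * 3), ncol = 3)
    d2      <- mahalanobis(new_env, centroid, sigma)
    inside  <- d2 <= qchisq(0.95, df = 3)
    ```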

    Using ntbox provides a fast and straightforward means of retrieving and manipulating occurrence and environmental data, which can then be used in model calibration, projection and evaluation to assess species' distributions in geographic space and their corresponding environmental combinations.

     