

Title: Multi‐species occupancy models as robust estimators of community richness
Abstract

Understanding patterns of diversity is central to ecology and conservation, yet estimates of diversity are often biased by imperfect detection. In recent years, multi‐species occupancy models (MSOM) have been developed as a statistical tool to account for species‐specific heterogeneity in detection while estimating true measures of diversity. Although the power of these models has been tested in various ways, their ability to estimate gamma diversity (true community size, N) is largely unrecognized and needs rigorous evaluation.

We use both simulations and an empirical dataset to evaluate the bias, precision, accuracy and coverage of MSOM estimates of N compared with those of the widely applied iChao2 non‐parametric estimator. We simulated 5,600 datasets across seven scenarios that varied average occupancy and detectability covariates, as well as the number of sites, replicates and true community size. Additionally, we use a real dataset of surveys over 9 years (in which species accumulation reached an asymptote, indicating true N) to estimate N from each annual survey.
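
To make the comparison concrete, the sketch below simulates site-level detections under a basic multi-species occupancy model and computes iChao2 from the resulting incidence matrix. It is a minimal illustration, not the simulation design used in this study: the function names, the logit-normal spread of species effects (sd = 1.0) and the choice to treat each site (detected in at least one replicate) as one sampling unit are all assumptions. The iChao2 expression follows Chiu et al. (2014) as we understand it, with Q4 = 0 replaced by 1 to avoid division by zero (also an assumption); check it against that paper or an established implementation before relying on it.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def simulate_site_detections(n_species, n_sites, n_reps, psi_median, p_median,
                             sd=1.0, seed=None):
    """Species x site 0/1 matrix: 1 if the species was detected in at least
    one of n_reps replicate surveys at the site (hypothetical simulator)."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(n_species):
        # Species-level occupancy and detection, logit-normal around the medians
        psi = expit(rng.gauss(logit(psi_median), sd))
        p = expit(rng.gauss(logit(p_median), sd))
        row = []
        for _ in range(n_sites):
            occupied = rng.random() < psi  # latent occupancy state z
            detected = occupied and any(rng.random() < p for _ in range(n_reps))
            row.append(int(detected))
        matrix.append(row)
    return matrix


def ichao2(incidence):
    """iChao2 lower bound from a species x sampling-unit 0/1 matrix."""
    t = len(incidence[0])  # number of sampling units
    if t < 4:
        raise ValueError("this sketch assumes more than three sampling units")
    freqs = [sum(row) for row in incidence if sum(row) > 0]
    s_obs = len(freqs)
    # Q_k = number of species detected in exactly k sampling units
    q1, q2, q3, q4 = (freqs.count(k) for k in (1, 2, 3, 4))
    if q2 > 0:
        chao2 = s_obs + (t - 1) / t * q1 ** 2 / (2 * q2)
    else:
        chao2 = s_obs + (t - 1) / t * q1 * (q1 - 1) / 2
    q4 = max(q4, 1)  # assumption: avoid division by zero when Q4 = 0
    gap = q1 - (t - 3) / (2 * (t - 1)) * q2 * q3 / q4
    return chao2 + (t - 3) / (4 * t) * q3 / q4 * max(gap, 0.0)


if __name__ == "__main__":
    y = simulate_site_detections(60, 30, 3, psi_median=0.3, p_median=0.3, seed=42)
    observed = sum(1 for row in y if any(row))
    print("true N = 60, observed =", observed, ", iChao2 =", round(ichao2(y), 1))
```

Rerunning with different psi_median values gives a quick feel for how the raw count and the iChao2 estimate respond when mean occupancy is low.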

Simulations showed that both MSOM and iChao2 estimators are generally accurate (i.e. unbiased and precise) except under suboptimal scenarios where mean species occupancy is low. In such scenarios, MSOM frequently overestimated N. Across all scenarios, MSOM estimates were less certain than iChao2 estimates, but the narrower iChao2 intervals were over‐confident and showed poor coverage. Results from the real dataset largely confirmed the simulation findings, with MSOM estimates showing greater accuracy and coverage than iChao2.
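
The four evaluation criteria can be computed from replicate simulation runs roughly as follows. These are common definitions (bias as mean error, precision as the spread of estimates, accuracy as root-mean-square error, coverage as the share of intervals containing the true N); the paper's exact definitions may differ, and the function name is hypothetical.

```python
import math
import statistics


def evaluate_estimates(estimates, lower, upper, true_n):
    """Summarize replicate estimates of N against the known true value."""
    errors = [e - true_n for e in estimates]
    covered = [lo <= true_n <= hi for lo, hi in zip(lower, upper)]
    return {
        "bias": statistics.fmean(errors),                               # mean error
        "sd": statistics.stdev(estimates),                              # precision
        "rmse": math.sqrt(statistics.fmean([e * e for e in errors])),   # accuracy
        "coverage": sum(covered) / len(covered),                        # intervals containing N
        "mean_ci_width": statistics.fmean([hi - lo for lo, hi in zip(lower, upper)]),
    }


if __name__ == "__main__":
    summary = evaluate_estimates([48, 50, 52, 60], [45, 49, 51, 58], [51, 51, 53, 62], 50)
    print(summary)
```

Coverage is what exposes over-confidence: in the made-up example, the last two intervals are narrow but exclude the true N = 50, so coverage drops to 0.5 even though the interval widths look reassuring.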

Community ecologists have a wide choice of analytical methods, and both iChao2 and MSOM estimates of N are substantially preferable to raw species counts. The simplicity of non‐parametric estimators has obvious advantages, but our results show that in many cases MSOM may provide superior estimates that also account more accurately for uncertainty. Both methods can show strong bias when average occupancy is very low, and practitioners should exercise caution when using estimates derived from either method under such conditions.

 
Award ID(s): 2033263, 1703048
NSF-PAR ID: 10456619
Publisher / Repository: Wiley-Blackwell
Journal Name: Methods in Ecology and Evolution
Volume: 11
Issue: 5
ISSN: 2041-210X
Page Range / eLocation ID: pp. 633–642
Sponsoring Org: National Science Foundation
More like this
  1. Abstract

    Estimating phenotypic distributions of populations and communities is central to many questions in ecology and evolution. These distributions can be characterized by their moments (mean, variance, skewness and kurtosis) or diversity metrics (e.g. functional richness). Typically, such moments and metrics are calculated using community‐weighted approaches (e.g. abundance‐weighted mean). We propose an alternative bootstrapping approach that allows flexibility in trait sampling and explicit incorporation of intraspecific variation, and show that this approach significantly improves estimation while allowing us to quantify uncertainty.

    We assess the performance of different approaches for estimating the moments of trait distributions across various sampling scenarios, taxa and datasets by comparing estimates derived from simulated samples with the true values calculated from full datasets. Simulations differ in sampling intensity (individuals per species), sampling biases (abundance, size), trait data source (local vs. global) and estimation method (two types of community‐weighting, two types of bootstrapping).

    We introduce the traitstrap R package, which contains a modular and extensible set of bootstrapping and weighted‐averaging functions that use community composition and trait data to estimate the moments of community trait distributions with their uncertainty. Importantly, the first function in the workflow, trait_fill, allows the user to specify hierarchical structures (e.g. plot within site, experiment vs. control, species within genus) to assign trait values to each taxon in each community sample.
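
    The general idea of a nonparametric bootstrap over a community trait distribution can be sketched without the package: draw individuals in proportion to species abundance, draw each individual's trait value from that species' own measurements, compute the four moments, and repeat. This is a hypothetical stand-alone illustration, not the traitstrap API; the function names, the number of individuals per replicate, the simple percentile interval and the example values are assumptions.

```python
import random
import statistics


def moments(values):
    """Mean, variance, skewness and excess kurtosis (population formulas)."""
    n = len(values)
    mean = statistics.fmean(values)
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return mean, 0.0, 0.0, 0.0  # convention chosen here when there is no spread
    sd = var ** 0.5
    skew = sum((v - mean) ** 3 for v in values) / n / sd ** 3
    kurt = sum((v - mean) ** 4 for v in values) / n / var ** 2 - 3.0
    return mean, var, skew, kurt


def bootstrap_trait_moments(abundance, traits, sample_size=200, n_boot=500, seed=None):
    """abundance: {species: abundance}; traits: {species: [measured values]}.
    Returns {moment: (bootstrap mean, lower 2.5%, upper 97.5%)}."""
    rng = random.Random(seed)
    species = list(abundance)
    weights = [abundance[s] for s in species]
    reps = []
    for _ in range(n_boot):
        # Individuals drawn in proportion to abundance; trait values drawn
        # with replacement from that species' own measurements
        drawn = rng.choices(species, weights=weights, k=sample_size)
        reps.append(moments([rng.choice(traits[s]) for s in drawn]))
    summary = {}
    for i, name in enumerate(("mean", "variance", "skewness", "kurtosis")):
        est = sorted(r[i] for r in reps)
        lower = est[int(0.025 * n_boot)]
        upper = est[int(0.975 * n_boot) - 1]
        summary[name] = (statistics.fmean(est), lower, upper)
    return summary


if __name__ == "__main__":
    abundance = {"sp_a": 30, "sp_b": 10}
    traits = {"sp_a": [10.0, 11.5, 12.0], "sp_b": [19.0, 20.5]}
    for name, (est, lo, hi) in bootstrap_trait_moments(abundance, traits, seed=1).items():
        print(f"{name}: {est:.2f} [{lo:.2f}, {hi:.2f}]")
```

    Drawing trait values per individual (rather than using one species mean) is what lets intraspecific variation feed into the higher moments and their intervals.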

    Across all taxa, simulations and metrics, bootstrapping approaches were more accurate and less biased than community‐weighted approaches. With bootstrapping, a sample size of 9 or more measurements per species per trait generally included the true mean within the 95% CI. Bootstrapping reduced average percent errors by 26%–74% relative to community‐weighting. Random sampling across all species outperformed both size‐ and abundance‐biased sampling.

    Our results suggest randomly sampling ~9 individuals per sampling unit and species, covering all species in the community and analysing the data using nonparametric bootstrapping generally enable reliable inference on trait distributions, including the central moments, of communities. By providing better estimates of community trait distributions, bootstrapping approaches can improve our ability to link traits to both the processes that generate them and their effects on ecosystems.

     
  2. Abstract

    Clark et al. (2019) sought to extend the Loreau–Hector partitioning scheme by showing how to estimate selection and complementarity effects from an incomplete sample of species. We demonstrate that their approach suffers from serious conceptual and mathematical errors. Instead of finding unbiased estimators for a finite population, they inserted ad hoc correction factors, without any mathematical justification, into unbiased parameter estimators for an infinite population in order to force those sample estimators to converge to the true finite‐population parameter values as sample size n approached population size N. In doing so, they confused the unbiasedness of a sample estimator with its equivalence to the true population parameter value when n = N.
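
    For reference, the finite-population Loreau–Hector partition that both papers build on can be computed directly when every species in the mixture is measured. With monoculture yields M_i, observed mixture yields Y_Oi and expected relative yields RY_Ei, define ΔRY_i = Y_Oi/M_i - RY_Ei; then complementarity = N·mean(ΔRY)·mean(M) and selection = N·cov(ΔRY, M), where cov is the population covariance (divisor N). The sketch below (function name and example yields hypothetical) checks that the two effects sum to the net biodiversity effect; it does not implement the sample estimators debated here.

```python
import statistics


def loreau_hector(mono, mix, expected_ry):
    """Finite-population additive partition of the net biodiversity effect.
    mono: monoculture yields M_i; mix: observed mixture yields Y_Oi;
    expected_ry: expected relative yields RY_Ei (e.g. planted proportions)."""
    n = len(mono)
    d_ry = [y / m - ry for y, m, ry in zip(mix, mono, expected_ry)]
    mean_dry = statistics.fmean(d_ry)
    mean_m = statistics.fmean(mono)
    # Population covariance (divisor n), as the additive identity requires
    cov = statistics.fmean([(d - mean_dry) * (m - mean_m) for d, m in zip(d_ry, mono)])
    complementarity = n * mean_dry * mean_m
    selection = n * cov
    net = sum(mix) - sum(ry * m for ry, m in zip(expected_ry, mono))
    return net, complementarity, selection


if __name__ == "__main__":
    net, comp, sel = loreau_hector([10, 20, 30], [4, 8, 15], [1 / 3] * 3)
    print(f"net={net:.3f} complementarity={comp:.3f} selection={sel:.3f}")
```

    The identity holds exactly only because every species is included and the covariance uses divisor N; replacing either with a sample quantity is where the estimation questions discussed here arise.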

    Additionally, we show that their estimators of complementarity, selection and the net biodiversity effect are incorrect. We then derive the correct unbiased estimators but caution that, contrary to what Clark et al. claim, these quantities will not approximate the corresponding population parameters without significant repeated random sampling, something that would likely be infeasible in most if not all biodiversity experiments.

    Clark et al. also state that their method can be used to compare distinct experiments characterized by different species and diversity levels, or extrapolate from biodiversity experiments to natural systems. This is incorrect because relative yields are not a property of individual species like monoculture yields but an emergent and specific feature of an experimental community. As such, two experimental communities, even when overlapping significantly in species, are incommensurable for the purpose of predicting relative yields. In other words, different experimental communities are not equivalent to different samples taken from the same statistical population.

    Finally, Clark et al. incorrectly claim that both the original Loreau–Hector partitioning scheme and their extension work for any baseline despite the fact that recent research has shown that a nonlinear relationship between monoculture density and ecosystem functioning will likely inflate the net biodiversity effect in plant systems, and will always lead to spurious measurements of complementarity and selection.

     
  3. Abstract

    Camera traps deployed in grids or stratified random designs are a well‐established survey tool for wildlife but there has been little evaluation of study design parameters.

    We used an empirical subsampling approach involving 2,225 camera deployments run at 41 study areas around the world to evaluate three aspects of camera trap study design (number of sites, duration and season of sampling) and their influence on the estimation of three ecological metrics (species richness, occupancy and detection rate) for mammals.
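
    The subsampling idea can be illustrated with a small hypothetical helper: repeatedly keep a random subset of sites, count the species detected across them, and summarize the spread. The data layout (one set of detected species per site), the use of the coefficient of variation as the precision measure and the example detections are assumptions, not the study's actual procedure.

```python
import random
import statistics


def subsample_richness(site_detections, n_sites, n_draws=1000, seed=None):
    """site_detections: list with one set of detected species per camera site.
    Returns (mean, coefficient of variation) of the richness observed when only
    n_sites randomly chosen sites are retained."""
    rng = random.Random(seed)
    richness = []
    for _ in range(n_draws):
        chosen = rng.sample(site_detections, n_sites)  # sites drawn without replacement
        richness.append(len(set().union(*chosen)))
    mean = statistics.fmean(richness)
    return mean, statistics.stdev(richness) / mean


if __name__ == "__main__":
    sites = [{"deer", "fox"}, {"deer"}, {"boar", "deer"}, {"fox", "hare"},
             {"deer", "badger"}, {"hare"}, {"deer", "fox", "boar"}, {"lynx"}]
    for k in (2, 4, 6, 8):
        mean, cv = subsample_richness(sites, k, seed=0)
        print(k, round(mean, 2), round(cv, 3))
```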

    We found that 25–35 camera sites were needed for precise estimates of species richness, depending on scale of the study. The precision of species‐level estimates of occupancy (ψ) was highly sensitive to occupancy level, with <20 camera sites needed for precise estimates of common (ψ > 0.75) species, but more than 150 camera sites likely needed for rare (ψ < 0.25) species. Species detection rates were more difficult to estimate precisely at the grid level due to spatial heterogeneity, presumably driven by unaccounted‐for habitat variability within the study area. Running a camera at a site for 2 weeks was most efficient for detecting new species, but 3–4 weeks were needed for precise estimates of local detection rate, with no gains in precision observed after 1 month. Metrics for all mammal communities were sensitive to seasonality, with 37%–50% of the species at the sites we examined fluctuating significantly in their occupancy or detection rates over the year. This effect was more pronounced in temperate sites, where seasonally sensitive species varied in relative abundance by an average factor of 4–5, and some species were completely absent in one season due to hibernation or migration.
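
    Two idealized formulas help explain these patterns, though they are not the study's models and will not reproduce its thresholds. If a present species is detected on each camera-day independently with probability p, the chance of at least one detection in d days is 1 - (1 - p)^d, which rises quickly and then levels off. Under perfect detection with independent sites, the naive occupancy estimate has coefficient of variation sqrt((1 - ψ)/(ψ n)), so the number of sites n needed for a given precision grows sharply as ψ falls. The function names and example values below are hypothetical.

```python
import math


def cumulative_detection(daily_p, days):
    """P(at least one detection in `days` camera-days), assuming independent
    days and a constant daily detection probability for a present species."""
    return 1.0 - (1.0 - daily_p) ** days


def sites_for_precision(psi, target_cv):
    """Sites needed for the naive occupancy estimate to reach a target
    coefficient of variation, assuming perfect detection and independent
    sites: CV = sqrt((1 - psi) / (psi * n))."""
    return math.ceil((1.0 - psi) / (psi * target_cv ** 2))


if __name__ == "__main__":
    for weeks in (1, 2, 3, 4, 6):
        print(weeks, "weeks:", round(cumulative_detection(0.05, 7 * weeks), 3))
    for psi in (0.9, 0.75, 0.5, 0.25, 0.1):
        print("psi =", psi, "-> sites:", sites_for_precision(psi, target_cv=0.2))
```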

    We recommend the following guidelines to efficiently obtain precise estimates of species richness, occupancy and detection rates with camera trap arrays: run each camera for 3–5 weeks across 40–60 sites per array. We recommend comparisons of detection rates be model based and include local covariates to help account for small‐scale variation. Furthermore, comparisons across study areas or times must account for seasonality, which could have strong impacts on mammal communities in both tropical and temperate sites.

     
  4. Abstract

    Historical museum records provide potentially useful data for identifying drivers of change in species occupancy. However, because museum records are typically obtained via many collection methods, methodological developments are needed to enable robust inferences. Occupancy–detection models, a relatively new and powerful suite of statistical methods, are a potentially promising avenue because they can account for changes in collection effort through space and time.

    We use simulated datasets to identify how and when patterns in data and/or modelling decisions can bias inference. We focus primarily on the consequences of contrasting methodological approaches for dealing with species' ranges and inferring species' non‐detections in both space and time.
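
    One way to infer non-detections from unstructured records, sketched below as a hypothetical helper, is to treat a collection event as evidence of effort for the focal species only if it recorded other members of the same community: such a visit becomes a non-detection (0), while visits with no community-wide collections are left missing. The record layout, the rule itself and the example records are assumptions for illustration; the study contrasts several approaches, and this is not necessarily its workflow.

```python
def detection_history(records, focal, community, site, periods, visits):
    """records: iterable of (site, period, visit, species) collection records.
    Returns {period: [1, 0 or None for each visit]} for the focal species:
    1 = focal species collected; 0 = inferred non-detection (another member of
    `community` was collected on that visit); None = no evidence of effort."""
    collected = {}
    for s, t, v, sp in records:
        if s == site:
            collected.setdefault((t, v), set()).add(sp)
    history = {}
    for t in periods:
        row = []
        for v in visits:
            taxa = collected.get((t, v), set())
            if focal in taxa:
                row.append(1)
            elif taxa & set(community):
                row.append(0)
            else:
                row.append(None)
        history[t] = row
    return history


if __name__ == "__main__":
    records = [("pond_1", 1990, 1, "Anax junius"),
               ("pond_1", 1990, 2, "Libellula luctuosa"),
               ("pond_1", 2000, 1, "Bombus impatiens")]
    odonates = {"Anax junius", "Libellula luctuosa", "Erythemis simplicicollis"}
    print(detection_history(records, "Anax junius", odonates, "pond_1", [1990, 2000], [1, 2]))
    # {1990: [1, 0], 2000: [None, None]}
```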

    We find that not all datasets are suitable for occupancy–detection analysis but, under the right conditions (namely, datasets that are broken into more time periods for occupancy inference and that contain a high fraction of community‐wide collections, or collection events that focus on communities of organisms), models can accurately estimate trends. Finally, we present a case study on eastern North American odonates where we calculate long‐term trends of occupancy using our most robust workflow.

    These results indicate that occupancy–detection models are a suitable framework for some research questions and expand the suite of tools available to researchers for macroecological analysis, especially where structured datasets are unavailable.

     
  5. Abstract

    Technological advances have steadily increased the detail of animal tracking datasets, yet fundamental data limitations exist for many species that cause substantial biases in home‐range estimation. Specifically, the effective sample size of a range estimate is proportional to the number of observed range crossings, not the number of sampled locations. Currently, the most accurate home‐range estimators condition on an autocorrelation model, for which the standard estimation frameworks are based on likelihood functions, even though these methods are known to underestimate variance (and therefore ranging area) when effective sample sizes are small.
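
    The small-sample problem is easiest to see in the simplest case: independent Gaussian draws, where the ML variance estimator divides by n and is biased low by the factor (n - 1)/n, while REML divides by n - 1 and is unbiased. The sketch below checks this by simulation. It is only an analogy under that independence assumption; the setting here is autocorrelated movement models, where the effective sample size rather than the number of locations plays the role of n. The function name is hypothetical.

```python
import random


def mean_variance_estimates(n, sigma2=1.0, n_sims=20000, seed=None):
    """Average ML (divide by n) and REML (divide by n - 1) variance estimates
    over repeated samples of n independent Gaussian draws."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    ml_total = reml_total = 0.0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, sd) for _ in range(n)]
        m = sum(x) / n
        ss = sum((xi - m) ** 2 for xi in x)
        ml_total += ss / n
        reml_total += ss / (n - 1)
    return ml_total / n_sims, reml_total / n_sims


if __name__ == "__main__":
    for n in (3, 5, 10):
        ml, reml = mean_variance_estimates(n, seed=1)
        print(f"n={n}: ML mean {ml:.3f} (theory {(n - 1) / n:.3f}), REML mean {reml:.3f}")
```

    With n = 5 the ML average should land near 0.8 of the true variance, a 20% underestimate; because home-range area scales with the estimated variance, the area inherits that downward bias.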

    Residual maximum likelihood (REML) is a widely used method for reducing bias in maximum‐likelihood (ML) variance estimation at small sample sizes. Unfortunately, we find that REML is too unstable for practical application to continuous‐time movement models. When the effective sample size N is decreased to N ≤ O(10), which is common in tracking applications, REML undergoes a sudden divergence in variance estimation. To avoid this issue, while retaining REML’s first‐order bias correction, we derive a family of estimators that leverage REML to make a perturbative correction to ML. We also derive AIC values for REML and our estimators, including cases where model structures differ, which is not generally understood to be possible.

    Using both simulated data and GPS data from lowland tapir (Tapirus terrestris), we show that our perturbative estimators are more accurate than traditional ML and REML methods. Specifically, when O(5) home‐range crossings are observed, REML is unreliable by orders of magnitude, ML home ranges are ~30% underestimated, and our perturbative estimators yield home ranges that are only ~10% underestimated. A parametric bootstrap can then reduce the ML and perturbative home‐range underestimation to ~10% and ~3%, respectively.
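
    A generic parametric bootstrap bias correction can be sketched in the same independent-Gaussian setting: fit by ML, simulate new datasets from the fitted model, refit each, and subtract the estimated bias, giving 2θ̂ - mean(θ̂*). This is a textbook single-step correction under that simplifying assumption, not the perturbative estimators or the movement-model bootstrap described here, and the function names are hypothetical.

```python
import random


def ml_variance(x):
    """Maximum-likelihood (divide-by-n) variance of a Gaussian sample."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)


def bootstrap_corrected_variance(x, n_boot=2000, seed=None):
    """Single-step parametric bootstrap bias correction of the ML variance:
    simulate from the fitted Gaussian, refit, and return 2*theta - mean(theta*)."""
    rng = random.Random(seed)
    n = len(x)
    mean = sum(x) / n
    theta = ml_variance(x)
    boot = [ml_variance([rng.gauss(mean, theta ** 0.5) for _ in range(n)])
            for _ in range(n_boot)]
    return 2 * theta - sum(boot) / n_boot


if __name__ == "__main__":
    x = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print("ML:", ml_variance(x))
    print("bootstrap-corrected:", round(bootstrap_corrected_variance(x, seed=1), 3))
    print("REML (divide by n - 1):", ml_variance(x) * len(x) / (len(x) - 1))
```

    For this sample the ML variance is 2.0, the corrected value should land near 2.4 and the REML value is 2.5, so one bootstrap step shrinks the underestimation without fully removing it.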

    Home‐range estimation is one of the primary reasons for collecting animal tracking data, and small effective sample sizes are a more common problem than is currently realized. The methods introduced here allow for more accurate movement‐model and home‐range estimation at small effective sample sizes, and thus fill an important role for animal movement analysis. Given REML’s widespread use, our methods may also be useful in other contexts where effective sample sizes are small.

     