
Title: Joint species distribution models with imperfect detection for high‐dimensional spatial data

Determining the spatial distributions of species and communities is a key task in ecology and conservation efforts. Joint species distribution models are a fundamental tool in community ecology that use multi‐species detection–nondetection data to estimate species distributions and biodiversity metrics. The analysis of such data is complicated by residual correlations between species, imperfect detection, and spatial autocorrelation. While many methods exist to accommodate each of these complexities, there are few examples in the literature that address and explore all three complexities simultaneously. Here we developed a spatial factor multi‐species occupancy model to explicitly account for species correlations, imperfect detection, and spatial autocorrelation. The proposed model uses a spatial factor dimension reduction approach and Nearest Neighbor Gaussian Processes to ensure computational efficiency for data sets with both a large number of species (e.g., >100) and spatial locations (e.g., 100,000). We compared the proposed model performance to five alternative models, each addressing a subset of the three complexities. We implemented the proposed and alternative models in the spOccupancy software, designed to facilitate application via an accessible, well documented, and open‐source R package. Using simulations, we found that ignoring the three complexities when present leads to inferior model predictive performance, and the impacts of failing to account for one or more complexities will depend on the objectives of a given study. Using a case study on 98 bird species across the continental US, the spatial factor multi‐species occupancy model had the highest predictive performance among the alternative models. Our proposed framework, together with its implementation in spOccupancy, serves as a user‐friendly tool to understand spatial variation in species distributions and biodiversity while addressing common complexities in multi‐species detection–nondetection data.
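
One common way to write down a spatial factor multi‐species occupancy model of the kind described above is sketched here. The notation is an assumption for illustration and is not taken from the abstract: species are indexed by i, sites by location s_j, replicate visits by k, and q is the number of latent spatial factors (chosen much smaller than the number of species). The logit links and the covariate vectors x and v are likewise assumed.

```latex
\begin{aligned}
z_i(\mathbf{s}_j) &\sim \text{Bernoulli}\big(\psi_i(\mathbf{s}_j)\big), \\
\text{logit}\big(\psi_i(\mathbf{s}_j)\big) &= \mathbf{x}(\mathbf{s}_j)^\top \boldsymbol{\beta}_i + \boldsymbol{\lambda}_i^\top \mathbf{w}(\mathbf{s}_j), \\
w_r(\mathbf{s}) &\sim \text{NNGP}\big(0,\, C(\cdot, \cdot \mid \boldsymbol{\theta}_r)\big), \quad r = 1, \dots, q, \\
y_{i,k}(\mathbf{s}_j) \mid z_i(\mathbf{s}_j) &\sim \text{Bernoulli}\big(p_{i,k}(\mathbf{s}_j)\, z_i(\mathbf{s}_j)\big), \\
\text{logit}\big(p_{i,k}(\mathbf{s}_j)\big) &= \mathbf{v}_{i,k}(\mathbf{s}_j)^\top \boldsymbol{\alpha}_i .
\end{aligned}
```

Read this way, each of the three complexities has its own term: the species‐specific loadings \boldsymbol{\lambda}_i on shared factors induce residual correlations between species, the Nearest Neighbor Gaussian Process prior on each factor w_r carries the spatial autocorrelation, and the detection probability p_{i,k} separates true absence (z = 0) from non‐detection at an occupied site.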

Award ID(s):
2213566 1916395
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Occupancy modelling is a common approach to assess species distribution patterns, while explicitly accounting for false absences in detection–nondetection data. Numerous extensions of the basic single‐species occupancy model exist to model multiple species, spatial autocorrelation and to integrate multiple data types. However, development of specialized and computationally efficient software to incorporate such extensions, especially for large datasets, is scarce or absent.

    We introduce the spOccupancy R package designed to fit single‐species and multi‐species spatially explicit occupancy models. We fit all models within a Bayesian framework using Pólya‐Gamma data augmentation, which results in fast and efficient inference. spOccupancy provides functionality for data integration of multiple single‐species detection–nondetection datasets via a joint likelihood framework. The package leverages Nearest Neighbour Gaussian Processes to account for spatial autocorrelation, which enables spatially explicit occupancy modelling for potentially massive datasets (e.g. 1,000s–100,000s of sites).

    spOccupancy provides user‐friendly functions for data simulation, model fitting, model validation (by posterior predictive checks), model comparison (using information criteria and k‐fold cross‐validation) and out‐of‐sample prediction. We illustrate the package's functionality via a vignette, simulated data analysis and two bird case studies.

    The spOccupancy package provides a user‐friendly platform to fit a variety of single and multi‐species occupancy models, making it straightforward to address detection biases and spatial autocorrelation in species distribution models even for large datasets.
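
To make the "false absences" problem concrete, the following is a minimal Python sketch that simulates detection–nondetection data under a basic single‐species occupancy model and compares the naive occupancy estimate with the true occupied fraction. It does not use spOccupancy or reproduce any of its functions; the function names, parameter values (psi, p, number of sites and visits) and seed are all hypothetical choices for illustration.

```python
import random


def simulate_detections(n_sites, n_visits, psi, p, seed=0):
    """Simulate a basic occupancy model (illustrative assumption, not spOccupancy).

    z[i] is the true occupancy state of site i (1 = occupied).
    y[i][k] is 1 only if site i is occupied AND the species is detected on visit k.
    """
    rng = random.Random(seed)
    z = [1 if rng.random() < psi else 0 for _ in range(n_sites)]
    y = [
        [1 if (z_i == 1 and rng.random() < p) else 0 for _ in range(n_visits)]
        for z_i in z
    ]
    return z, y


def naive_occupancy(y):
    """Fraction of sites with at least one detection (ignores imperfect detection)."""
    return sum(1 for row in y if any(row)) / len(y)


# Hypothetical settings: 500 sites, 3 visits, occupancy 0.6, detection 0.3.
z, y = simulate_detections(n_sites=500, n_visits=3, psi=0.6, p=0.3, seed=1)
true_occ = sum(z) / len(z)
naive = naive_occupancy(y)

print(f"true occupied fraction: {true_occ:.3f}")
print(f"naive estimate:         {naive:.3f}")
```

Because detections can only occur at occupied sites, the naive estimate can never exceed the true occupied fraction. In expectation it equals psi * (1 - (1 - p)^K) for K visits, which is why occupancy models estimate detection probability from the replicate visits instead of treating every all‐zero history as an absence.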

  2. Abstract

    1. The occurrence and distributions of wildlife populations and communities are shifting as a result of global changes. To evaluate whether these shifts are negatively impacting biodiversity processes, it is critical to monitor the status, trends and effects of environmental variables on entire communities. However, modelling the dynamics of multiple species simultaneously can require large amounts of diverse data, and few modelling approaches exist to simultaneously provide species and community‐level inferences.

    2. We present an ‘integrated community occupancy model’ (ICOM) that unites principles of data integration and hierarchical community modelling in a single framework to provide inferences on species‐specific and community occurrence dynamics using multiple data sources. The ICOM combines replicated and nonreplicated detection–nondetection data sources using a hierarchical framework that explicitly accounts for different detection and sampling processes across data sources. We use simulations to compare the ICOM to previously developed hierarchical community occupancy models and single species integrated distribution models. We then apply our model to assess the occurrence and biodiversity dynamics of foliage‐gleaning birds in the White Mountain National Forest in the northeastern USA from 2010 to 2018 using three independent data sources.

    3. Simulations reveal that integrating multiple data sources in the ICOM increased precision and accuracy of species and community‐level inferences compared to single data source models, although benefits of integration were dependent on the information content of individual data sources (e.g. amount of replication). Compared to single species models, the ICOM yielded more precise species‐level estimates. Within our case study, the ICOM had the highest out‐of‐sample predictive performance compared to single species models and models that used only a subset of the three data sources.

    4. The ICOM provides more precise estimates of occurrence dynamics compared to multi‐species models using single data sources or integrated single‐species models. We further found that the ICOM had improved predictive performance across a broad region of interest with an empirical case study of forest birds. The ICOM offers an attractive approach to estimate species and biodiversity dynamics, which is additionally valuable to inform management objectives of both individual species and their broader communities.

  3. Abstract

    Understanding patterns of diversity is central to ecology and conservation, yet estimates of diversity are often biased by imperfect detection. In recent years, multi‐species occupancy models (MSOM) have been developed as a statistical tool to account for species‐specific heterogeneity in detection while estimating true measures of diversity. Although the power of these models has been tested in various ways, their ability to estimate gamma diversity (or true community size, N) is a largely unrecognized feature that needs rigorous evaluation.

    We use both simulations and an empirical dataset to evaluate the bias, precision, accuracy and coverage of estimates of N from MSOM compared to the widely applied iChao2 non‐parametric estimator. We simulated 5,600 datasets across seven scenarios of varying average occupancy and detectability covariates, as well as varying numbers of sites, replicates and true community size. Additionally, we use a real dataset of surveys over 9 years (where species accumulation reached an asymptote, indicating true N), to estimate N from each annual survey.

    Simulations showed that both MSOM and iChao2 estimators are generally accurate (i.e. unbiased and precise) except under unideal scenarios where mean species occupancy is low. In such scenarios, MSOM frequently overestimated N. Across all scenarios, MSOM estimates were less certain than iChao2, but this led to over‐confident iChao2 estimates that showed poor coverage. Results from the real dataset largely confirmed the simulation findings, with MSOM estimates showing greater accuracy and coverage than iChao2.

    Community ecologists have a wide choice of analytical methods, and both iChao2 and MSOM estimates of N are substantially preferable to raw species counts. The simplicity of non‐parametric estimators has obvious advantages, but our results show that in many cases, MSOM may provide superior estimates that also account more accurately for uncertainty. Both methods can show strong bias when average occupancy is very low, and practitioners should show caution when using estimates derived from either method under such conditions.
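
For readers unfamiliar with incidence‐based non‐parametric richness estimators, the sketch below implements the classic Chao2 estimator in Python. This is deliberately the simpler Chao2, not the improved iChao2 evaluated in the abstract, and it is not the authors' code; the function name and the toy incidence matrix are hypothetical. Chao2 adds to the observed species count a correction based on Q1 and Q2, the numbers of species detected in exactly one and exactly two sampling units.

```python
def chao2(incidence):
    """Classic Chao2 richness estimate from a species-by-unit incidence matrix.

    incidence: list of rows, one per species, each a list of 0/1 values
    (1 = species detected in that sampling unit). Illustrative sketch only;
    this is Chao2, not the iChao2 estimator discussed in the text.
    """
    m = len(incidence[0])                      # number of sampling units
    counts = [sum(row) for row in incidence]   # units in which each species was seen
    s_obs = sum(1 for c in counts if c > 0)    # observed richness
    q1 = sum(1 for c in counts if c == 1)      # "uniques"
    q2 = sum(1 for c in counts if c == 2)      # "duplicates"
    correction = (m - 1) / m
    if q2 > 0:
        return s_obs + correction * q1 * q1 / (2 * q2)
    # Bias-corrected form used when there are no duplicates.
    return s_obs + correction * q1 * (q1 - 1) / 2


# Hypothetical toy data: 5 species (rows) surveyed in 4 units (columns).
example = [
    [1, 1, 1, 0],  # seen in 3 units
    [1, 0, 0, 0],  # unique
    [0, 1, 0, 0],  # unique
    [1, 1, 0, 0],  # duplicate
    [0, 0, 0, 0],  # present in the species pool but never detected
]
print(chao2(example))  # 5.5
```

In this toy case four species are observed, and the two uniques and one duplicate push the estimate to 5.5, above the raw count. The estimator relies only on the frequencies of rare species, which is what makes it simple to apply and also why it can behave poorly when most species are rarely detected.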

  4. Abstract

    Understanding how and why animals use the environments where they occur is both foundational to behavioral ecology and essential to identify critical habitats for species conservation. However, some behaviors are more difficult to observe than others, which can bias analyses of raw observational data. To our knowledge, no method currently exists to model how animals use different environments while accounting for imperfect behavior‐specific detection probability. We developed an extension of a binomial N‐mixture model (hereafter the behavior N‐mixture model) to estimate the probability of a given behavior occurring in a particular environment while accounting for imperfect detection. We then conducted a simulation to validate the model's ability to estimate the effects of environmental covariates on the probabilities of individuals performing different behaviors. We compared our model to a naïve model that does not account for imperfect detection, as well as a traditional N‐mixture model. Finally, we applied the model to a bird observation data set in northwest Costa Rica to quantify how three species behave in forests and farms. Simulations and sensitivity analyses demonstrated that the behavior N‐mixture model produced unbiased estimates of behaviors and their relationships with predictor variables (e.g., forest cover, habitat type). Importantly, the behavior N‐mixture model accurately characterized uncertainty, unlike the naïve model, which often suggested erroneous effects of covariates on behaviors. When applied to field data, the behavior N‐mixture model suggested that Hoffmann's woodpecker (Melanerpes hoffmanii) and Inca dove (Columbina inca) behaved differently in forested versus agricultural habitats, while turquoise‐browed motmot (Eumomota superciliosa) did not. Thus, the behavior N‐mixture model can help identify habitats that are essential to a species' life cycle (e.g., where individuals nest, forage) that nonbehavioral models would miss. Our model can greatly improve the appropriate use of behavioral survey data and conclusions drawn from them. In doing so, it provides a valuable path forward for assessing the conservation value of alternative habitat types.

  5. Abstract

    Aim

    Species distribution models (SDMs) are increasingly applied across macroscales using detection‐nondetection data. These models typically assume that a single set of regression coefficients can adequately describe species–environment relationships and/or population trends. However, such relationships often show nonlinear and/or spatially varying patterns that arise from complex interactions with abiotic and biotic processes that operate at different scales. Spatially varying coefficient (SVC) models can readily account for variability in the effects of environmental covariates. Yet, their use in ecology is relatively scarce due to gaps in understanding the inferential benefits that SVC models can provide compared to simpler frameworks.


    Here we demonstrate the inferential benefits of SVC SDMs, with a particular focus on how this approach can be used to generate and test ecological hypotheses regarding the drivers of spatial variability in population trends and species–environment relationships. We illustrate the inferential benefits of SVC SDMs with simulations and two case studies: one that assesses spatially varying trends of 51 forest bird species in the eastern United States over two decades and a second that evaluates spatial variability in the effects of five decades of land cover change on grasshopper sparrow (Ammodramus savannarum) occurrence across the continental United States.

    Main conclusions

    We found strong support for SVC SDMs compared to simpler alternatives in both empirical case studies. Factors operating at fine spatial scales, accounted for by the SVCs, were the primary drivers of spatial variability in forest bird occurrence trends. Additionally, SVCs revealed complex species–habitat relationships with grassland and cropland area for grasshopper sparrow, providing nuanced insights into how future land use change may shape its distribution. These applications display the utility of SVC SDMs to help reveal the environmental factors that drive species distributions across both local and broad scales. We conclude by discussing the potential applications of SVC SDMs in ecology and conservation.
