Title: spOccupancy: An R package for single‐species, multi‐species, and integrated spatial occupancy models

Occupancy modelling is a common approach to assess species distribution patterns, while explicitly accounting for false absences in detection–nondetection data. Numerous extensions of the basic single‐species occupancy model exist to model multiple species, spatial autocorrelation and to integrate multiple data types. However, development of specialized and computationally efficient software to incorporate such extensions, especially for large datasets, is scarce or absent.

We introduce the spOccupancy R package designed to fit single‐species and multi‐species spatially explicit occupancy models. We fit all models within a Bayesian framework using Pólya‐Gamma data augmentation, which results in fast and efficient inference. spOccupancy provides functionality for data integration of multiple single‐species detection–nondetection datasets via a joint likelihood framework. The package leverages Nearest Neighbour Gaussian Processes to account for spatial autocorrelation, which enables spatially explicit occupancy modelling for potentially massive datasets (e.g. 1,000s–100,000s of sites).

spOccupancy provides user‐friendly functions for data simulation, model fitting, model validation (by posterior predictive checks), model comparison (using information criteria and k‐fold cross‐validation) and out‐of‐sample prediction. We illustrate the package's functionality via a vignette, simulated data analysis and two bird case studies.

The spOccupancy package provides a user‐friendly platform to fit a variety of single and multi‐species occupancy models, making it straightforward to address detection biases and spatial autocorrelation in species distribution models even for large datasets.
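
As a rough illustration of why accounting for false absences matters, the following Python sketch simulates detection–nondetection data under the basic single‐species occupancy model and compares the naive occupancy estimate with a maximum likelihood fit. It is not the spOccupancy implementation (an R package that fits Bayesian models with Pólya‐Gamma data augmentation); the function names, the grid search and the parameter values (occupancy probability 0.6, detection probability 0.3, 4 visits per site) are assumptions chosen purely for illustration.

```python
import math
import random
from collections import Counter


def simulate_counts(n_sites, n_visits, psi, p, rng):
    """Simulate the basic occupancy model.

    Returns a Counter mapping "detections at a site" -> "number of sites".
    psi is the occupancy probability, p the per-visit detection probability.
    """
    counts = Counter()
    for _ in range(n_sites):
        occupied = rng.random() < psi  # latent true state, never observed directly
        detections = sum(
            1 for _ in range(n_visits) if occupied and rng.random() < p
        )
        counts[detections] += 1
    return counts


def neg_log_lik(psi, p, counts, n_visits):
    """Negative log-likelihood (binomial coefficients dropped as constants)."""
    nll = 0.0
    for d, n in counts.items():
        lik = psi * p**d * (1 - p) ** (n_visits - d)
        if d == 0:
            # A site with no detections is either occupied-but-missed or unoccupied.
            lik += 1 - psi
        nll -= n * math.log(lik)
    return nll


def fit_grid(counts, n_visits, steps=99):
    """Crude maximum likelihood fit by exhaustive search on a regular grid."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    _, psi_hat, p_hat = min(
        (neg_log_lik(a, b, counts, n_visits), a, b) for a in grid for b in grid
    )
    return psi_hat, p_hat


def naive_occupancy(counts):
    """Proportion of sites with at least one detection (ignores false absences)."""
    total = sum(counts.values())
    return (total - counts[0]) / total


if __name__ == "__main__":
    rng = random.Random(1)
    counts = simulate_counts(5000, 4, 0.6, 0.3, rng)
    print("naive estimate:", naive_occupancy(counts))
    print("fitted (psi, p):", fit_grid(counts, 4))
```

Under these assumed settings, the naive proportion of sites with at least one detection should fall below the fitted occupancy probability, because occupied sites where the species was never detected are counted as absences.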

Award ID(s):
1916395 1954406
Journal Name:
Methods in Ecology and Evolution
Page Range / eLocation ID:
p. 1670-1678
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Determining the spatial distributions of species and communities is a key task in ecology and conservation efforts. Joint species distribution models are a fundamental tool in community ecology that use multi‐species detection–nondetection data to estimate species distributions and biodiversity metrics. The analysis of such data is complicated by residual correlations between species, imperfect detection, and spatial autocorrelation. While many methods exist to accommodate each of these complexities, there are few examples in the literature that address and explore all three complexities simultaneously. Here we developed a spatial factor multi‐species occupancy model to explicitly account for species correlations, imperfect detection, and spatial autocorrelation. The proposed model uses a spatial factor dimension reduction approach and Nearest Neighbor Gaussian Processes to ensure computational efficiency for data sets with both a large number of species (e.g., >100) and spatial locations (e.g., 100,000). We compared the proposed model performance to five alternative models, each addressing a subset of the three complexities. We implemented the proposed and alternative models in the spOccupancy software, designed to facilitate application via an accessible, well documented, and open‐source R package. Using simulations, we found that ignoring the three complexities when present leads to inferior model predictive performance, and the impacts of failing to account for one or more complexities will depend on the objectives of a given study. Using a case study on 98 bird species across the continental US, the spatial factor multi‐species occupancy model had the highest predictive performance among the alternative models.
    Our proposed framework, together with its implementation in spOccupancy, serves as a user‐friendly tool to understand spatial variation in species distributions and biodiversity while addressing common complexities in multi‐species detection–nondetection data.

  2. Abstract

    Quantitative evaluations to optimize complexity have become standard for avoiding overfitting of ecological niche models (ENMs) that estimate species’ potential geographic distributions. ENMeval was the first R package to make such evaluations (often termed model tuning) widely accessible for the Maxent algorithm. It also provided multiple methods for partitioning occurrence data and reported various performance metrics.

    Requests by users, recent developments in the field, and needs for software compatibility led to a major redesign and expansion. We additionally conducted a literature review to investigate trends in ENMeval use (2015–2019).

    ENMeval 2.0 has a new object‐oriented structure for adding other algorithms, enables customizing algorithmic settings and performance metrics, generates extensive metadata, implements a null‐model approach to quantify significance and effect sizes, and includes features to increase the breadth of analyses and visualizations. In our literature review, we found insufficient reporting of model performance and parameterization, heavy reliance on model selection with AICc and low utilization of spatial cross‐validation; we explain how ENMeval 2.0 can help address these issues.

    This redesigned and expanded version can promote progress in the field and improve the information available for decision‐making.

  3. Abstract

    1. The occurrence and distributions of wildlife populations and communities are shifting as a result of global changes. To evaluate whether these shifts are negatively impacting biodiversity processes, it is critical to monitor the status, trends and effects of environmental variables on entire communities. However, modelling the dynamics of multiple species simultaneously can require large amounts of diverse data, and few modelling approaches exist to simultaneously provide species and community‐level inferences.

    2. We present an ‘integrated community occupancy model’ (ICOM) that unites principles of data integration and hierarchical community modelling in a single framework to provide inferences on species‐specific and community occurrence dynamics using multiple data sources. The ICOM combines replicated and nonreplicated detection–nondetection data sources using a hierarchical framework that explicitly accounts for different detection and sampling processes across data sources. We use simulations to compare the ICOM to previously developed hierarchical community occupancy models and single species integrated distribution models. We then apply our model to assess the occurrence and biodiversity dynamics of foliage‐gleaning birds in the White Mountain National Forest in the northeastern USA from 2010 to 2018 using three independent data sources.

    3. Simulations reveal that integrating multiple data sources in the ICOM increased precision and accuracy of species and community‐level inferences compared to single data source models, although benefits of integration were dependent on the information content of individual data sources (e.g. amount of replication). Compared to single species models, the ICOM yielded more precise species‐level estimates. Within our case study, the ICOM had the highest out‐of‐sample predictive performance compared to single species models and models that used only a subset of the three data sources.

    4. The ICOM provides more precise estimates of occurrence dynamics compared to multi‐species models using single data sources or integrated single‐species models. We further found that the ICOM had improved predictive performance across a broad region of interest with an empirical case study of forest birds. The ICOM offers an attractive approach to estimate species and biodiversity dynamics, which is additionally valuable to inform management objectives of both individual species and their broader communities.

  4. Abstract

    Projects focused on movement behaviour and home range are commonplace, but beyond a focus on choosing appropriate research questions, there are no clear guidelines for such studies. Without these guidelines, designing an animal tracking study to produce reliable estimates of space‐use and movement properties (necessary to answer basic movement ecology questions) is often done in an ad hoc manner.

    We developed ‘movedesign’, a user‐friendly Shiny application, which can be utilized to investigate the precision of three estimates regularly reported in movement and spatial ecology studies: home range area, speed and distance travelled. Conceptually similar to statistical power analysis, this application enables users to assess the degree of estimate precision that may be achieved with a given sampling design; that is, the choices regarding data resolution (sampling interval) and battery life (sampling duration).

    Leveraging the ‘ctmm’ R package, we utilize two methods proven to handle many common biases in animal movement datasets: autocorrelated kernel density estimators (AKDEs) and continuous‐time speed and distance (CTSD) estimators. Longer sampling durations are required to reliably estimate home range areas via the detection of a sufficient number of home range crossings. In contrast, speed and distance estimation requires a sampling interval short enough to ensure that a statistically significant signature of the animal's velocity remains in the data.

    This application addresses key challenges faced by researchers when designing tracking studies, including the trade‐off between long battery life and high resolution of GPS locations collected by the devices, which may result in a compromise between reliably estimating home range or speed and distance. ‘movedesign’ has broad applications for researchers and decision‐makers, supporting them to focus efforts and resources in achieving the optimal sampling design strategy for their research questions, prioritizing the correct deployment decisions for insightful and reliable outputs, while understanding the trade‐off associated with these choices.

  5. Abstract

    Landscape‐scale bioacoustic projects have become a popular approach to biodiversity monitoring. Combining passive acoustic monitoring recordings and automated detection provides an effective means of monitoring sound‐producing species' occupancy and phenology and can lend insight into unobserved behaviours and patterns. The availability of low‐cost recording hardware has lowered barriers to large‐scale data collection, but technological barriers in data analysis remain a bottleneck for extracting biological insight from bioacoustic datasets.

    We provide a robust and open‐source Python toolkit for detecting and localizing biological sounds in acoustic data.

    OpenSoundscape provides access to automated acoustic detection, classification and localization methods through a simple and easy‐to‐use set of tools. Extensive documentation and tutorials provide step‐by‐step instructions and examples of end‐to‐end analysis of bioacoustic data. Here, we describe the functionality of this package and provide concise examples of bioacoustic analyses with OpenSoundscape.

    By providing an interface for bioacoustic data and methods, we hope this package will lead to increased adoption of bioacoustics methods and ultimately to enhanced insights for ecology and conservation.
