Citizen science biodiversity data present great opportunities for ecology and conservation across vast spatial and temporal scales. However, because these data are collected opportunistically, they lack the sampling structure required by modeling methodologies that address a pervasive challenge in ecological data collection: imperfect detection, i.e., the chance that species present at a site go unobserved during field surveys. Occupancy modeling is one approach that accounts for imperfect detection by modeling the observation process explicitly and separately from the biological process of habitat selection. This produces species distribution models that describe the pattern of the species on a landscape after accounting for imperfect detection, rather than the pattern of species observations corrupted by detection errors. To achieve this, occupancy models require multiple surveys of each site, across which the site's status (i.e., occupied or not) is assumed constant. Because citizen science data are not collected under the required repeated-visit protocol, observations may instead be grouped into sites post hoc. Existing approaches for constructing sites discard some observations and/or consider only geographic distance, ignoring environmental similarity. In this study, we compare ten approaches for site construction in terms of their impact on downstream species distribution models for 31 bird species in Oregon, using observations recorded in the eBird database. We find that occupancy models built on sites constructed by spatial clustering algorithms perform better than those built on sites from existing alternatives.
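The abstract above rests on the idea that repeated surveys of a site, with occupancy held constant across them, let a model separate "absent" from "present but missed". A minimal sketch of that idea, assuming the basic single-season occupancy model with a constant occupancy probability `psi` and a constant per-survey detection probability `p` (not the study's own models or code). The detection histories are hypothetical toy data, and the grid search stands in for a proper optimizer only to keep the example short.

```python
import math


def site_likelihood(history, psi, p):
    """Likelihood of one site's 0/1 detection history under a basic
    single-season occupancy model with constant occupancy psi and
    constant per-survey detection probability p."""
    k = len(history)
    d = sum(history)
    # The site is occupied and the species was detected on d of k surveys.
    occupied = psi * p**d * (1 - p) ** (k - d)
    # An all-zero history can also come from an unoccupied site.
    unoccupied = (1 - psi) if d == 0 else 0.0
    return occupied + unoccupied


def log_likelihood(histories, psi, p):
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


def fit_grid(histories, steps=100):
    """Crude maximum-likelihood fit by grid search over (psi, p) in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(
        ((psi, p) for psi in grid for p in grid),
        key=lambda pair: log_likelihood(histories, pair[0], pair[1]),
    )


# Hypothetical detection histories: 4 sites x 3 repeat surveys.
HISTORIES = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

naive = sum(any(h) for h in HISTORIES) / len(HISTORIES)
psi_hat, p_hat = fit_grid(HISTORIES)
print(f"naive occupancy = {naive:.2f}, psi_hat = {psi_hat:.2f}, p_hat = {p_hat:.2f}")
```

On this toy data, two of four sites have a detection (naive occupancy 0.5), while the estimated `psi` comes out higher. The all-zero sites are treated as possibly occupied but missed, which is only identifiable because each site has several surveys.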
Joint species distribution models with imperfect detection for high‐dimensional spatial data
Determining the spatial distributions of species and communities is a key task in ecology and conservation efforts. Joint species distribution models are a fundamental tool in community ecology that use multi‐species detection–nondetection data to estimate species distributions and biodiversity metrics. The analysis of such data is complicated by residual correlations between species, imperfect detection, and spatial autocorrelation. While many methods exist to accommodate each of these complexities, few examples in the literature address and explore all three simultaneously. Here we developed a spatial factor multi‐species occupancy model to explicitly account for species correlations, imperfect detection, and spatial autocorrelation. The proposed model uses a spatial factor dimension reduction approach and Nearest Neighbor Gaussian Processes to ensure computational efficiency for data sets with both a large number of species (e.g., >100) and a large number of spatial locations (e.g., 100,000). We compared the performance of the proposed model with that of five alternative models, each addressing a subset of the three complexities. We implemented the proposed and alternative models in the spOccupancy software, an accessible, well‐documented, open‐source R package designed to facilitate application. Using simulations, we found that ignoring the three complexities when present leads to inferior predictive performance, and that the impacts of failing to account for one or more complexities depend on the objectives of a given study. In a case study on 98 bird species across the continental US, the spatial factor multi‐species occupancy model had the highest predictive performance of all models compared.
Our proposed framework, together with its implementation in spOccupancy, serves as a user‐friendly tool to understand spatial variation in species distributions and biodiversity while addressing common complexities in multi‐species detection–nondetection data.
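The spatial factor idea described above can be sketched in a few lines: instead of estimating a separate spatial process and a full residual covariance for every species, each species loads on a small number `Q` of shared latent factors. The sketch below is an illustration under stated assumptions, not spOccupancy's code or API. The names (`factor_logit_occupancy`, `Lambda`, `W`) are hypothetical. The lower-triangular, unit-diagonal loadings are an assumed identifiability constraint. The factors `W` are i.i.d. normal placeholders, whereas in the described model they are spatial processes fitted with Nearest Neighbor Gaussian Processes.

```python
import numpy as np


def factor_logit_occupancy(X, beta, Lambda, W):
    """Species-by-site logit occupancy: X_j @ beta_i + Lambda_i @ w_j.

    X: (J, P) site covariates; beta: (N, P) species coefficients;
    Lambda: (N, Q) species loadings; W: (J, Q) latent factor values.
    Returns an (N, J) array.
    """
    return beta @ X.T + Lambda @ W.T


rng = np.random.default_rng(1)
N, J, P, Q = 98, 500, 2, 3  # species, sites, covariates, latent factors

# Intercept plus one hypothetical standardized covariate.
X = np.column_stack([np.ones(J), rng.normal(size=J)])
beta = rng.normal(scale=0.5, size=(N, P))

# Lower-triangular loadings with unit diagonal (assumed identifiability constraint).
Lambda = np.tril(rng.normal(scale=0.5, size=(N, Q)))
np.fill_diagonal(Lambda, 1.0)

# Placeholder for the Q spatial factors (i.i.d. normal here, not NNGPs).
W = rng.normal(size=(J, Q))

psi = 1.0 / (1.0 + np.exp(-factor_logit_occupancy(X, beta, Lambda, W)))

# The implied residual species covariance has rank at most Q.
species_cov = Lambda @ Lambda.T
print(psi.shape, np.linalg.matrix_rank(species_cov))
print("unstructured covariance parameters:", N * (N + 1) // 2)
print("loading parameters (before constraints):", N * Q)
```

With 98 species and 3 factors, the loadings carry 294 values (before constraints) rather than the 4,851 of an unstructured species covariance. Only `Q` spatial processes would need to be estimated instead of one per species, which is where the computational savings come from.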
- PAR ID: 10477235
- Publisher / Repository: Wiley
- Journal Name: Ecology
- Volume: 104
- Issue: 9
- ISSN: 0012-9658
- Sponsoring Org: National Science Foundation
More Like this
- Abstract Numerous modelling techniques exist to estimate abundance of plant and animal populations. The most accurate methods account for multiple complexities found in ecological data, such as observational biases, spatial autocorrelation, and species correlations. There is, however, a lack of user‐friendly and computationally efficient software to implement the various models, particularly for large data sets. We developed the spAbundance R package for fitting spatially explicit Bayesian single‐species and multi‐species hierarchical distance sampling models, N‐mixture models, and generalized linear mixed models. The models within the package can account for spatial autocorrelation using Nearest Neighbour Gaussian Processes and accommodate species correlations in multi‐species models using a latent factor approach, which enables model fitting for data sets with large numbers of sites and/or species. We provide three vignettes and three case studies that highlight spAbundance functionality. We used spatially explicit multi‐species distance sampling models to estimate density of 16 bird species in Florida, USA, an N‐mixture model to estimate black‐throated blue warbler (Setophaga caerulescens) abundance in New Hampshire, USA, and a spatial linear mixed model to estimate forest above‐ground biomass across the continental USA. spAbundance provides a user‐friendly, formula‐based interface to fit a variety of univariate and multivariate spatially explicit abundance models. The package serves as a useful tool for ecologists and conservation practitioners to generate improved inference and predictions on the spatial drivers of abundance in populations and communities.
- Abstract Site occupancy models (SOMs) are a common tool for studying the spatial ecology of wildlife. When observational data are collected using passive monitoring field methods, including camera traps or autonomous recorders, detections of animals may be temporally autocorrelated, leading to biased estimates and incorrectly quantified uncertainty. We presently lack clear guidance for understanding and mitigating the consequences of temporal autocorrelation when estimating occupancy models with camera trap data. We use simulations to explore when and how autocorrelation gives rise to biased or overconfident estimates of occupancy. We explore the impact of sampling design and biological conditions on model performance in the presence of autocorrelation, investigate the usefulness of several techniques for identifying and mitigating bias, and compare the performance of the SOM to a model that explicitly estimates autocorrelation. We also conduct a case study using detections of 22 North American mammals. We show that a join count goodness‐of‐fit test previously proposed for identifying clustered detections is effective for detecting autocorrelation across a range of conditions. We find that strong bias occurs in the estimated occupancy intercept when survey durations are short and detection rates are low. We provide a reference table for assessing the degree of bias to be expected under all conditions. We further find that discretizing data with larger windows decreases the magnitude of bias introduced by autocorrelation. In our case study, we find that detections of most species are autocorrelated and demonstrate how larger detection windows might mitigate the resulting bias. Our findings suggest that autocorrelation is likely widespread in camera trap data and that many previous studies of occupancy based on camera trap data may have systematically underestimated occupancy probabilities. Moving forward, we recommend that ecologists estimating occupancy from camera trap data use the join count goodness‐of‐fit test to determine whether autocorrelation is present in their data. If it is, SOMs should use large detection windows to mitigate bias and more accurately quantify uncertainty in occupancy model parameters. Ecologists should not use gaps between detection periods, which are ineffective at mitigating temporal structure in data and discard useful data.
- Abstract Spatial models for occupancy data are used to estimate and map the true presence of a species, which may depend on biotic and abiotic factors as well as spatial autocorrelation. Traditionally, researchers have accounted for spatial autocorrelation in occupancy data by using a correlated, normally distributed site‐level random effect, which may be unable to model nontraditional spatial dependence such as discontinuities and abrupt transitions. Machine learning approaches have the potential to model nontraditional spatial dependence, but these approaches do not account for observer errors such as false absences. By combining the flexibility of Bayesian hierarchical modeling and machine learning approaches, we present a general framework to model occupancy data that accounts for both traditional and nontraditional spatial dependence as well as false absences. We demonstrate our framework using six synthetic occupancy data sets and two real data sets. Our results demonstrate how to model both traditional and nontraditional spatial dependence in occupancy data, which enables a broader class of spatial occupancy models that can be used to improve predictive accuracy and model adequacy.
- Abstract The use of quantitative real-time PCR (qPCR) to monitor pathogens is common; however, quantitative frameworks that consider the observation process, dynamics in pathogen presence, and pathogen load are lacking. This can be problematic in the early stages of disease progression, where low-level detections may be treated as 'inconclusive' and excluded from analyses. A framework that accounts for imperfect detection would instead provide more robust inferences. To better estimate pathogen dynamics, we developed a hierarchical multi-scale dynamic occupancy hurdle model (MS-DOHM). The model used data gathered during sampling for Pseudogymnoascus destructans (Pd), the causative agent of white-nose syndrome, a fungal disease that has caused severe declines in several species of hibernating bats in North America. The model allowed us to estimate initial occupancy, colonization, persistence, and prevalence of Pd at bat hibernacula. Additionally, utilizing the relationship between cycle threshold and pathogen load, we estimated pathogen detectability and modeled expected colony and bat pathogen loads. To assess the ability of MS-DOHM to estimate pathogen dynamics, we compared MS-DOHM's results to those of a dynamic occupancy model and naïve detection/non-detection. MS-DOHM's estimates of site-level pathogen presence were up to 11.9% higher than estimates from the dynamic occupancy model and 35.7% higher than naïve occupancy. Including prevalence and load in our modeling framework resulted in estimates of pathogen arrival that were two and three years earlier than those from the dynamic occupancy model and naïve detection/non-detection, respectively. Compared to naïve values, MS-DOHM predicted greater pathogen loads on colonies; however, we found no difference between model estimates and naïve values of prevalence. While the model predicted no declines in site-level prevalence, there were instances where pathogen load decreased in colonies that had been Pd-positive for longer periods of time. Our findings demonstrate that accounting for pathogen load and prevalence at multiple scales changes our understanding of Pd dynamics, potentially allowing earlier conservation intervention. Additionally, we found that accounting for pathogen load and prevalence within hibernacula and among individuals resulted in a better-fitting model with greater predictive ability.
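Two building blocks named in the white-nose syndrome abstract can be illustrated in isolation. The first is the dynamic occupancy recursion linking initial occupancy, persistence, and colonization. The second is the log-linear qPCR standard curve that relates cycle threshold (Ct) to pathogen load. This is a minimal sketch, not MS-DOHM itself, which adds a hurdle structure and multiple scales. All parameter values are hypothetical, and the standard-curve intercept and slope are assumptions; a slope near -3.32 corresponds to perfect doubling per cycle.

```python
def occupancy_trajectory(psi1, persistence, colonization, years):
    """Expected proportion of occupied sites per year under a basic dynamic
    occupancy model: psi[t+1] = psi[t] * phi + (1 - psi[t]) * gamma,
    where phi is persistence and gamma is colonization."""
    psi = [psi1]
    for _ in range(years - 1):
        prev = psi[-1]
        psi.append(prev * persistence + (1 - prev) * colonization)
    return psi


def equilibrium_occupancy(persistence, colonization):
    """Fixed point of the recursion above (assumes colonization + 1 - persistence > 0)."""
    return colonization / (colonization + 1 - persistence)


def ct_to_log10_load(ct, intercept, slope):
    """Invert a hypothetical standard curve Ct = intercept + slope * log10(load).
    With a negative slope, a lower Ct means a higher pathogen load."""
    return (ct - intercept) / slope


# Hypothetical values: 10% of hibernacula initially positive.
print(occupancy_trajectory(0.1, persistence=0.9, colonization=0.2, years=5))
print(round(equilibrium_occupancy(0.9, 0.2), 3))
print(round(ct_to_log10_load(30.0, intercept=40.0, slope=-3.32), 2))
```

Treating low-level detections (high Ct) as inconclusive discards exactly the observations that carry information about early arrival. Modeling Ct as a noisy measure of load, as the abstract describes, keeps them in the analysis.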

