Determining the spatial distributions of species and communities is a key task in ecology and conservation efforts. Joint species distribution models are a fundamental tool in community ecology that use multi‐species detection–nondetection data to estimate species distributions and biodiversity metrics. The analysis of such data is complicated by residual correlations between species, imperfect detection, and spatial autocorrelation. While many methods exist to accommodate each of these complexities, there are few examples in the literature that address and explore all three complexities simultaneously. Here we developed a spatial factor multi‐species occupancy model to explicitly account for species correlations, imperfect detection, and spatial autocorrelation. The proposed model uses a spatial factor dimension reduction approach and Nearest Neighbor Gaussian Processes to ensure computational efficiency for data sets with both a large number of species (e.g., >100) and spatial locations (e.g., 100,000). We compared the proposed model performance to five alternative models, each addressing a subset of the three complexities. We implemented the proposed and alternative models in the spOccupancy software, designed to facilitate application via an accessible, well documented, and open‐source R package. Using simulations, we found that ignoring the three complexities when present leads to inferior model predictive performance, and the impacts of failing to account for one or more complexities will depend on the objectives of a given study. Using a case study on 98 bird species across the continental US, the spatial factor multi‐species occupancy model had the highest predictive performance among the alternative models.
Our proposed framework, together with its implementation in spOccupancy, serves as a user‐friendly tool to understand spatial variation in species distributions and biodiversity while addressing common complexities in multi‐species detection–nondetection data.
Using machine learning to model nontraditional spatial dependence in occupancy data
Abstract Spatial models for occupancy data are used to estimate and map the true presence of a species, which may depend on biotic and abiotic factors as well as spatial autocorrelation. Traditionally, researchers have accounted for spatial autocorrelation in occupancy data by using a correlated normally distributed site‐level random effect, which might be incapable of modeling nontraditional spatial dependence such as discontinuities and abrupt transitions. Machine learning approaches have the potential to model nontraditional spatial dependence, but these approaches do not account for observer errors such as false absences. By combining the flexibility of Bayesian hierarchical modeling and machine learning approaches, we present a general framework to model occupancy data that accounts for both traditional and nontraditional spatial dependence as well as false absences. We demonstrate our framework using six synthetic occupancy data sets and two real data sets. Our results demonstrate how to model both traditional and nontraditional spatial dependence in occupancy data, which enables a broader class of spatial occupancy models that can be used to improve predictive accuracy and model adequacy.
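As a rough illustration of what "accounting for false absences" means in an occupancy model, the sketch below computes the likelihood of a detection history under the basic, non-spatial occupancy model. It is not the framework described in the abstract: it has no spatial random effect and no machine learning component. The function names and the parameters `psi` (occupancy probability) and `p` (per-visit detection probability) are assumptions chosen for this example.

```python
import math


def site_likelihood(history, psi, p):
    """Likelihood of one site's 0/1 detection history.

    Assumes a constant occupancy probability `psi` and a constant
    per-visit detection probability `p` (illustrative simplification).
    """
    detections = sum(history)
    visits = len(history)
    # Site is occupied: each visit detects the species with probability p.
    occupied = psi * (p ** detections) * ((1 - p) ** (visits - detections))
    # Site is unoccupied: only an all-zero history is possible.
    unoccupied = (1 - psi) if detections == 0 else 0.0
    return occupied + unoccupied


def log_likelihood(histories, psi, p):
    """Sum of log-likelihoods over independent sites."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)
```

An all-zero history contributes through both terms, which is how the model separates a true absence from a missed detection.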
- Award ID(s): 1754491
- PAR ID: 10475472
- Publisher / Repository: Ecological Society of America
- Date Published:
- Journal Name: Ecology
- Volume: 103
- Issue: 2
- ISSN: 0012-9658
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Site occupancy models (SOMs) are a common tool for studying the spatial ecology of wildlife. When observational data are collected using passive monitoring field methods, including camera traps or autonomous recorders, detections of animals may be temporally autocorrelated, leading to biased estimates and incorrectly quantified uncertainty. We presently lack clear guidance for understanding and mitigating the consequences of temporal autocorrelation when estimating occupancy models with camera trap data. We use simulations to explore when and how autocorrelation gives rise to biased or overconfident estimates of occupancy. We explore the impact of sampling design and biological conditions on model performance in the presence of autocorrelation, investigate the usefulness of several techniques for identifying and mitigating bias and compare performance of the SOM to a model that explicitly estimates autocorrelation. We also conduct a case study using detections of 22 North American mammals. We show that a join count goodness‐of‐fit test previously proposed for identifying clustered detections is effective for detecting autocorrelation across a range of conditions. We find that strong bias occurs in the estimated occupancy intercept when survey durations are short and detection rates are low. We provide a reference table for assessing the degree of bias to be expected under all conditions. We further find that discretizing data with larger windows decreases the magnitude of bias introduced by autocorrelation. In our case study, we find that detections of most species are autocorrelated and demonstrate how larger detection windows might mitigate the resulting bias. Our findings suggest that autocorrelation is likely widespread in camera trap data and that many previous studies of occupancy based on camera trap data may have systematically underestimated occupancy probabilities.
Moving forward, we recommend that ecologists estimating occupancy from camera trap data use the join count goodness‐of‐fit test to determine whether autocorrelation is present in their data. If it is, SOMs should use large detection windows to mitigate bias and more accurately quantify uncertainty in occupancy model parameters. Ecologists should not use gaps between detection periods, which are ineffective at mitigating temporal structure in data and discard useful data.
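To make "discretizing data with larger windows" concrete, here is a minimal sketch of how continuous camera trap detection times could be collapsed into a 0/1 detection history. It is an illustration only, not the authors' code: the function name, the use of days as the time unit, and the choice to drop an incomplete trailing window are all assumptions.

```python
def discretize_detections(detection_days, survey_days, window_days):
    """Collapse detection times into one 0/1 entry per detection window.

    detection_days: times of detections, in days since deployment
    survey_days:    total survey length in days (integer, assumed)
    window_days:    window length in days (integer, assumed)
    """
    # Assumption: an incomplete window at the end of the survey is dropped.
    n_windows = survey_days // window_days
    history = [0] * n_windows
    for day in detection_days:
        idx = int(day // window_days)
        if 0 <= idx < n_windows:
            # Several detections in one window still count as a single 1.
            history[idx] = 1
    return history
```

With detections at days 0.5, 0.7, 1.2, and 9.0 in a 10-day survey, one-day windows give ten entries with three of them set to 1, while five-day windows give just two entries. Clustered detections that fall close together in time end up inside the same window rather than in consecutive ones.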
-
Abstract Accurate and cost-effective quantification of the carbon cycle for agroecosystems at decision-relevant scales is critical to mitigating climate change and ensuring sustainable food production. However, conventional process-based or data-driven modeling approaches alone have large prediction uncertainties due to the complex biogeochemical processes to model and the lack of observations to constrain many key state and flux variables. Here we propose a Knowledge-Guided Machine Learning (KGML) framework that addresses the above challenges by integrating knowledge embedded in a process-based model, high-resolution remote sensing observations, and machine learning (ML) techniques. Using the U.S. Corn Belt as a testbed, we demonstrate that KGML can outperform conventional process-based and black-box ML models in quantifying carbon cycle dynamics. Our high-resolution approach quantitatively reveals 86% more spatial detail of soil organic carbon changes than conventional coarse-resolution approaches. Moreover, we outline a protocol for improving KGML via various paths, which can be generalized to develop hybrid models to better predict complex earth system dynamics.
-
Abstract Species distribution models (SDMs) have become increasingly popular for making ecological inferences, as well as predictions to inform conservation and management. In predictive modeling, practitioners often use correlative SDMs that only evaluate a single spatial scale and do not account for differences in life stages. These modeling decisions may limit the performance of SDMs beyond the study region or sampling period. Given the increasing desire to develop transferable SDMs, a robust framework is necessary that can account for known challenges of model transferability. Here, we propose a comparative framework to develop transferable SDMs, which was tested using satellite telemetry data from green turtles (Chelonia mydas). This framework is characterized by a set of steps comparing among different models based on (1) model algorithm (e.g., generalized linear model vs. Gaussian process regression) and formulation (e.g., correlative model vs. hybrid model), (2) spatial scale, and (3) accounting for life stage. SDMs were fitted as resource selection functions and trained on data from the Gulf of Mexico with bathymetric depth, net primary productivity, and sea surface temperature as covariates. Independent validation datasets from Brazil and Qatar were used to assess model transferability. A correlative SDM using a hierarchical Gaussian process regression (HGPR) algorithm exhibited greater transferability than a hybrid SDM using HGPR, as well as correlative and hybrid forms of hierarchical generalized linear models. Additionally, models that evaluated habitat selection at the finest spatial scale and that did not account for life stage proved to be the most transferable in this study. 
The comparative framework presented here may be applied to a variety of species, ecological datasets (e.g., presence‐only, presence‐absence, mark‐recapture), and modeling frameworks (e.g., resource selection functions, step selection functions, occupancy models) to generate transferable predictions of species–habitat associations. We expect that SDM predictions resulting from this comparative framework will be more informative management tools and may be used to more accurately assess climate change impacts on a wide array of taxa.
-
Understanding spatial expressions and using them appropriately is necessary for seamless and natural human-machine interaction. However, capturing the semantics and appropriate usage of spatial prepositions is notoriously difficult, because of their vagueness and polysemy. Although modern data-driven approaches are good at capturing statistical regularities in the usage, they usually require substantial sample sizes, often do not generalize well to unseen instances and, most importantly, their structure is essentially opaque to analysis, which makes diagnosing problems and understanding their reasoning process difficult. In this work, we discuss our attempt at modeling spatial senses of prepositions in English using a combination of rule-based and statistical learning approaches. Each preposition model is implemented as a tree where each node computes certain intuitive relations associated with the preposition, with the root computing the final value of the prepositional relation itself. The models operate on a set of artificial 3D “room world” environments, designed in Blender, taking the scene itself as an input. We also discuss our annotation framework used to collect human judgments employed in the model training. Both our factored models and black-box baseline models perform quite well, but the factored models will enable reasoned explanations of spatial relation judgments.

