Title: StatEcoNet: Statistical Ecology Neural Networks for Species Distribution Modeling
This paper focuses on a core task in computational sustainability and statistical ecology: species distribution modeling (SDM). In SDM, the occurrence pattern of a species on a landscape is predicted by environmental features based on observations at a set of locations. At first, SDM may appear to be a binary classification problem, and one might be inclined to employ classic tools (e.g., logistic regression, support vector machines, neural networks) to tackle it. However, wildlife surveys introduce structured noise (especially under-counting) in the species observations. If unaccounted for, these observation errors systematically bias SDMs. To address the unique challenges of SDM, this paper proposes a framework called StatEcoNet. Specifically, this work employs a graphical generative model in statistical ecology to serve as the skeleton of the proposed computational framework and carefully integrates neural networks under the framework. The advantages of StatEcoNet over related approaches are demonstrated on simulated datasets as well as bird species data. Since SDMs are critical tools for ecological science and natural resource management, StatEcoNet may offer boosted computational and analytical powers to a wide range of applications that have significant social impacts, e.g., the study and conservation of threatened species.
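The abstract does not spell out the generative model, so the following is only a minimal sketch of how under-counting can bias a naive classifier. It assumes a standard site-occupancy formulation in which a site is occupied with probability `psi` and, if occupied, the species is detected on each visit with probability `p`. The function names and the use of plain numbers for `psi` and `p` are illustrative assumptions, not the paper's implementation; in a framework like the one described, these probabilities would presumably be produced by models of site and survey features.

```python
import math


def naive_occupancy(psi, p, visits):
    """Expected share of sites with at least one detection.

    Treating "never detected" as "absent" yields this quantity, which is
    below the true occupancy psi whenever detection is imperfect (p < 1).
    """
    return psi * (1.0 - (1.0 - p) ** visits)


def site_log_likelihood(history, psi, p):
    """Log-likelihood of one site's 0/1 detection history.

    Assumed model: the site is occupied with probability psi; an occupied
    site produces independent detections with probability p per visit;
    an unoccupied site never produces a detection.
    """
    detections = sum(history)
    visits = len(history)
    occupied_term = psi * p ** detections * (1.0 - p) ** (visits - detections)
    if detections > 0:
        # At least one detection implies the site is occupied.
        return math.log(occupied_term)
    # All-zero history: occupied but missed every time, or truly unoccupied.
    return math.log(occupied_term + (1.0 - psi))


if __name__ == "__main__":
    # With true occupancy 0.6 and per-visit detection 0.3 over 3 visits,
    # the naive estimate falls well short of 0.6.
    print(naive_occupancy(0.6, 0.3, 3))
    print(site_log_likelihood([0, 0, 0], 0.6, 0.3))
```

Maximizing the summed `site_log_likelihood` over all sites, rather than fitting a classifier to the raw detections, is what lets an occupancy-style model separate true absence from missed detection under these assumptions.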
Award ID(s):
1910118
NSF-PAR ID:
10294853
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
35
Issue:
1
Page Range / eLocation ID:
513-521
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Species distribution models (SDMs), which relate recorded observations (presences) and absences or background points to environmental characteristics, are powerful tools used to generate hypotheses about the biogeography, ecology, and conservation of species. Although many researchers have examined the effects of presence and background point distributions on model outputs, they have not systematically evaluated the effects of various methods of background point sampling on the performance of a single model algorithm across many species. Therefore, a consensus on the preferred methods of background point sampling is lacking. Here, we conducted presence-background SDMs for 20 vertebrate species in North America under a variety of background point conditions, varying the number of background points used, the size of the buffer used to constrain the background points around the occurrences, and the percentage of background points sampled within the buffer (“spatial weighting”). We evaluated the accuracy and transferability of the models using Boyce index, overlap with expert-generated range maps, and area overpredicted and underpredicted by the SDM (and AUC for comparability with other studies). SDM performance is highly dependent on the species modelled but is affected by the number and spread of background points. Models with little spatial weighting had high accuracy (overlap values), but extreme extrapolation errors and overprediction. In contrast, SDMs with high transferability (high Boyce index values and low overprediction) had moderate-to-high spatial weighting. These results emphasize the importance of both background points and evaluation metric selection in SDMs. For other, more successful metrics, using many background points with spatial weighting may be preferred for models with large extents. 
These results can assist researchers in selecting the background point parameters most relevant for their research question, allowing them to fine-tune their hypotheses on the distribution of species through space and time. 
  2. Abstract

    Understanding the ranges of rare and endangered species is central to conserving biodiversity in the Anthropocene. Species distribution models (SDMs) have become a common and powerful tool for analyzing species–environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, this can be difficult when limited distribution data are available. Community science platforms, such as iNaturalist, have emerged as alternative sources for species occurrence data. Although these observations are often thought to be of lower quality than those of natural history collections, they may have potential for improving SDMs for species with few occurrence records from collections. Here, we investigate the utility of iNaturalist data for developing SDMs for a rare high‐elevation plant, Telesonix jamesii. Because methods for modeling rare species are limited in the literature, five different modeling techniques were considered, including profile methods, statistical models, and machine learning algorithms. The inclusion of iNaturalist data doubled the number of usable records for T. jamesii. We found that a random forest (RF) model using ensemble training data performed best of any model (area under curve = 0.98). We then compared the performance of RF models that use only natural history training data and those that use a combination of natural history (herbarium specimens) and iNaturalist training data. All models relied heavily on climate data (mean temperature of driest quarter, and precipitation of the warmest quarter), indicating that this species is under threat as climate continues to change. Validation datasets affected model fits as well. Models using only herbarium data performed slightly worse when evaluated with cross‐validation than when validated externally with iNaturalist data. This study can serve as a model for future SDM studies of species with similar data limitations.

  3. Abstract

    Conservation planning and decision‐making rely on evaluations of biodiversity status and threats that are based upon species' distribution estimates. However, gaps exist regarding automated tools to delineate species' current ranges from distribution estimates and use those estimates to calculate both species‐ and community‐level biodiversity metrics. Here, we introduce changeRangeR, an R package that facilitates workflows to reproducibly transform estimates of species' distributions into metrics relevant for conservation. For example, by combining predictions from species distribution models (SDMs) with other maps of environmental data (e.g., suitable forest cover), researchers can characterize the proportion of a species' range that is under protection, metrics used under the IUCN Criteria A and B guidelines (Area of Occupancy and Extent of Occurrence), and other more general metrics such as taxonomic and phylogenetic diversity and endemism. Further, changeRangeR facilitates temporal comparisons among biodiversity metrics to inform efforts toward complementarity and consideration of future scenarios in conservation decisions. changeRangeR also provides tools to determine the effects of modeling decisions through sensitivity tests. Transparent and repeatable workflows for calculating biodiversity change metrics from SDMs such as those provided by changeRangeR are essential to inform conservation decision‐making efforts and represent key extensions for SDM methodology and associated metadata documentation.

  4. Abstract

    Biodiversity is rapidly changing due to changes in the climate and human-related activities; thus, accurate predictions of species composition and diversity are critical to developing conservation actions and management strategies. In this paper, using satellite remote sensing products as covariates, we constructed stacked species distribution models (S-SDMs) under a Bayesian framework to build next-generation biodiversity models. Performance of these models was assessed using oak assemblages distributed across the continental United States obtained from the National Ecological Observatory Network (NEON). This study represents an attempt to evaluate the integrated predictions of biodiversity models, including assemblage diversity and composition, obtained by stacking next-generation SDMs. We found that applying constraints to assemblage predictions, such as using the probability ranking rule, does not improve biodiversity prediction models. Furthermore, we found that independent of the stacking procedure (bS-SDM versus pS-SDM versus cS-SDM), these kinds of next-generation biodiversity models do not accurately recover the observed species composition at the plot level or ecological-community scales (NEON plots are 400 m²). However, these models do return reasonable predictions at macroecological scales, i.e., moderately to highly correct assignments of species identities at the scale of NEON sites (mean area ~27 km²). Our results provide insights for advancing the accuracy of prediction of assemblage diversity and composition at different spatial scales globally. An important task for future studies is to evaluate the reliability of combining S-SDMs with direct detection of species using image spectroscopy to build a new generation of biodiversity models that accurately predict and monitor ecological assemblages through time and space.
  5. Abstract Aim

    Species distribution models (SDMs) are ubiquitous in ecology to predict species occurrence throughout their range. Typically, SDMs are created using presence‐only or presence–absence data. We hypothesize that the continuous metric of temporal occupancy, the proportion of time a species is observed at a given site, provides more detail about species occurrence than binary presence‐based SDMs.

    Location

    North America.

    Methods

    We compared SDMs for 189 focal species using four modelling methods to determine whether North American avian species distributions are better predicted using temporal occupancy over presence–absence. We used the North American Breeding Bird Survey and built SDMs based on all sites sampled consecutively between 2001 and 2015, as well as on a subset of only five time points within the 15‐year sampling window. Each model used the same environmental inputs to predict species range. Each SDM was cross‐validated temporally and spatially.

    Results

    Species distributions were generally better predicted using temporal occupancy rather than presence–absence when using either a five‐year or fifteen‐year sampling window. Species that occurred in a smaller proportion of their predicted range were particularly better predicted with SDMs using temporal occupancy. Temporal occupancy SDMs had lower false discovery and false‐positive rates but higher false‐negative rates than presence–absence models.

    Main conclusions

    Temporal occupancy is a valuable metric that can improve predictions of species occurrence for birds and may improve conservation planning and design efforts.
