Title: On the Role of Spatial Clustering Algorithms in Building Species Distribution Models from Community Science Data
This paper discusses opportunities for developments in spatial clustering methods to help leverage broad-scale community science data for building species distribution models (SDMs). SDMs are tools that inform the science and policy needed to mitigate the impacts of climate change on biodiversity. Community science data span spatial and temporal scales unachievable by expert surveys alone, but they lack the structure imposed in smaller-scale studies to allow adjustments for observational biases. Spatial clustering approaches can construct the necessary structure after surveys have occurred, but more work is needed to ensure that they are effective for this purpose. In this proposal, we describe the role of spatial clustering for realizing the potential of large biodiversity datasets, how existing methods approach this problem, and ideas for future work.
Award ID(s):
2046678
PAR ID:
10332683
Journal Name:
ICML 2021 Workshop: Tackling Climate Change with Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Dainton, John (Ed.)
    Improving models of species' distributions is essential for conservation, especially in light of global change. Species distribution models (SDMs) often rely on mean environmental conditions, yet species distributions are also a function of environmental heterogeneity and filtering acting at multiple spatial scales. Geodiversity, which we define as the variation of abiotic features and processes of Earth's entire geosphere (inclusive of climate), has potential to improve SDMs and conservation assessments because geodiversity variables capture multiple abiotic dimensions of species niches; however, they have not been sufficiently tested in SDMs. We tested a range of geodiversity variables computed at varying scales using climate and elevation data. We compared predictive performance of MaxEnt SDMs generated using CHELSA bioclimatic variables to those also including geodiversity variables for 31 mammalian species in Colombia. Results show the spatial grain of geodiversity variables affects SDM performance. Some variables consistently exhibited an increasing or decreasing trend in variable importance with spatial grain, showing slight scale-dependence and indicating that some geodiversity variables are more relevant at particular scales for some species. Incorporating geodiversity variables into SDMs, and doing so at the appropriate spatial scales, enhances the ability to model species-environment relationships, thereby contributing to the conservation and management of biodiversity. This article is part of the Theo Murphy meeting issue ‘Geodiversity for science and society’.
  2. A core goal of the National Ecological Observatory Network (NEON) is to measure changes in biodiversity across the 30‐yr horizon of the network. In contrast to NEON’s extensive use of automated instruments to collect environmental data, NEON’s biodiversity surveys are almost entirely conducted using traditional human‐centric field methods. We believe that the combination of instrumentation for remote data collection and machine learning models to process such data represents an important opportunity for NEON to expand the scope, scale, and usability of its biodiversity data collection while potentially reducing long‐term costs. In this manuscript, we first review the current status of instrument‐based biodiversity surveys within the NEON project and previous research at the intersection of biodiversity, instrumentation, and machine learning at NEON sites. We then survey methods that have been developed at other locations but could potentially be employed at NEON sites in future. Finally, we expand on these ideas in five case studies that we believe suggest particularly fruitful future paths for automated biodiversity measurement at NEON sites: acoustic recorders for sound‐producing taxa, camera traps for medium and large mammals, hydroacoustic and remote imagery for aquatic diversity, expanded remote and ground‐based measurements for plant biodiversity, and laboratory‐based imaging for physical specimens and samples in the NEON biorepository. Through its data science‐literate staff and user community, NEON has a unique role to play in supporting the growth of such automated biodiversity survey methods, as well as demonstrating their ability to help answer key ecological questions that cannot be answered at the more limited spatiotemporal scales of human‐driven surveys.
  3. Understanding the ranges of rare and endangered species is central to conserving biodiversity in the Anthropocene. Species distribution models (SDMs) have become a common and powerful tool for analyzing species–environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, this can be difficult when limited distribution data are available. Community science platforms, such as iNaturalist, have emerged as alternative sources for species occurrence data. Although these observations are often thought to be of lower quality than those of natural history collections, they may have potential for improving SDMs for species with few occurrence records from collections. Here, we investigate the utility of iNaturalist data for developing SDMs for a rare high‐elevation plant, Telesonix jamesii. Because methods for modeling rare species are limited in the literature, five different modeling techniques were considered, including profile methods, statistical models, and machine learning algorithms. The inclusion of iNaturalist data doubled the number of usable records for T. jamesii. We found that a random forest (RF) model using ensemble training data performed best of all models (area under curve = 0.98). We then compared the performance of RF models that use only natural history training data and those that use a combination of natural history (herbarium specimens) and iNaturalist training data. All models heavily relied on climate data (mean temperature of driest quarter, and precipitation of the warmest quarter), indicating that this species is under threat as climate continues to change. Validation datasets affected model fits as well. Models using only herbarium data performed slightly worse when evaluated with cross‐validation than when validated externally with iNaturalist data. This study can serve as a model for future SDM studies of species with similar data limitations.
  4. Biodiversity is rapidly changing due to changes in the climate and human-related activities; thus, accurate predictions of species composition and diversity are critical to developing conservation actions and management strategies. In this paper, using satellite remote sensing products as covariates, we constructed stacked species distribution models (S-SDMs) under a Bayesian framework to build next-generation biodiversity models. Performance of these models was assessed using oak assemblages distributed across the continental United States obtained from the National Ecological Observatory Network (NEON). This study represents an attempt to evaluate the integrated predictions of biodiversity models—including assemblage diversity and composition—obtained by stacking next-generation SDMs. We found that applying constraints to assemblage predictions, such as using the probability ranking rule, does not improve biodiversity prediction models. Furthermore, we found that independent of the stacking procedure (bS-SDM versus pS-SDM versus cS-SDM), these kinds of next-generation biodiversity models do not accurately recover the observed species composition at the plot level or ecological-community scales (NEON plots are 400 m²). However, these models do return reasonable predictions at macroecological scales, i.e., moderately to highly correct assignments of species identities at the scale of NEON sites (mean area ~27 km²). Our results provide insights for advancing the accuracy of prediction of assemblage diversity and composition at different spatial scales globally. An important task for future studies is to evaluate the reliability of combining S-SDMs with direct detection of species using image spectroscopy to build a new generation of biodiversity models that accurately predict and monitor ecological assemblages through time and space.
  5. Climate change poses a threat to biodiversity, and it is unclear whether species can adapt to or tolerate new conditions, or migrate to areas with suitable habitats. Reconstructions of range shifts that occurred in response to environmental changes since the last glacial maximum (LGM) from species distribution models (SDMs) can provide useful data to inform conservation efforts. However, different SDM algorithms and climate reconstructions often produce contrasting patterns, and validation methods typically focus on accuracy in recreating current distributions, limiting their relevance for assessing predictions to the past or future. We modeled historically suitable habitat for the threatened North American tree green ash (Fraxinus pennsylvanica) using 24 SDMs built using two climate models, three calibration regions, and four modeling algorithms. We evaluated the SDMs using contemporary data with spatial block cross‐validation and compared the relative support for alternative models using a novel integrative method based on coupled demographic‐genetic simulations. We simulated genomic datasets using habitat suitability of each of the 24 SDMs in a spatially‐explicit model. Approximate Bayesian computation (ABC) was then used to evaluate the support for alternative SDMs through comparisons to an empirical population genomic dataset. Models had very similar performance when assessed with contemporary occurrences using spatial cross‐validation, but ABC model selection analyses consistently supported SDMs based on the CCSM climate model, an intermediate calibration extent, and the generalized linear modeling algorithm. Finally, we projected the future range of green ash under four climate change scenarios. Future projections using the SDMs selected via ABC suggest only minor shifts in suitable habitat for this species, while some of those that were rejected predicted dramatic changes. Our results highlight the different inferences that may result from the application of alternative distribution modeling algorithms and provide a novel approach for selecting among a set of competing SDMs with independent data.