Title: On the Role of Spatial Clustering Algorithms in Building Species Distribution Models from Community Science Data
This paper discusses opportunities for developments in spatial clustering methods to help leverage broad-scale community science data for building species distribution models (SDMs). SDMs are tools that inform the science and policy needed to mitigate the impacts of climate change on biodiversity. Community science data span spatial and temporal scales unachievable by expert surveys alone, but they lack the structure that smaller-scale studies impose to allow adjustments for observational biases. Spatial clustering approaches can construct the necessary structure after surveys have occurred, but more work is needed to ensure that they are effective for this purpose. In this proposal, we describe the role of spatial clustering in realizing the potential of large biodiversity datasets, how existing methods approach this problem, and ideas for future work.
Award ID(s):
2046678
NSF-PAR ID:
10332683
Journal Name:
ICML 2021 Workshop: Tackling Climate Change with Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Understanding the ranges of rare and endangered species is central to conserving biodiversity in the Anthropocene. Species distribution models (SDMs) have become a common and powerful tool for analyzing species–environment relationships across geographic space. Although evaluating the distribution of rare species is integral to their conservation, this can be difficult when limited distribution data are available. Community science platforms, such as iNaturalist, have emerged as alternative sources for species occurrence data. Although these observations are often thought to be of lower quality than those of natural history collections, they may have potential for improving SDMs for species with few occurrence records from collections. Here, we investigate the utility of iNaturalist data for developing SDMs for a rare high‐elevation plant, Telesonix jamesii. Because methods for modeling rare species are limited in the literature, five different modeling techniques were considered, including profile methods, statistical models, and machine learning algorithms. The inclusion of iNaturalist data doubled the number of usable records for T. jamesii. We found that a random forest (RF) model using ensemble training data performed best of all models (area under curve = 0.98). We then compared the performance of RF models that use only natural history training data and those that use a combination of natural history (herbarium specimens) and iNaturalist training data. All models relied heavily on climate data (mean temperature of driest quarter, and precipitation of the warmest quarter), indicating that this species is under threat as climate continues to change. Validation datasets affected model fits as well. Models using only herbarium data performed slightly worse when evaluated with cross‐validation than when validated externally with iNaturalist data. This study can serve as a model for future SDM studies of species with similar data limitations.

     
  2. Abstract

    Road construction and paving bring socio-economic benefits to receiving regions but can also be drivers of deforestation and land cover change. Road infrastructure often increases migration and illegal economic activities, which in turn affect local hydrology, wildlife, vegetation structure and dynamics, and biodiversity. To evaluate the full breadth of impacts from a coupled natural-human systems perspective, information is needed over a sufficient timespan to include pre- and post-road paving conditions. In addition, the spatial scale should be appropriate to link local human activities and biophysical system components, while also allowing for upscaling to the regional scale. A database was developed for the tri-national frontier in the Southwestern Amazon, where the Inter-Oceanic Highway was constructed through an area of high biological value and cultural diversity. Extensive socio-economic surveys and botanical field work are combined with remote sensing and reanalysis data to provide a rich and unique database, suitable for coupled natural-human systems research.

     
  3. Abstract

    A core goal of the National Ecological Observatory Network (NEON) is to measure changes in biodiversity across the 30‐yr horizon of the network. In contrast to NEON’s extensive use of automated instruments to collect environmental data, NEON’s biodiversity surveys are almost entirely conducted using traditional human‐centric field methods. We believe that the combination of instrumentation for remote data collection and machine learning models to process such data represents an important opportunity for NEON to expand the scope, scale, and usability of its biodiversity data collection while potentially reducing long‐term costs. In this manuscript, we first review the current status of instrument‐based biodiversity surveys within the NEON project and previous research at the intersection of biodiversity, instrumentation, and machine learning at NEON sites. We then survey methods that have been developed at other locations but could potentially be employed at NEON sites in future. Finally, we expand on these ideas in five case studies that we believe suggest particularly fruitful future paths for automated biodiversity measurement at NEON sites: acoustic recorders for sound‐producing taxa, camera traps for medium and large mammals, hydroacoustic and remote imagery for aquatic diversity, expanded remote and ground‐based measurements for plant biodiversity, and laboratory‐based imaging for physical specimens and samples in the NEON biorepository. Through its data science‐literate staff and user community, NEON has a unique role to play in supporting the growth of such automated biodiversity survey methods, as well as demonstrating their ability to help answer key ecological questions that cannot be answered at the more limited spatiotemporal scales of human‐driven surveys.

     
  4. Abstract

    Biodiversity is rapidly changing due to changes in the climate and human-related activities; thus, accurate predictions of species composition and diversity are critical to developing conservation actions and management strategies. In this paper, using satellite remote sensing products as covariates, we constructed stacked species distribution models (S-SDMs) under a Bayesian framework to build next-generation biodiversity models. Performance of these models was assessed using oak assemblages distributed across the continental United States obtained from the National Ecological Observatory Network (NEON). This study represents an attempt to evaluate the integrated predictions of biodiversity models—including assemblage diversity and composition—obtained by stacking next-generation SDMs. We found that applying constraints to assemblage predictions, such as using the probability ranking rule, does not improve biodiversity prediction models. Furthermore, we found that independent of the stacking procedure (bS-SDM versus pS-SDM versus cS-SDM), these kinds of next-generation biodiversity models do not accurately recover the observed species composition at the plot level or ecological-community scales (NEON plots are 400 m²). However, these models do return reasonable predictions at macroecological scales, i.e., moderately to highly correct assignments of species identities at the scale of NEON sites (mean area ~27 km²). Our results provide insights for advancing the accuracy of prediction of assemblage diversity and composition at different spatial scales globally. An important task for future studies is to evaluate the reliability of combining S-SDMs with direct detection of species using image spectroscopy to build a new generation of biodiversity models that accurately predict and monitor ecological assemblages through time and space.
  5. Abstract

    It is a critical time to reflect on the National Ecological Observatory Network (NEON) science to date as well as envision what research can be done right now with NEON (and other) data and what training is needed to enable a diverse user community. NEON became fully operational in May 2019 and has pivoted from planning and construction to operation and maintenance. In this overview, the history of and foundational thinking around NEON are discussed. A framework of open science is described with a discussion of how NEON can be situated as part of a larger data constellation—across existing networks and different suites of ecological measurements and sensors. Next, a synthesis of early NEON science, based on >100 existing publications, funded proposal efforts, and emergent science at the very first NEON Science Summit (hosted by Earth Lab at the University of Colorado Boulder in October 2019) is provided. Key questions that the ecology community will address with NEON data in the next 10 yr are outlined, from understanding drivers of biodiversity across spatial and temporal scales to defining complex feedback mechanisms in human–environmental systems. Last, the essential elements needed to engage and support a diverse and inclusive NEON user community are highlighted: training resources and tools that are openly available, funding for broad community engagement initiatives, and a mechanism to share and advertise those opportunities. NEON users require both the skills to work with NEON data and the ecological or environmental science domain knowledge to understand and interpret them. This paper synthesizes early directions in the community’s use of NEON data, and opportunities for the next 10 yr of NEON operations in emergent science themes, open science best practices, education and training, and community building.

     