skip to main content

Title: Expanding NEON biodiversity surveys with new instrumentation and machine learning approaches

A core goal of the National Ecological Observatory Network (NEON) is to measure changes in biodiversity across the 30‐yr horizon of the network. In contrast to NEON’s extensive use of automated instruments to collect environmental data, NEON’s biodiversity surveys are almost entirely conducted using traditional human‐centric field methods. We believe that the combination of instrumentation for remote data collection and machine learning models to process such data represents an important opportunity for NEON to expand the scope, scale, and usability of its biodiversity data collection while potentially reducing long‐term costs. In this manuscript, we first review the current status of instrument‐based biodiversity surveys within the NEON project and previous research at the intersection of biodiversity, instrumentation, and machine learning at NEON sites. We then survey methods that have been developed at other locations but could potentially be employed at NEON sites in future. Finally, we expand on these ideas in five case studies that we believe suggest particularly fruitful future paths for automated biodiversity measurement at NEON sites: acoustic recorders for sound‐producing taxa, camera traps for medium and large mammals, hydroacoustic and remote imagery for aquatic diversity, expanded remote and ground‐based measurements for plant biodiversity, and laboratory‐based imaging for physical specimens and samples in the NEON biorepository. Through its data science‐literate staff and user community, NEON has a unique role to play in supporting the growth of such automated biodiversity survey methods, as well as demonstrating their ability to help answer key ecological questions that cannot be answered at the more limited spatiotemporal scales of human‐driven surveys.

more » « less
Award ID(s):
1926542 1916896 1935507
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package—a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced byneonMicrobecan be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporatingneonMicrobeinto reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology. In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and theneonMicrobeR package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.

    more » « less
  2. Abstract

    The NeonTreeCrowns dataset is a set of individual level crown estimates for 100 million trees at 37 geographic sites across the United States surveyed by the National Ecological Observation Network’s Airborne Observation Platform. Each rectangular bounding box crown prediction includes height, crown area, and spatial location. 

    How can I see the data?

    A web server to look through predictions is available through

    Dataset Organization

    The contains 11,000 shapefiles, each corresponding to a 1km^2 RGB tile from NEON (ID: DP3.30010.001). For example "2019_SOAP_4_302000_4100000_image.shp" are the predictions from "2019_SOAP_4_302000_4100000_image.tif" available from the NEON data portal: NEON's file convention refers to the year of data collection (2019), the four letter site code (SOAP), the sampling event (4), and the utm coordinate of the top left corner (302000_4100000). For NEON site abbreviations and utm zones see 

    The predictions are also available as a single csv for each file. All available tiles for that site and year are combined into one large site. These data are not projected, but contain the utm coordinates for each bounding box (left, bottom, right, top). For both file types the following fields are available:

    Height: The crown height measured in meters. Crown height is defined as the 99th quartile of all canopy height pixels from a LiDAR height model (ID: DP3.30015.001)

    Area: The crown area in m2 of the rectangular bounding box.

    Label: All data in this release are "Tree".

    Score: The confidence score from the DeepForest deep learning algorithm. The score ranges from 0 (low confidence) to 1 (high confidence)

    How were predictions made?

    The DeepForest algorithm is available as a python package: Predictions were overlaid on the LiDAR-derived canopy height model. Predictions with heights less than 3m were removed.

    How were predictions validated?

    Please see

    Weinstein, B. G., Marconi, S., Bohlman, S. A., Zare, A., & White, E. P. (2020). Cross-site learning in deep learning RGB tree crown detection. Ecological Informatics56, 101061.

    Weinstein, B., Marconi, S., Aubry-Kientz, M., Vincent, G., Senyondo, H., & White, E. (2020). DeepForest: A Python package for RGB deep learning tree crown delineation. bioRxiv.

    Weinstein, Ben G., et al. "Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks." Remote Sensing 11.11 (2019): 1309.

    Were any sites removed?

    Several sites were removed due to poor NEON data quality. GRSM and PUUM both had lower quality RGB data that made them unsuitable for prediction. NEON surveys are updated annually and we expect future flights to correct these errors. We removed the GUIL puerto rico site due to its very steep topography and poor sunangle during data collection. The DeepForest algorithm responded poorly to predicting crowns in intensely shaded areas where there was very little sun penetration. We are happy to make these data are available upon request.

    # Contact

    We welcome questions, ideas and general inquiries. The data can be used for many applications and we look forward to hearing from you. Contact 

    Gordon and Betty Moore Foundation: GBMF4563 
    more » « less
  3. null (Ed.)
    Accurately mapping tree species composition and diversity is a critical step towards spatially explicit and species-specific ecological understanding. The National Ecological Observatory Network (NEON) is a valuable source of open ecological data across the United States. Freely available NEON data include in-situ measurements of individual trees, including stem locations, species, and crown diameter, along with the NEON Airborne Observation Platform (AOP) airborne remote sensing imagery, including hyperspectral, multispectral, and light detection and ranging (LiDAR) data products. An important aspect of predicting species using remote sensing data is creating high-quality training sets for optimal classification purposes. Ultimately, manually creating training data is an expensive and time-consuming task that relies on human analyst decisions and may require external data sets or information. We combine in-situ and airborne remote sensing NEON data to evaluate the impact of automated training set preparation and a novel data preprocessing workflow on classifying the four dominant subalpine coniferous tree species at the Niwot Ridge Mountain Research Station forested NEON site in Colorado, USA. We trained pixel-based Random Forest (RF) machine learning models using a series of training data sets along with remote sensing raster data as descriptive features. The highest classification accuracies, 69% and 60% based on internal RF error assessment and an independent validation set, respectively, were obtained using circular tree crown polygons created with half the maximum crown diameter per tree. LiDAR-derived data products were the most important features for species classification, followed by vegetation indices. This work contributes to the open development of well-labeled training data sets for forest composition mapping using openly available NEON data without requiring external data collection, manual delineation steps, or site-specific parameters. 
    more » « less
  4. Quantifying the resilience of ecological communities to increasingly frequent and severe environmental disturbance, such as natural disasters, requires long-term and continuous observations and a research community that is itself resilient. Investigators must have reliable access to data, a variety of resources to facilitate response to perturbation, and mechanisms for rapid and efficient return to function and/or adaptation to post-disaster conditions. There are always challenges to meeting these requirements, which may be compounded by multiple, co-occurring incidents. For example, travel restrictions resulting from the COVID-19 pandemic hindered preparations for, and responses to, environmental disasters that are the hallmarks of resilient research communities. During its initial years of data collection, a diversity of disturbances—earthquakes, wildfires, droughts, hurricanes and floods—have impacted sites at which the National Ecological Observatory Network (NEON) intends to measure organisms and environment for at least 30 years. These events strain both the natural and human communities associated with the Observatory, and additional stressors like public health crises only add to the burden. Here, we provide a case-study of how NEON has demonstrated not only internal resilience in the face of the public health crisis of COVID-19, but has also enhanced the resilience of ecological research communities associated with the network and provided crucial information for quantifying the impacts of and responses to disturbance events on natural systems—their ecological resilience. The key components discussed are: 1) NEON’s infrastructure and resources to support its core internal community, to adapt to rapidly changing situations, and to quickly resume operations following disruption, thus enabling the recovery of information flow crucial for data continuity; 2) how NEON data, tools, and materials are foundational in supporting the continuation of research programs in the face of challenges like those of COVID-19, thus enhancing the resilience of the greater ecological research community; and 3) the importance of diverse and consistent data for defining baseline and post-disaster conditions that are required to quantify the effects of natural disasters on ecosystem patterns and processes. 
    more » « less
  5. Abstract

    Understanding spatial and temporal variation in plant traits is needed to accurately predict how communities and ecosystems will respond to global change. The National Ecological Observatory Network’s (NEON’s) Airborne Observation Platform (AOP) provides hyperspectral images and associated data products at numerous field sites at 1 m spatial resolution, potentially allowing high‐resolution trait mapping. We tested the accuracy of readily available data products of NEON’s AOP, such as Leaf Area Index (LAI), Total Biomass, Ecosystem Structure (Canopy height model [CHM]), and Canopy Nitrogen, by comparing them to spatially extensive field measurements from a mesic tallgrass prairie. Correlations with AOP data products exhibited generally weak or no relationships with corresponding field measurements. The strongest relationships were between AOP LAI and ground‐measured LAI (r = 0.32) and AOP Total Biomass and ground‐measured biomass (r = 0.23). We also examined how well the full reflectance spectra (380–2,500 nm), as opposed to derived products, could predict vegetation traits using partial least‐squares regression (PLSR) models. Among all the eight traits examined, only Nitrogen had a validation of more than 0.25. For all vegetation traits, validation ranged from 0.08 to 0.29 and the range of the root mean square error of prediction (RMSEP) was 14–64%. Our results suggest that currently available AOP‐derived data products should not be used without extensive ground‐based validation. Relationships using the full reflectance spectra may be more promising, although careful consideration of field and AOP data mismatches in space and/or time, biases in field‐based measurements or AOP algorithms, and model uncertainty are needed. Finally, grassland sites may be especially challenging for airborne spectroscopy because of their high species diversity within a small area, mixed functional types of plant communities, and heterogeneous mosaics of disturbance and resource availability. Remote sensing observations are one of the most promising approaches to understanding ecological patterns across space and time. But the opportunity to engage a diverse community of NEON data users will depend on establishing rigorous links with in‐situ field measurements across a diversity of sites.

    more » « less