

Title: Integrating National Ecological Observatory Network (NEON) Airborne Remote Sensing and In-Situ Data for Optimal Tree Species Classification
Accurately mapping tree species composition and diversity is a critical step towards spatially explicit and species-specific ecological understanding. The National Ecological Observatory Network (NEON) is a valuable source of open ecological data across the United States. Freely available NEON data include in-situ measurements of individual trees, including stem locations, species, and crown diameter, along with the NEON Airborne Observation Platform (AOP) airborne remote sensing imagery, including hyperspectral, multispectral, and light detection and ranging (LiDAR) data products. An important aspect of predicting species using remote sensing data is creating high-quality training sets for optimal classification purposes. Ultimately, manually creating training data is an expensive and time-consuming task that relies on human analyst decisions and may require external data sets or information. We combine in-situ and airborne remote sensing NEON data to evaluate the impact of automated training set preparation and a novel data preprocessing workflow on classifying the four dominant subalpine coniferous tree species at the Niwot Ridge Mountain Research Station forested NEON site in Colorado, USA. We trained pixel-based Random Forest (RF) machine learning models using a series of training data sets along with remote sensing raster data as descriptive features. The highest classification accuracies, 69% and 60% based on internal RF error assessment and an independent validation set, respectively, were obtained using circular tree crown polygons created with half the maximum crown diameter per tree. LiDAR-derived data products were the most important features for species classification, followed by vegetation indices. This work contributes to the open development of well-labeled training data sets for forest composition mapping using openly available NEON data without requiring external data collection, manual delineation steps, or site-specific parameters.  
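The best-performing training sets in the abstract used circular crown polygons built from each tree's stem location and maximum crown diameter. A minimal sketch of generating such a polygon, assuming the circle's radius is half the maximum crown diameter (one plausible reading of the abstract; consult the paper for the exact definition) and using invented coordinates:

```python
import math

def crown_polygon(x, y, max_crown_diameter, n_vertices=32):
    """Approximate a circular tree-crown polygon around a stem location.

    The radius is taken as half the maximum crown diameter, following
    (one interpretation of) the training-set construction described in
    the abstract. Returns a list of (x, y) vertex tuples.
    """
    r = max_crown_diameter / 2.0
    return [
        (x + r * math.cos(2 * math.pi * i / n_vertices),
         y + r * math.sin(2 * math.pi * i / n_vertices))
        for i in range(n_vertices)
    ]

# Example: a tree at UTM (450000, 4430000) with a 6 m maximum crown diameter,
# yielding a 3 m radius crown polygon (values invented for illustration).
verts = crown_polygon(450000.0, 4430000.0, 6.0)
```

In practice such polygons would then be rasterized against the AOP imagery to select labeled training pixels for the Random Forest.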
Award ID(s):
1846384
NSF-PAR ID:
10215482
Author(s) / Creator(s):
Date Published:
Journal Name:
Remote Sensing
Volume:
12
Issue:
9
ISSN:
2072-4292
Page Range / eLocation ID:
1414
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Airborne remote sensing offers unprecedented opportunities to efficiently monitor vegetation, but methods to delineate and classify individual plant species using the collected data are still actively being developed and improved. The Integrating Data science with Trees and Remote Sensing (IDTReeS) plant identification competition openly invited scientists to create and compare individual tree mapping methods. Participants were tasked with training taxon identification algorithms based on two sites, to then transfer their methods to a third unseen site, using field-based plant observations in combination with airborne remote sensing image data products from the National Ecological Observatory Network (NEON). These data were captured by a high resolution digital camera sensitive to red, green, blue (RGB) light, a hyperspectral imaging spectrometer spanning the visible to shortwave infrared wavelengths, and lidar systems to capture the spectral and structural properties of vegetation. As participants in the IDTReeS competition, we developed a two-stage deep learning approach to integrate NEON remote sensing data from all three sensors and classify individual plant species and genera. The first stage was a convolutional neural network that generated taxon probabilities from RGB images, and the second stage was a fusion neural network that "learned" how to combine these probabilities with hyperspectral and lidar data. Our two-stage approach leverages the ability of neural networks to flexibly and automatically extract descriptive features from complex image data with high dimensionality. Our method achieved an overall classification accuracy of 0.51 based on the training set, and 0.32 based on the test set, which contained data from an unseen site with unknown taxa classes.
Although transferability of classification algorithms to unseen sites with unknown species and genus classes proved to be a challenging task, developing methods with openly available NEON data that will be collected in a standardized format for 30 years allows for continual improvements and major gains for members of the computational ecology community. We outline promising directions related to data preparation and processing techniques for further investigation, and provide our code to contribute to open reproducible science efforts. 
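The two-stage fusion described above can be sketched as follows: stage one produces per-taxon probabilities from an RGB crop, and stage two concatenates those probabilities with hyperspectral and lidar features before mapping them to final class scores. This is not the authors' implementation — just a minimal numpy forward pass with made-up dimensions and untrained weights standing in for the learned networks:

```python
import numpy as np

rng = np.random.default_rng(0)
n_taxa, n_hsi, n_lidar = 5, 20, 4   # made-up feature dimensions

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Stage 1 (stood in for by random numbers here): a CNN over the RGB crop
# would output a probability vector over taxa.
rgb_probs = softmax(rng.normal(size=n_taxa))

# Stage 2: fuse stage-1 probabilities with hyperspectral and lidar features.
hsi_features = rng.normal(size=n_hsi)
lidar_features = rng.normal(size=n_lidar)
fused_input = np.concatenate([rgb_probs, hsi_features, lidar_features])

# A single linear layer + softmax (weights random here; in the competition
# entry these would be trained by backpropagation over labeled crowns).
W = rng.normal(size=(n_taxa, fused_input.size))
b = np.zeros(n_taxa)
final_probs = softmax(W @ fused_input + b)
```

The design choice of interest is that stage two consumes stage one's *probabilities* rather than raw pixels, so each sensor stream can be modeled at its natural dimensionality before fusion.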
  2. Abstract

    The NeonTreeCrowns dataset is a set of individual-level crown estimates for 100 million trees at 37 geographic sites across the United States surveyed by the National Ecological Observatory Network's Airborne Observation Platform. Each rectangular bounding box crown prediction includes height, crown area, and spatial location.

    How can I see the data?

    A web server to look through predictions is available through idtrees.org

    Dataset Organization

    The shapefiles.zip contains 11,000 shapefiles, each corresponding to a 1 km^2 RGB tile from NEON (ID: DP3.30010.001). For example, "2019_SOAP_4_302000_4100000_image.shp" contains the predictions from "2019_SOAP_4_302000_4100000_image.tif" available from the NEON data portal: https://data.neonscience.org/data-products/explore?search=camera. NEON's file convention encodes the year of data collection (2019), the four-letter site code (SOAP), the sampling event (4), and the UTM coordinates of the top left corner (302000_4100000). For NEON site abbreviations and UTM zones see https://www.neonscience.org/field-sites/field-sites-map.
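That naming convention lends itself to simple parsing. A sketch, with the pattern inferred from the single example above:

```python
import re

# Parse NEON AOP tile filenames of the form
#   2019_SOAP_4_302000_4100000_image.tif
# into year, site code, sampling event, and UTM top-left corner coordinates.
TILE_RE = re.compile(
    r"(?P<year>\d{4})_(?P<site>[A-Z]{4})_(?P<event>\d+)_"
    r"(?P<easting>\d+)_(?P<northing>\d+)_image"
)

def parse_tile_name(filename):
    m = TILE_RE.search(filename)
    if m is None:
        raise ValueError(f"unrecognized NEON tile name: {filename}")
    d = m.groupdict()
    return {
        "year": int(d["year"]),
        "site": d["site"],
        "event": int(d["event"]),
        "easting": int(d["easting"]),    # UTM x of the top-left corner
        "northing": int(d["northing"]),  # UTM y of the top-left corner
    }

info = parse_tile_name("2019_SOAP_4_302000_4100000_image.shp")
# info == {"year": 2019, "site": "SOAP", "event": 4,
#          "easting": 302000, "northing": 4100000}
```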

    The predictions are also available as a single CSV per site: all available tiles for that site and year are combined into one file. These data are not projected, but contain the UTM coordinates for each bounding box (left, bottom, right, top). For both file types the following fields are available:

    Height: The crown height measured in meters. Crown height is defined as the 99th percentile of all canopy height pixels from a LiDAR height model (ID: DP3.30015.001).

    Area: The crown area in m2 of the rectangular bounding box.

    Label: All data in this release are "Tree".

    Score: The confidence score from the DeepForest deep learning algorithm. The score ranges from 0 (low confidence) to 1 (high confidence)
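Given those fields, the rectangular crown area can be recomputed directly from the bounding-box coordinates. A sketch with invented values (field names follow the description above; the score threshold is arbitrary):

```python
# One predicted crown as described above: unprojected UTM bounding-box
# coordinates plus height, label and score (values invented for illustration).
crown = {
    "left": 302010.5, "bottom": 4100020.0,
    "right": 302015.5, "top": 4100024.0,
    "height": 12.3, "label": "Tree", "score": 0.87,
}

def box_area_m2(c):
    """Area of the rectangular bounding box in square meters,
    assuming the UTM coordinates are in meters."""
    return (c["right"] - c["left"]) * (c["top"] - c["bottom"])

area = box_area_m2(crown)          # 5.0 m wide x 4.0 m tall -> 20.0 m^2
confident = crown["score"] >= 0.5  # example confidence threshold (arbitrary)
```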

    How were predictions made?

    The DeepForest algorithm is available as a python package: https://deepforest.readthedocs.io/. Predictions were overlaid on the LiDAR-derived canopy height model. Predictions with heights less than 3m were removed.
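The post-processing step above (dropping predictions under 3 m after overlaying the canopy height model) can be sketched as a simple table filter. Column names here follow the fields listed earlier, not necessarily DeepForest's raw output, and the values are invented:

```python
import pandas as pd

# Predicted crowns with heights already sampled from the LiDAR-derived
# canopy height model (values invented for illustration).
preds = pd.DataFrame({
    "left":   [302000, 302010, 302020],
    "bottom": [4100000, 4100010, 4100020],
    "right":  [302005, 302014, 302026],
    "top":    [4100004, 4100013, 4100027],
    "height": [12.5, 2.1, 8.0],   # meters, from the CHM
})

# Remove predictions shorter than 3 m, as described above.
preds = preds[preds["height"] >= 3.0].reset_index(drop=True)
```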

    How were predictions validated?

    Please see

    Weinstein, B. G., Marconi, S., Bohlman, S. A., Zare, A., & White, E. P. (2020). Cross-site learning in deep learning RGB tree crown detection. Ecological Informatics, 56, 101061.

    Weinstein, B., Marconi, S., Aubry-Kientz, M., Vincent, G., Senyondo, H., & White, E. (2020). DeepForest: A Python package for RGB deep learning tree crown delineation. bioRxiv.

    Weinstein, B. G., Marconi, S., Bohlman, S. A., Zare, A., & White, E. P. (2019). Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks. Remote Sensing, 11(11), 1309.

    Were any sites removed?

    Several sites were removed due to poor NEON data quality. GRSM and PUUM both had lower quality RGB data that made them unsuitable for prediction. NEON surveys are updated annually and we expect future flights to correct these errors. We removed the GUIL site in Puerto Rico due to its very steep topography and poor sun angle during data collection. The DeepForest algorithm responded poorly when predicting crowns in intensely shaded areas where there was very little sun penetration. We are happy to make these data available upon request.

    Contact

    We welcome questions, ideas and general inquiries. The data can be used for many applications and we look forward to hearing from you. Contact ben.weinstein@weecology.org. 

    Gordon and Betty Moore Foundation: GBMF4563 
  3. Abstract Aim

    Rapid global change is impacting the diversity of tree species and essential ecosystem functions and services of forests. It is therefore critical to understand and predict how the diversity of tree species is spatially distributed within and among forest biomes. Satellite remote sensing platforms have been used for decades to map forest structure and function but are limited in their capacity to monitor change by their relatively coarse spatial resolution and the complexity of scales at which different dimensions of biodiversity are observed in the field. Recently, airborne remote sensing platforms making use of passive high spectral resolution (i.e., hyperspectral) and active lidar data have been operationalized, providing an opportunity to disentangle how biodiversity patterns vary across space and time from field observations to larger scales. Most studies to date have focused on single sites and/or one sensor type; here we ask how multiple sensor types from the National Ecological Observatory Network’s Airborne Observation Platform (NEON AOP) perform across multiple sites in a single biome at the NEON field plot scale (i.e., 40 m × 40 m).

    Location

    Eastern USA.

    Time period

    2017–2018.

    Taxa studied

    Trees.

    Methods

    With a fusion of hyperspectral and lidar data from the NEON AOP, we assess the ability of high resolution remotely sensed metrics to measure biodiversity variation across eastern US temperate forests. We examine how taxonomic, functional, and phylogenetic measures of alpha diversity vary spatially and assess to what degree remotely sensed metrics correlate with in situ biodiversity metrics.

    Results

    Models using estimates of forest function, canopy structure, and topographic diversity performed better than models containing each category alone. Our results show that canopy structural diversity, and not just spectral reflectance, is critical to predicting biodiversity.

    Main conclusions

    We found that an approach that jointly leverages spectral properties related to leaf and canopy functional traits and forest health, lidar derived estimates of forest structure, fine‐resolution topographic diversity, and careful consideration of biogeographical differences within and among biomes is needed to accurately map biodiversity variation from above.

     
  4. Data provided by the Integrating Data science with Trees and Remote Sensing (IDTReeS) research group for use in the IDTReeS Competition.

    Geospatial and tabular data to be used in two data science tasks focused on using remote sensing data to quantify the locations, sizes and species identities of millions of trees and on determining how these methods generalize to other forests.

    Vector data are the geographic extents of Individual Tree Crown boundaries that have been identified by researchers in the IDTReeS group. The data were generated primarily by Sarah Graves, Sergio Marconi, and Benjamin Weinstein, with support from Stephanie Bohlman, Ethan White, and members of the IDTReeS group.

    Remote Sensing and Field data were generated by the National Ecological Observatory Network (NEON, Copyright © 2017 Battelle). Data were selected, downloaded, and packaged by Sergio Marconi. The most recent available data of the following products are provided:

    National Ecological Observatory Network. 2020. Data Product DP1.30010.001, High-resolution orthorectified camera imagery. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    National Ecological Observatory Network. 2020. Data Product DP1.30003.001, Discrete return LiDAR point cloud. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    National Ecological Observatory Network. 2020. Data Product DP1.10098.001, Woody plant vegetation structure. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    National Ecological Observatory Network. 2020. Data Product DP3.30015.001, Ecosystem structure. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    NEON has the following data policy:

    ‘The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program.’

    THE NEON DATA PRODUCTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE NEON DATA PRODUCTS BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE NEON DATA PRODUCTS.

    This data is supported by the National Science Foundation through grant 1926542 and by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through grant GBMF4563 to E.P. White, and the NSF Dimension of Biodiversity program grant (DEB-1442280) and USDA/NIFA McIntire-Stennis program (FLA-FOR-005470). 
  5. Abstract

    Measuring forest biodiversity using terrestrial surveys is expensive and can only capture common species abundance in large heterogeneous landscapes. In contrast, combining airborne imagery with computer vision can generate individual tree data at the scales of hundreds of thousands of trees. To train computer vision models, ground‐based species labels are combined with airborne reflectance data. Due to the difficulty of finding rare species in a large landscape, many classification models only include the most abundant species, leading to biased predictions at broad scales. For example, if only common species are used to train the model, this assumes that these samples are representative across the entire landscape. Extending classification models to include rare species requires targeted data collection and algorithmic improvements to overcome large data imbalances between dominant and rare taxa. We apply a targeted sampling workflow at the Ordway-Swisher Biological Station within the US National Ecological Observatory Network (NEON), where traditional forestry plots had identified six canopy tree species with more than 10 individuals at the site. Combining iterative model development with rare species sampling, we extend a training dataset to include 14 species. Using a multi‐temporal hierarchical model, we demonstrate the ability to include species predicted at <1% frequency in the landscape without losing performance on the dominant species. The final model has over 75% accuracy for 14 species, with improved rare species classification compared to the 61% accuracy of a baseline deep learning model. After filtering out dead trees, we generate landscape species maps of individual crowns for over 670 000 individual trees. We find distinct patches of forest composed of rarer species at the full‐site scale, highlighting the importance of capturing species diversity in training data.
We estimate the relative abundance of 14 species within the landscape and provide three measures of uncertainty to generate a range of counts for each species. For example, we estimate that the dominant species, Pinus palustris, accounts for c. 28% of predicted stems, with models predicting a range of counts between 160 000 and 210 000 individuals. These maps provide the first estimates of canopy tree diversity within a NEON site to include rare species and provide a blueprint for capturing tree diversity using airborne computer vision at broad scales.

     