Title: Fusion neural networks for plant classification: learning to combine RGB, hyperspectral, and lidar data
Airborne remote sensing offers unprecedented opportunities to efficiently monitor vegetation, but methods to delineate and classify individual plant species from the collected data are still being actively developed and improved. The Integrating Data science with Trees and Remote Sensing (IDTReeS) plant identification competition openly invited scientists to create and compare individual tree mapping methods. Participants were tasked with training taxon identification algorithms on two sites and then transferring their methods to a third, unseen site, using field-based plant observations in combination with airborne remote sensing image data products from the National Ecological Observatory Network (NEON). These data were captured by a high-resolution digital camera sensitive to red, green, and blue (RGB) light, a hyperspectral imaging spectrometer spanning the visible to shortwave infrared wavelengths, and a lidar system, which together capture the spectral and structural properties of vegetation. As participants in the IDTReeS competition, we developed a two-stage deep learning approach to integrate NEON remote sensing data from all three sensors and classify individual plant species and genera. The first stage was a convolutional neural network that generates taxon probabilities from RGB images, and the second stage was a fusion neural network that “learns” how to combine these probabilities with hyperspectral and lidar data. Our two-stage approach leverages the ability of neural networks to flexibly and automatically extract descriptive features from complex, high-dimensional image data. Our method achieved an overall classification accuracy of 0.51 on the training set and 0.32 on the test set, which contained data from an unseen site with unknown taxon classes.
Although transferability of classification algorithms to unseen sites with unknown species and genus classes proved to be a challenging task, developing methods with openly available NEON data, which will be collected in a standardized format for 30 years, allows for continual improvements and major gains for members of the computational ecology community. We outline promising directions related to data preparation and processing techniques for further investigation, and provide our code to contribute to open, reproducible science efforts.
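The two-stage design described above can be sketched in a few lines of numpy. This is a minimal illustration under assumptions, not the authors' implementation: the layer sizes, feature counts, and random weights are all invented, and the stage-1 CNN is replaced by a stand-in probability vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stage 1 (stand-in): pretend a CNN has already mapped an RGB crown image
# to per-taxon probabilities; here we just draw a random probability vector.
n_taxa = 5
rgb_probs = softmax(rng.normal(size=n_taxa))

# Invented stand-ins for the other two sensors: a hyperspectral reflectance
# vector and a few lidar-derived structure metrics.
hyperspectral = rng.normal(size=20)  # e.g. 20 reflectance bands
lidar = rng.normal(size=3)           # e.g. height, point density, cover

# Stage 2: a one-hidden-layer fusion network that combines the CNN
# probabilities with the hyperspectral and lidar features.
x = np.concatenate([rgb_probs, hyperspectral, lidar])
W1 = rng.normal(size=(x.size, 16)) * 0.1
W2 = rng.normal(size=(16, n_taxa)) * 0.1
hidden = np.maximum(0.0, x @ W1)   # ReLU
fused_probs = softmax(hidden @ W2)  # final taxon probabilities
```

In a trained version, W1 and W2 would be fit by backpropagation against field-observed taxon labels; the point here is only the data flow from three sensors into one probability vector.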
Award ID(s):
1846384
PAR ID:
10317718
Author(s) / Creator(s):
Date Published:
Journal Name:
PeerJ
Volume:
9
ISSN:
2167-8359
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Accurately mapping tree species composition and diversity is a critical step towards spatially explicit and species-specific ecological understanding. The National Ecological Observatory Network (NEON) is a valuable source of open ecological data across the United States. Freely available NEON data include in-situ measurements of individual trees, including stem locations, species, and crown diameter, along with the NEON Airborne Observation Platform (AOP) airborne remote sensing imagery, including hyperspectral, multispectral, and light detection and ranging (LiDAR) data products. An important aspect of predicting species using remote sensing data is creating high-quality training sets for optimal classification purposes. Ultimately, manually creating training data is an expensive and time-consuming task that relies on human analyst decisions and may require external data sets or information. We combine in-situ and airborne remote sensing NEON data to evaluate the impact of automated training set preparation and a novel data preprocessing workflow on classifying the four dominant subalpine coniferous tree species at the Niwot Ridge Mountain Research Station forested NEON site in Colorado, USA. We trained pixel-based Random Forest (RF) machine learning models using a series of training data sets along with remote sensing raster data as descriptive features. The highest classification accuracies, 69% and 60% based on internal RF error assessment and an independent validation set, respectively, were obtained using circular tree crown polygons created with half the maximum crown diameter per tree. LiDAR-derived data products were the most important features for species classification, followed by vegetation indices. This work contributes to the open development of well-labeled training data sets for forest composition mapping using openly available NEON data without requiring external data collection, manual delineation steps, or site-specific parameters. 
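The automated training-polygon step described above can be sketched with a simple raster mask; the grid size, stem location, and crown diameter below are made-up values, and treating half the maximum crown diameter as the circle radius is one plausible reading of the approach:

```python
import numpy as np

# Hypothetical 10 x 10 raster with 1 m pixels; pixel centers at integer coords.
ny, nx = 10, 10
yy, xx = np.mgrid[0:ny, 0:nx]

# One field-measured tree: stem location and maximum crown diameter (m).
stem_x, stem_y = 5.0, 5.0
max_crown_diameter = 4.0
radius = max_crown_diameter / 2.0  # circle sized by half the max diameter

# Boolean mask of pixels whose centers fall inside the crown circle; these
# pixels inherit the tree's species label as automated training samples.
dist = np.sqrt((xx - stem_x) ** 2 + (yy - stem_y) ** 2)
crown_mask = dist <= radius
```

Repeating this per stem yields a labeled pixel set for the Random Forest without any manual crown delineation.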
  2. The National Ecological Observatory Network (NEON) Airborne Observation Platform (AOP) provides long-term, quantitative information on land use, vegetation structure and canopy chemistry over the NEON sites. AOP flies a suite of integrated remote sensing instruments consisting of a hyperspectral imager, a waveform lidar, and a color digital camera. Small-footprint full-waveform airborne lidar provides an enhanced capability beyond discrete return lidar for capturing and characterizing canopy structure. Due to high data rates/volumes, a common practice is to truncate waveforms. Very little research exists to determine how much data should be saved. In this study, simulations are run in Rochester Institute of Technology’s DIRSIG software. The resulting output waveforms are analyzed to assess three lidar system requirements: the total number of bins with a detected signal, the number of segments, and the max number of bins in a single segment. Recommendations for the values of these requirements are provided. 
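The three waveform requirements listed above can be computed directly from a digitized return; the toy waveform and noise threshold below are invented for illustration:

```python
import numpy as np

# A toy digitized lidar waveform (intensity per time bin) with two returns.
waveform = np.array([0, 0, 5, 7, 3, 0, 0, 2, 4, 0])
noise_threshold = 1  # bins above this count as detected signal

signal = waveform > noise_threshold

# 1) total number of bins with a detected signal
total_signal_bins = int(signal.sum())

# 2) number of contiguous signal segments (count the rising edges)
edges = np.diff(np.concatenate(([0], signal.astype(int), [0])))
starts = np.flatnonzero(edges == 1)
ends = np.flatnonzero(edges == -1)
n_segments = len(starts)

# 3) maximum number of bins in a single segment
max_segment_bins = int((ends - starts).max())
```

Statistics like these, gathered over many simulated waveforms, are what drive the recommended truncation limits.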
  3. Soil organic carbon (SOC) represents the largest terrestrial carbon pool. Effectively monitoring SOC at high spatial resolution is crucial for estimating carbon budgets at the ecosystem scale and informing climate change mitigation efforts at the regional scale. Traditional soil sampling methods, however, are laborious and expensive. Remote sensing platforms can be used to survey large landscapes to meet the need for rapid and cost-effective approaches for quantifying SOC at landscape to regional scales, if relationships between remotely sensed variables and SOC can be established. We developed a workflow to analyze and predict SOC content based on National Ecological Observatory Network (NEON) Airborne Observation Platform (AOP) remote sensing data. First, we benchmarked related tools and developed reproducible workflows using NEON remote sensing datasets. Hyperspectral data were extracted from the locations where NEON soil data exist. Additional variables from the LiDAR data and key metadata (climate and land cover) were extracted for those locations. Random Forest and Partial Least Squares Regression techniques were then used to create models for fine-scale SOC prediction. Cross-validation was embedded in the model creation step. The most important covariates were selected through recursive feature elimination, stepwise selection, and expert judgment. Preliminary results indicate that machine learning models can reproduce SOC measurements in testing datasets. Key predictors include topographic variables, vegetation indices, and specific wavelength bands in hyperspectral images. We are further validating our algorithms using SOC data from ISCN (International Soil Carbon Network) and SoDaH (SOils DAta Harmonization database) that are co-located with NEON sites. We are creating high-resolution SOC maps for 0-30 cm depth at NEON sites and testing our algorithms for different land use types.
Our work paves the way for a broader assessment of SOC stocks using remote sensing observations, and our high-resolution SOC maps will potentially help quantify carbon budgets across heterogeneous landscapes. 
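As one concrete example of the vegetation-index covariates mentioned above, NDVI can be derived from two hyperspectral bands and fed into the regression models; the band choices and reflectance values below are made up for illustration:

```python
import numpy as np

# Hypothetical per-pixel reflectance in a red band (~660 nm) and a
# near-infrared band (~860 nm), extracted from a hyperspectral cube.
red = np.array([0.05, 0.10, 0.30])
nir = np.array([0.45, 0.40, 0.35])

# NDVI = (NIR - Red) / (NIR + Red): one of many indices that could serve
# as a descriptive feature in an SOC regression model.
ndvi = (nir - red) / (nir + red)
```

Indices like this, alongside topographic and LiDAR-derived variables, form the feature table that Random Forest or Partial Least Squares Regression would consume.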
  4. Weinstein, Ben (Ed.)
    # Individual Tree Predictions for 100 Million Trees in the National Ecological Observatory Network

    Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

    ## Manuscript Abstract

    The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration, and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promise to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground-truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees across 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1km^2 shapefiles with individual tree species predictions, as well as crown location, crown area, and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy, covering an average of six species per site (ranging from 3 to 15 species). All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macroecology, functional ecology, and responses to anthropogenic change.

    ## Data Summary

    Each NEON site is a single zip archive with tree predictions for all available data. For site abbreviations see: https://www.neonscience.org/field-sites/explore-field-sites.
    For each site, there is a .zip and a .csv. The .zip is a set of 1km .shp tiles; the .csv contains all trees in a single file.

    ## Prediction metadata

    *Geometry* A four-pointed bounding box location in UTM coordinates.
    *indiv_id* A unique crown identifier that combines the year, site, and geoindex of the NEON airborne tile (e.g. 732000_4707000, the UTM coordinate of the top left of the tile).
    *sci_name* The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.
    *ens_score* The confidence score of the species prediction: the output of the multi-temporal model for the ensemble hierarchical model.
    *bleaf_taxa* Highest predicted category for the broadleaf submodel.
    *bleaf_score* The confidence score for the broadleaf taxa submodel.
    *oak_taxa* Highest predicted category for the oak model.
    *dead_label* A two-class alive/dead classification based on the RGB data. 0 = Alive, 1 = Dead.
    *dead_score* The confidence score of the alive/dead prediction.
    *site_id* The four-letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.
    *conif_taxa* Highest predicted category for the conifer model.
    *conif_score* The confidence score for the conifer taxa submodel.
    *dom_taxa* Highest predicted category for the dominant taxa submodel.
    *dom_score* The confidence score for the dominant taxa submodel.

    ## Training data

    The crops.zip contains pre-cropped files. 369-band hyperspectral files are numpy arrays; RGB crops are .tif files. The naming format is __, for example "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown of NEON data from Great Smoky Mountains National Park (GRSM), flown in 2022. Along with the crops are .csv files for the various train-test split experiments in the manuscript.

    ### Crop metadata

    There are 30,042 individuals in the annotations.csv file.
    We keep all data, but we recommend a filtering step of at least 20 records per species to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.
    *score* The DeepForest crown score for the crop.
    *taxonID* Four-letter species code; see the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists
    *individual* Unique individual identifier for a given field record and crown crop.
    *siteID* The four-letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.
    *plotID* NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design
    *CHM_height* The LiDAR-derived height for the field sampling point.
    *image_path* Relative pathname for the hyperspectral array; can be read by numpy.load. Format: 369 bands * height * width.
    *tile_year* Flight year of the sensor data.
    *RGB_image_path* Relative pathname for the RGB array; can be read by rasterio.open().

    # Code repository

    The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention. Key files include the model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).
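The file layout above suggests a simple loading pattern. The sketch below writes and reads back a dummy hyperspectral crop with numpy in the dataset's stated 369-band layout (the RGB .tif crops would instead be opened with rasterio); the filename is invented, not a real file from the archive:

```python
import os
import tempfile
import numpy as np

# Fake (bands, height, width) hyperspectral crop matching the dataset's
# stated 369-band format; real crops would be loaded the same way.
crop = np.zeros((369, 11, 11), dtype=np.float32)

tmpdir = tempfile.mkdtemp()
# Hypothetical filename, mimicking the dataset's naming convention.
path = os.path.join(tmpdir, "NEON.PLA.D07.GRSM.00583_2022_HSI.npy")
np.save(path, crop)

loaded = np.load(path)       # hyperspectral arrays are plain .npy files
bands, height, width = loaded.shape
```

Pairing each array with its row in annotations.csv (via the *individual* and *image_path* fields) yields a ready-made supervised training set.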
  5. Data provided by the Integrating Data science with Trees and Remote Sensing (IDTReeS) research group for use in the IDTReeS Competition.

    Geospatial and tabular data to be used in two data science tasks focused on using remote sensing data to quantify the locations, sizes, and species identities of millions of trees, and on determining how these methods generalize to other forests.

    Vector data are the geographic extents of Individual Tree Crown boundaries that have been identified by researchers in the IDTReeS group. The data were generated primarily by Sarah Graves, Sergio Marconi, and Benjamin Weinstein, with support from Stephanie Bohlman, Ethan White, and members of the IDTReeS group.

    Remote sensing and field data were generated by the National Ecological Observatory Network (NEON, Copyright © 2017 Battelle). Data were selected, downloaded, and packaged by Sergio Marconi. The most recent available data of the following products are provided:

    National Ecological Observatory Network. 2020. Data Product DP1.30010.001, High-resolution orthorectified camera imagery. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    National Ecological Observatory Network. 2020. Data Product DP1.30003.001, Discrete return LiDAR point cloud. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    National Ecological Observatory Network. 2020. Data Product DP1.10098.001, Woody plant vegetation structure. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    National Ecological Observatory Network. 2020. Data Product DP3.30015.001, Ecosystem structure. Provisional data downloaded from http://data.neonscience.org on March 4, 2020. Battelle, Boulder, CO, USA.

    NEON has the following data policy: 'The National Ecological Observatory Network is a program sponsored by the National Science Foundation and operated under cooperative agreement by Battelle Memorial Institute. This material is based in part upon work supported by the National Science Foundation through the NEON Program.'

    THE NEON DATA PRODUCTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE NEON DATA PRODUCTS BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE NEON DATA PRODUCTS.

    This data is supported by the National Science Foundation through grant 1926542 and by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through grant GBMF4563 to E.P. White, and by the NSF Dimensions of Biodiversity program grant (DEB-1442280) and the USDA/NIFA McIntire-Stennis program (FLA-FOR-005470).