Title: Capturing long‐tailed individual tree diversity using an airborne imaging and a multi‐temporal hierarchical model
Abstract Measuring forest biodiversity using terrestrial surveys is expensive and can only capture common species abundance in large heterogeneous landscapes. In contrast, combining airborne imagery with computer vision can generate individual tree data at the scales of hundreds of thousands of trees. To train computer vision models, ground‐based species labels are combined with airborne reflectance data. Due to the difficulty of finding rare species in a large landscape, many classification models only include the most abundant species, leading to biased predictions at broad scales. For example, if only common species are used to train the model, this assumes that these samples are representative across the entire landscape. Extending classification models to include rare species requires targeted data collection and algorithmic improvements to overcome large data imbalances between dominant and rare taxa. We apply a targeted sampling workflow to the Ordway Swisher Biological Station within the US National Ecological Observatory Network (NEON), where traditional forestry plots had identified six canopy tree species with more than 10 individuals at the site. Combining iterative model development with rare species sampling, we extend a training dataset to include 14 species. Using a multi‐temporal hierarchical model, we demonstrate the ability to include species predicted at <1% frequency in the landscape without losing performance on the dominant species. The final model has over 75% accuracy for 14 species with improved rare species classification compared to 61% accuracy of a baseline deep learning model. After filtering out dead trees, we generate landscape species maps of individual crowns for over 670 000 individual trees. We find distinct patches of forest composed of rarer species at the full‐site scale, highlighting the importance of capturing species diversity in training data.
We estimate the relative abundance of 14 species within the landscape and provide three measures of uncertainty to generate a range of counts for each species. For example, we estimate that the dominant species, Pinus palustris, accounts for c. 28% of predicted stems, with models predicting a range of counts between 160 000 and 210 000 individuals. These maps provide the first estimates of canopy tree diversity within a NEON site to include rare species and provide a blueprint for capturing tree diversity using airborne computer vision at broad scales.
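The abundance figures above can be illustrated with a minimal sketch. The abstract does not say what the three uncertainty measures are, so the example below assumes, purely for illustration, that several alternative sets of per-crown predictions are available and reports the minimum and maximum count across them. The function names and the toy labels are hypothetical. As a rough consistency check, if the c. 28% applies to the roughly 670 000 mapped crowns, that is about 188 000 stems, which sits inside the reported 160 000 to 210 000 range.

```python
from collections import Counter


def relative_abundance(labels):
    """Fraction of predicted crowns assigned to each species."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {species: n / total for species, n in counts.items()}


def count_range(prediction_sets, species):
    """Min and max count of one species across alternative prediction sets.

    Assumption: each element of `prediction_sets` is one full list of
    per-crown labels (for example, from a different model run). This is
    a stand-in, not the uncertainty procedure used in the paper.
    """
    per_set = [Counter(labels)[species] for labels in prediction_sets]
    return min(per_set), max(per_set)


# Toy data: three hypothetical prediction sets over the same 10 crowns.
prediction_sets = [
    ["Pinus palustris"] * 3 + ["Quercus laevis"] * 7,
    ["Pinus palustris"] * 2 + ["Quercus laevis"] * 8,
    ["Pinus palustris"] * 4 + ["Quercus laevis"] * 6,
]

abundance = relative_abundance(prediction_sets[0])
pine_range = count_range(prediction_sets, "Pinus palustris")
```

Reporting a range rather than a single count keeps the disagreement between prediction sets visible, which matters most for rare species whose counts are small.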
Award ID(s):
1926542 1638720
PAR ID:
10419076
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Remote Sensing in Ecology and Conservation
Volume:
9
Issue:
5
ISSN:
2056-3485
Format(s):
Medium: X Size: p. 656-670
Sponsoring Org:
National Science Foundation
More Like this
  1. Weinstein, Ben (Ed.)
    # Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network

    Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

    ## Manuscript Abstract

    The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1 km^2 shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

    ## Data Summary

    Each NEON site is a single zip archive with tree predictions for all available data. For site abbreviations see: https://www.neonscience.org/field-sites/explore-field-sites. For each site, there is a .zip and a .csv. The .zip is a set of 1 km .shp tiles. The .csv contains all trees in a single file.

    ## Prediction metadata

    - *Geometry* A four-pointed bounding box location in UTM coordinates.
    - *indiv_id* A unique crown identifier that combines the year, site and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.
    - *sci_name* The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.
    - *ens_score* The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model.
    - *bleaf_taxa* Highest predicted category for the broadleaf submodel.
    - *bleaf_score* The confidence score for the broadleaf taxa submodel.
    - *oak_taxa* Highest predicted category for the oak model.
    - *dead_label* A two-class alive/dead classification based on the RGB data. 0=Alive/1=Dead.
    - *dead_score* The confidence score of the Alive/Dead prediction.
    - *site_id* The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.
    - *conif_taxa* Highest predicted category for the conifer model.
    - *conif_score* The confidence score for the conifer taxa submodel.
    - *dom_taxa* Highest predicted category for the dominant taxa submodel.
    - *dom_score* The confidence score for the dominant taxa submodel.

    ## Training data

    The crops.zip contains pre-cropped files. The 369-band hyperspectral files are numpy arrays. RGB crops are .tif files. Naming format is __; for example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown of NEON data from Great Smoky Mountains National Park (GRSM), flown in 2022. Along with the crops are .csv files for various train-test split experiments for the manuscript.

    ### Crop metadata

    There are 30,042 individuals in the annotations.csv file. We keep all data, but we recommend a filtering step of at least 20 records per species to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.

    - *score* The DeepForest crown score for the crop.
    - *taxonID* Letter species code. See the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists
    - *individual* Unique individual identifier for a given field record and crown crop.
    - *siteID* The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.
    - *plotID* NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design
    - *CHM_height* The LiDAR-derived height for the field sampling point.
    - *image_path* Relative pathname for the hyperspectral array; can be read by numpy.load. The format is 369 bands * Height * Width.
    - *tile_year* Flight year of the sensor data.
    - *RGB_image_path* Relative pathname for the RGB array; can be read by rasterio.open().

    # Code repository

    The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention
    Key files include the model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).
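The recommended filtering step (at least 20 records per species) can be sketched with the standard library alone. This is a minimal illustration, not the authors' preprocessing code: the column names `individual`, `taxonID` and `siteID` come from the crop metadata above, while the function name, the in-memory sample rows and the species codes in them are made up for the example.

```python
import csv
import io
from collections import Counter

MIN_RECORDS = 20  # threshold recommended in the dataset description


def filter_rare_species(rows, min_records=MIN_RECORDS):
    """Keep only rows whose taxonID occurs at least `min_records` times."""
    counts = Counter(row["taxonID"] for row in rows)
    return [row for row in rows if counts[row["taxonID"]] >= min_records]


# Tiny synthetic stand-in for annotations.csv (all values are illustrative).
sample_csv = io.StringIO(
    "individual,taxonID,siteID\n"
    + "".join(f"ind{i},PIPA2,OSBS\n" for i in range(25))
    + "".join(f"ind{i + 25},QULA2,OSBS\n" for i in range(3))
)
rows = list(csv.DictReader(sample_csv))
kept = filter_rare_species(rows)
kept_species = {row["taxonID"] for row in kept}
```

With the real data, `sample_csv` would be replaced by an open handle on annotations.csv (for example `open("annotations.csv", newline="")`, assuming the file has been extracted to the working directory). Counting before filtering keeps the step to two passes over the rows and leaves the original list untouched.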
  2. Tanentzap, Andrew J (Ed.)
    The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales provides a unique perspective on forest ecosystems, forest restoration, and responses to disturbance. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual canopy tree species through labeled data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees across 24 sites in the National Ecological Observatory Network (NEON). Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1 km² shapefiles with individual tree species prediction, as well as crown location, crown area, and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of 6 species per site, ranging from 3 to 15 species per site. All predictions are openly archived and have been uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. We outline the potential utility and limitations of these data in ecology and computer vision research, as well as strategies for improving predictions using targeted data sampling.
  3.
    Accurately mapping tree species composition and diversity is a critical step towards spatially explicit and species-specific ecological understanding. The National Ecological Observatory Network (NEON) is a valuable source of open ecological data across the United States. Freely available NEON data include in-situ measurements of individual trees, including stem locations, species, and crown diameter, along with the NEON Airborne Observation Platform (AOP) airborne remote sensing imagery, including hyperspectral, multispectral, and light detection and ranging (LiDAR) data products. An important aspect of predicting species using remote sensing data is creating high-quality training sets for optimal classification purposes. Ultimately, manually creating training data is an expensive and time-consuming task that relies on human analyst decisions and may require external data sets or information. We combine in-situ and airborne remote sensing NEON data to evaluate the impact of automated training set preparation and a novel data preprocessing workflow on classifying the four dominant subalpine coniferous tree species at the Niwot Ridge Mountain Research Station forested NEON site in Colorado, USA. We trained pixel-based Random Forest (RF) machine learning models using a series of training data sets along with remote sensing raster data as descriptive features. The highest classification accuracies, 69% and 60% based on internal RF error assessment and an independent validation set, respectively, were obtained using circular tree crown polygons created with half the maximum crown diameter per tree. LiDAR-derived data products were the most important features for species classification, followed by vegetation indices. This work contributes to the open development of well-labeled training data sets for forest composition mapping using openly available NEON data without requiring external data collection, manual delineation steps, or site-specific parameters. 
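The circular crown polygons mentioned above can be illustrated with a small sketch that selects the pixels belonging to one tree. It is not the authors' workflow. It assumes that "half the maximum crown diameter" means the circle's diameter is half of the field-measured maximum crown diameter, which the abstract does not spell out, so the fraction is a parameter. It also assumes a simple square grid whose origin is at (0, 0) rather than a real raster transform. All function names are hypothetical.

```python
import math


def crown_radius(max_crown_diameter_m, diameter_fraction=0.5):
    """Radius of a circle whose diameter is a fraction of the max crown diameter.

    Assumption: diameter_fraction=0.5 reflects one reading of "half the
    maximum crown diameter"; set it to 1.0 to use the full diameter.
    """
    return diameter_fraction * max_crown_diameter_m / 2.0


def pixels_in_crown(stem_x, stem_y, radius, pixel_size=1.0):
    """Return (col, row) indices of pixels whose centers lie inside the circle."""
    col_min = math.floor((stem_x - radius) / pixel_size)
    col_max = math.floor((stem_x + radius) / pixel_size)
    row_min = math.floor((stem_y - radius) / pixel_size)
    row_max = math.floor((stem_y + radius) / pixel_size)
    inside = []
    for row in range(row_min, row_max + 1):
        for col in range(col_min, col_max + 1):
            # Pixel center in the same coordinate units as the stem location.
            center_x = (col + 0.5) * pixel_size
            center_y = (row + 0.5) * pixel_size
            if math.hypot(center_x - stem_x, center_y - stem_y) <= radius:
                inside.append((col, row))
    return inside


# Toy example: a stem at (10, 10) with a 4 m maximum crown diameter.
radius = crown_radius(4.0)
crown_pixels = pixels_in_crown(10.0, 10.0, radius)
```

A smaller circle trades training pixels for purity: fewer pixels are labeled per tree, but those that remain are less likely to belong to a neighboring crown.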
  4. Abstract
    Aim: Rapid global change is impacting the diversity of tree species and essential ecosystem functions and services of forests. It is therefore critical to understand and predict how the diversity of tree species is spatially distributed within and among forest biomes. Satellite remote sensing platforms have been used for decades to map forest structure and function but are limited in their capacity to monitor change by their relatively coarse spatial resolution and the complexity of scales at which different dimensions of biodiversity are observed in the field. Recently, airborne remote sensing platforms making use of passive high spectral resolution (i.e., hyperspectral) and active lidar data have been operationalized, providing an opportunity to disentangle how biodiversity patterns vary across space and time from field observations to larger scales. Most studies to date have focused on single sites and/or one sensor type; here we ask how multiple sensor types from the National Ecological Observatory Network’s Airborne Observation Platform (NEON AOP) perform across multiple sites in a single biome at the NEON field plot scale (i.e., 40 m × 40 m).
    Location: Eastern USA.
    Time period: 2017–2018.
    Taxa studied: Trees.
    Methods: With a fusion of hyperspectral and lidar data from the NEON AOP, we assess the ability of high resolution remotely sensed metrics to measure biodiversity variation across eastern US temperate forests. We examine how taxonomic, functional, and phylogenetic measures of alpha diversity vary spatially and assess to what degree remotely sensed metrics correlate with in situ biodiversity metrics.
    Results: Models using estimates of forest function, canopy structure, and topographic diversity performed better than models containing each category alone. Our results show that canopy structural diversity, and not just spectral reflectance, is critical to predicting biodiversity.
    Main conclusions: We found that an approach that jointly leverages spectral properties related to leaf and canopy functional traits and forest health, lidar derived estimates of forest structure, fine‐resolution topographic diversity, and careful consideration of biogeographical differences within and among biomes is needed to accurately map biodiversity variation from above.
  5. Abstract Plant functional diversity is strongly connected to photosynthetic carbon assimilation in terrestrial ecosystems. However, many of the plant functional traits that regulate photosynthetic capacity, including foliar nitrogen concentration and leaf mass per area, vary significantly between and within plant functional types and vertically through forest canopies, resulting in considerable landscape‐scale heterogeneity in three dimensions. Hyperspectral imagery has been used extensively to quantify functional traits across a range of ecosystems but is generally limited to providing information for top of canopy leaves only. On the other hand, lidar data can be used to retrieve the vertical structure of forest canopies. Because these data are rarely collected at the same time, there are unanswered questions about the effect of forest structure on the three‐dimensional spatial patterns of functional traits across ecosystems. In the United States, the National Ecological Observatory Network's Airborne Observation Platform (NEON AOP) provides an opportunity to address this structure‐function relationship by collecting lidar and hyperspectral data together across a variety of ecoregions. With a fusion of hyperspectral and lidar data from the NEON AOP and field‐collected foliar trait data, we assessed the impacts of forest structure on spatial patterns of N. In addition, we examine the influence of abiotic gradients and management regimes on top‐of‐canopy percent N and total canopy N (i.e., the total amount of N [g/m²] within a forest canopy) at a NEON site consisting of a mosaic of open longleaf pine and dense broadleaf deciduous forests. Our resulting maps suggest that, in contrast to top of canopy values, total canopy N variation is dampened across this landscape resulting in relatively homogeneous spatial patterns. At the same time, we found that leaf functional diversity and canopy structural diversity showed distinct dendritic patterns related to the spatial distribution of plant functional types.