Title: A remote sensing derived data set of 100 million individual tree crowns for the National Ecological Observatory Network
Forests provide biodiversity, ecosystem, and economic services. Information on individual trees is important for understanding forest ecosystems but obtaining individual-level data at broad scales is challenging due to the costs and logistics of data collection. While advances in remote sensing techniques allow surveys of individual trees at unprecedented extents, there remain technical challenges in turning sensor data into tangible information. Using deep learning methods, we produced an open-source data set of individual-level crown estimates for 100 million trees at 37 sites across the United States surveyed by the National Ecological Observatory Network’s Airborne Observation Platform. Each canopy tree crown is represented by a rectangular bounding box and includes information on the height, crown area, and spatial location of the tree. These data have the potential to drive significant expansion of individual-level research on trees by facilitating both regional analyses and cross-region comparisons encompassing forest types from most of the United States.
Award ID(s):
1926542
PAR ID:
10292078
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
eLife
Volume:
10
ISSN:
2050-084X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tanentzap, Andrew J (Ed.)
    The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales provides a unique perspective on forest ecosystems, forest restoration, and responses to disturbance. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual canopy tree species through labeled data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees across 24 sites in the National Ecological Observatory Network (NEON). Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1 km² shapefiles with individual tree species prediction, as well as crown location, crown area, and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of 6 species per site, ranging from 3 to 15 species per site. All predictions are openly archived and have been uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. We outline the potential utility and limitations of these data in ecology and computer vision research, as well as strategies for improving predictions using targeted data sampling.
  2. The ability to automatically delineate individual tree crowns using remote sensing data opens the possibility to collect detailed tree information over large geographic regions. While individual tree crown delineation (ITCD) methods have proven successful in conifer-dominated forests using Light Detection and Ranging (LiDAR) data, it remains unclear how well these methods can be applied in deciduous broadleaf-dominated forests. We applied five automated LiDAR-based ITCD methods across fifteen plots ranging from conifer- to broadleaf-dominated forest stands at Harvard Forest in Petersham, MA, USA, and assessed accuracy against manual delineation of crowns from unmanned aerial vehicle (UAV) imagery. We then identified tree- and plot-level factors influencing the success of automated delineation techniques. There was relatively little difference in accuracy between automated crown delineation methods (51–59% aggregated plot accuracy) and, despite parameter tuning, none of the methods produced high accuracy across all plots (27–90% range in plot-level accuracy). The accuracy of all methods was significantly higher with increased plot conifer fraction, and individual conifer trees were identified with higher accuracy (mean 64%) than broadleaf trees (42%) across methods. Further, while tree-level factors (e.g., diameter at breast height, height and crown area) strongly influenced the success of crown delineations, the influence of plot-level factors varied. The most important plot-level factor was species evenness, a metric of relative species abundance that is related to both conifer fraction and the degree to which trees can fill canopy space. As species evenness decreased (e.g., high conifer fraction and less efficient filling of canopy space), the probability of successful delineation increased.
Overall, our work suggests that the tested LiDAR-based ITCD methods perform equally well in a mixed temperate forest, but that delineation success is driven by forest characteristics like functional group, tree size, diversity, and crown architecture. While LiDAR-based ITCD methods are well suited for stands with distinct canopy structure, we suggest that future work explore the integration of phenology and spectral characteristics with existing LiDAR as an approach to improve crown delineation in broadleaf-dominated stands. 
  3. Weinstein, Ben (Ed.)
    # Individual Tree Predictions for 100 million trees in the National Ecological Observatory Network

    Preprint: https://www.biorxiv.org/content/10.1101/2023.10.25.563626v1

    ## Manuscript Abstract

    The ecology of forest ecosystems depends on the composition of trees. Capturing fine-grained information on individual trees at broad scales allows an unprecedented view of forest ecosystems, forest restoration and responses to disturbance. To create detailed maps of tree species, airborne remote sensing can cover areas containing millions of trees at high spatial resolution. Individual tree data at wide extents promises to increase the scale of forest analysis, biogeographic research, and ecosystem monitoring without losing details on individual species composition and abundance. Computer vision using deep neural networks can convert raw sensor data into predictions of individual tree species using ground-truthed data collected by field researchers. Using over 40,000 individual tree stems as training data, we create landscape-level species predictions for over 100 million individual trees for 24 sites in the National Ecological Observatory Network. Using hierarchical multi-temporal models fine-tuned for each geographic area, we produce open-source data available as 1 km² shapefiles with individual tree species prediction, as well as crown location, crown area and height of 81 canopy tree species. Site-specific models had an average performance of 79% accuracy covering an average of six species per site, ranging from 3 to 15 species. All predictions were uploaded to Google Earth Engine to benefit the ecology community and overlay with other remote sensing assets. These data can be used to study forest macro-ecology, functional ecology, and responses to anthropogenic change.

    ## Data Summary

    Each NEON site is a single zip archive with tree predictions for all available data. For site abbreviations see: https://www.neonscience.org/field-sites/explore-field-sites.

    For each site, there is a .zip and a .csv. The .zip is a set of 1 km .shp tiles. The .csv contains all trees in a single file.

    ## Prediction metadata

    - *Geometry* A four-pointed bounding box location in UTM coordinates.
    - *indiv_id* A unique crown identifier that combines the year, site and geoindex of the NEON airborne tile. The geoindex (e.g. 732000_4707000) is the UTM coordinate of the top left of the tile.
    - *sci_name* The full Latin name of the predicted species, aligned with NEON's taxonomic nomenclature.
    - *ens_score* The confidence score of the species prediction. This score is the output of the multi-temporal model for the ensemble hierarchical model.
    - *bleaf_taxa* Highest predicted category for the broadleaf submodel.
    - *bleaf_score* The confidence score for the broadleaf taxa submodel.
    - *oak_taxa* Highest predicted category for the oak model.
    - *dead_label* A two-class alive/dead classification based on the RGB data. 0=Alive/1=Dead.
    - *dead_score* The confidence score of the Alive/Dead prediction.
    - *site_id* The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.
    - *conif_taxa* Highest predicted category for the conifer model.
    - *conif_score* The confidence score for the conifer taxa submodel.
    - *dom_taxa* Highest predicted category for the dominant taxa mode submodel.
    - *dom_score* The confidence score for the dominant taxa submodel.

    ## Training data

    The crops.zip contains pre-cropped files. The 369-band hyperspectral files are numpy arrays. RGB crops are .tif files. Naming format is __, for example, "NEON.PLA.D07.GRSM.00583_2022_RGB.tif" is the RGB crop of the predicted crown of NEON data from Great Smoky Mountain National Park (GRSM), flown in 2022. Along with the crops are .csv files for various train-test split experiments for the manuscript.

    ### Crop metadata

    There are 30,042 individuals in the annotations.csv file. We keep all data, but we recommend a filtering step of at least 20 records per species to reduce the chance of taxonomic or data cleaning errors. This leaves 132 species.

    - *score* The DeepForest crown score for the crop.
    - *taxonID* For the letter species code, see the NEON plant taxonomy for the scientific name: https://data.neonscience.org/taxonomic-lists
    - *individual* Unique individual identifier for a given field record and crown crop.
    - *siteID* The four letter code for the NEON site. See https://www.neonscience.org/field-sites/explore-field-sites for site locations.
    - *plotID* NEON plot ID within the site. For more information on NEON sampling see: https://www.neonscience.org/data-samples/data-collection/observational-sampling/site-level-sampling-design
    - *CHM_height* The LiDAR-derived height for the field sampling point.
    - *image_path* Relative pathname for the hyperspectral array, can be read by numpy.load -> format of 369 bands * Height * Width.
    - *tile_year* Flight year of the sensor data.
    - *RGB_image_path* Relative pathname for the RGB array, can be read by rasterio.open().

    # Code repository

    The predictions were made using the DeepTreeAttention repo: https://github.com/weecology/DeepTreeAttention

    Key files include the model definition for a [single year model](https://github.com/weecology/DeepTreeAttention/blob/main/src/models/Hang2020.py) and [Data preprocessing](https://github.com/weecology/DeepTreeAttention/blob/cae13f1e4271b5386e2379068f8239de3033ec40/src/utils.py#L59).
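The recommended filtering step, and a similar screen on the prediction fields, can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the toy rows, and the `min_score=0.5` threshold are assumptions made for the example, while `taxonID`, `dead_label`, and `ens_score` are the column names described in the metadata above.

```python
from collections import Counter


def filter_by_species_count(records, min_records=20, species_key="taxonID"):
    """Keep only rows whose species has at least `min_records` rows overall."""
    counts = Counter(row[species_key] for row in records)
    return [row for row in records if counts[row[species_key]] >= min_records]


def keep_confident_live_crowns(predictions, min_score=0.5):
    """Keep crowns labelled alive (dead_label == 0) with ens_score >= min_score.

    The 0.5 default is an arbitrary illustrative threshold, not a value
    recommended by the data set authors.
    """
    return [
        row for row in predictions
        if int(row["dead_label"]) == 0 and float(row["ens_score"]) >= min_score
    ]


# Toy rows standing in for annotations.csv (values are invented for the demo).
toy_annotations = [{"taxonID": "ACRU"}] * 25 + [{"taxonID": "QUAL"}] * 3
kept = filter_by_species_count(toy_annotations)
print(len(kept), sorted({row["taxonID"] for row in kept}))
```

With the real files, rows in this shape could be obtained by passing an open file handle to `csv.DictReader`, which yields one dictionary of strings per row; that is why the second function casts `dead_label` and `ens_score` before comparing.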
  4. Accurately mapping tree species composition and diversity is a critical step towards spatially explicit and species-specific ecological understanding. The National Ecological Observatory Network (NEON) is a valuable source of open ecological data across the United States. Freely available NEON data include in-situ measurements of individual trees, including stem locations, species, and crown diameter, along with the NEON Airborne Observation Platform (AOP) airborne remote sensing imagery, including hyperspectral, multispectral, and light detection and ranging (LiDAR) data products. An important aspect of predicting species using remote sensing data is creating high-quality training sets for optimal classification purposes. Ultimately, manually creating training data is an expensive and time-consuming task that relies on human analyst decisions and may require external data sets or information. We combine in-situ and airborne remote sensing NEON data to evaluate the impact of automated training set preparation and a novel data preprocessing workflow on classifying the four dominant subalpine coniferous tree species at the Niwot Ridge Mountain Research Station forested NEON site in Colorado, USA. We trained pixel-based Random Forest (RF) machine learning models using a series of training data sets along with remote sensing raster data as descriptive features. The highest classification accuracies, 69% and 60% based on internal RF error assessment and an independent validation set, respectively, were obtained using circular tree crown polygons created with half the maximum crown diameter per tree. LiDAR-derived data products were the most important features for species classification, followed by vegetation indices. This work contributes to the open development of well-labeled training data sets for forest composition mapping using openly available NEON data without requiring external data collection, manual delineation steps, or site-specific parameters.
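As a rough illustration of the circular crown polygons mentioned above, the sketch below builds a regular polygon around a stem location. It is not the authors' workflow: the function names are hypothetical, and it assumes one reading of the abstract, namely that the circle's diameter is half of the tree's maximum crown diameter (exposed here as `scale=0.5` so that other readings can be tried).

```python
import math


def circular_crown(x, y, max_crown_diameter, scale=0.5, n_vertices=36):
    """Approximate a circular crown centred on (x, y) with a regular polygon.

    Assumption: the circle's diameter is `scale * max_crown_diameter`,
    so its radius is half of that.
    """
    radius = scale * max_crown_diameter / 2.0
    return [
        (
            x + radius * math.cos(2 * math.pi * k / n_vertices),
            y + radius * math.sin(2 * math.pi * k / n_vertices),
        )
        for k in range(n_vertices)
    ]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A stem at the origin with an 8 m maximum crown diameter gives a 2 m radius.
crown = circular_crown(0.0, 0.0, max_crown_diameter=8.0)
print(len(crown), round(polygon_area(crown), 2))
```

Shrinking the circle relative to the measured crown is one way to reduce overlap between neighbouring polygons when pixels inside each polygon are used as training samples, at the cost of fewer pixels per tree.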
  5. Foliar chemistry values were obtained from two important native tree species (white oak (Quercus alba L.) and red maple (Acer rubrum L.)) across urban and reference forest sites of three major cities in the eastern United States during summer 2015 (New York, NY (NYC); Philadelphia, PA; and Baltimore, MD). Trees were selected from secondary growth oak-hickory forests found in New York, NY; Philadelphia, PA; and Baltimore, MD, as well as at reference forest sites outside each metropolitan area. In all three metropolitan areas, urban forest patches and reference forest sites were selected based on the presence of red maple and white oak canopy dominant trees in patches of at least 1.5 hectares with slopes less than 25%, and well-drained soils of similar soil series within each metropolitan area. Within each city, several forest patches were selected to capture the variation in forest patch site conditions across an individual city. All reference sites were located in protected areas outside of the city and within intermix wildland-urban interface landscapes, in order to target similar contexts of surrounding land use and population density (Martinuzzi et al. 2015). Several reference sites were selected for each city, located within the same protected area considered representative of rural forests of the region. White oaks were at least 38.1 cm diameter at breast height (DBH), red maples were at least 25.4 cm DBH, and all trees were dominant or co-dominant canopy trees. The trees had no major trunk cavities and had crown vigor scores of 1 or 2 (less than 25% overall canopy damage; Pontius & Hallett 2014). From early July to early August 2015, sun leaves were collected from the periphery of the crown of each tree with either a shotgun or slingshot for subsequent analysis to determine differences in foliar chemistry across cities and urban vs. reference forest site types.
The data were used to investigate whether differences in native tree physiology occur between urban and reference forest patches, and whether those differences are site- and species-specific. A complete analysis of these data can be found in: Sonti, NF. 2019. Ecophysiological and social functions of urban forest patches. Ph.D. dissertation. University of Maryland, College Park, MD. 166 p. References: Martinuzzi S, Stewart SI, Helmers DP, Mockrin MH, Hammer RB, Radeloff VC. 2015. The 2010 wildland-urban interface of the conterminous United States. Research Map NRS-8. US Department of Agriculture, Forest Service, Northern Research Station: Newtown Square, PA. Pontius J, Hallett R. 2014. Comprehensive methods for earlier detection and monitoring of forest decline. Forest Science 60(6): 1156-1163.