This content will become publicly available on January 1, 2026

Title: Recommendations for developing, documenting, and distributing data products derived from NEON data
Abstract The National Ecological Observatory Network (NEON) provides over 180 distinct data products from 81 sites (47 terrestrial and 34 freshwater aquatic sites) within the United States and Puerto Rico. These data products include both field and remote sensing data collected using standardized protocols and sampling schema, with centralized quality assurance and quality control (QA/QC) provided by NEON staff. Such breadth of data creates opportunities for the research community to extend basic and applied research while also extending the impact and reach of NEON data through the creation of derived data products—higher level data products derived by the user community from NEON data. Derived data products are curated, documented, reproducibly‐generated datasets created by applying various processing steps to one or more lower level data products—including interpolation, extrapolation, integration, statistical analysis, modeling, or transformations. Derived data products directly benefit the research community and increase the impact of NEON data by broadening the size and diversity of the user base, decreasing the time and effort needed for working with NEON data, providing primary research foci through the development via the derivation process, and helping users address multidisciplinary questions. Creating derived data products also promotes personal career advancement to those involved through publications, citations, and future grant proposals. However, the creation of derived data products is a nontrivial task. Here we provide an overview of the process of creating derived data products while outlining the advantages, challenges, and major considerations.
Award ID(s): 2217817
PAR ID: 10586145
Publisher / Repository: Wiley Periodicals LLC
Journal Name: Ecosphere
Volume: 16
Issue: 1
ISSN: 2150-8925
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract Understanding patterns and drivers of species distribution and abundance, and thus biodiversity, is a core goal of ecology. Despite advances in recent decades, research into these patterns and processes is currently limited by a lack of standardized, high‐quality, empirical data that span large spatial scales and long time periods. NEON fills this gap by providing freely available observational data that are generated during robust and consistent organismal sampling of several sentinel taxonomic groups within 81 sites distributed across the United States and will be collected for at least 30 years. The breadth and scope of these data provide a unique resource for advancing biodiversity research. To maximize the potential of this opportunity, however, it is critical that NEON data be maximally accessible and easily integrated into investigators' workflows and analyses. To facilitate its use for biodiversity research and synthesis, we created a workflow to process and format NEON organismal data into the ecocomDP (ecological community data design pattern) format, available through the ecocomDP R package; we then provided the standardized data as an R data package (neonDivData). We briefly summarize sampling designs and data wrangling decisions for the major taxonomic groups included in this effort. Our workflows are open‐source so the biodiversity community may: add additional taxonomic groups; modify the workflow to produce datasets appropriate for their own analytical needs; and regularly update the data packages as more observations become available. Finally, we provide two simple examples of how the standardized data may be used for biodiversity research. By providing a standardized data package, we hope to enhance the utility of NEON organismal data in advancing biodiversity research and encourage the use of the harmonized ecocomDP data design pattern for community ecology data from other ecological observatory networks.
  2. Abstract The National Ecological Observatory Network Terrestrial Observation System (NEON TOS) produces open‐access data products that allow data users to investigate the impact of change drivers on key “sentinel” taxa and soils. The spatial and temporal sampling strategy that coordinates implementation of these protocols enables integration across TOS products and with products generated by NEON aquatic, remote sensing, and terrestrial instrument subsystems. Here, we illustrate the plots and sampling units that make up the physical foundation of a NEON TOS site, and we describe the scales (subplot, plot, airshed, and site) at which sampling is spatially colocated across protocols and subsystems. We also describe how moderate resolution imaging spectroradiometer‐enhanced vegetation index (MODIS‐EVI) phenology data are used to temporally coordinate TOS sampling within and across years at the continental scale of the observatory. Individually, TOS protocols produce data products that provide insight into populations, communities, and ecosystem processes. Within the spatial and temporal framework that guides cross‐protocol implementation, the ability to draw inference across data products is enhanced. To illustrate this point, we develop an example using R software that links two TOS data products collected with different temporal frequencies at both plot and site spatial scales. A thorough understanding of how TOS protocols are integrated with each other in space and time, and with other NEON subsystems, is necessary to leverage NEON data products to maximum effect. For example, a researcher must understand the spatial and temporal scales at which soil biogeochemistry data, soil microbe biomass data, and plant litter production and chemistry data may be combined to quantify soil nutrient stocks and fluxes across NEON sites. 
    We present clear links among TOS protocols and across NEON subsystems that will enhance the utility of NEON TOS data products for the data user community.
  3. Accurately mapping tree species composition and diversity is a critical step towards spatially explicit and species-specific ecological understanding. The National Ecological Observatory Network (NEON) is a valuable source of open ecological data across the United States. Freely available NEON data include in-situ measurements of individual trees, including stem locations, species, and crown diameter, along with the NEON Airborne Observation Platform (AOP) airborne remote sensing imagery, including hyperspectral, multispectral, and light detection and ranging (LiDAR) data products. An important aspect of predicting species using remote sensing data is creating high-quality training sets for optimal classification purposes. Ultimately, manually creating training data is an expensive and time-consuming task that relies on human analyst decisions and may require external data sets or information. We combine in-situ and airborne remote sensing NEON data to evaluate the impact of automated training set preparation and a novel data preprocessing workflow on classifying the four dominant subalpine coniferous tree species at the Niwot Ridge Mountain Research Station forested NEON site in Colorado, USA. We trained pixel-based Random Forest (RF) machine learning models using a series of training data sets along with remote sensing raster data as descriptive features. The highest classification accuracies, 69% and 60% based on internal RF error assessment and an independent validation set, respectively, were obtained using circular tree crown polygons created with half the maximum crown diameter per tree. LiDAR-derived data products were the most important features for species classification, followed by vegetation indices. This work contributes to the open development of well-labeled training data sets for forest composition mapping using openly available NEON data without requiring external data collection, manual delineation steps, or site-specific parameters.
  4. Abstract Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package—a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced by neonMicrobe can be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporating neonMicrobe into reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology.
    In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and the neonMicrobe R package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.
  5. Two programs that provide high-quality long-term ecological data, the Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON), have recently teamed up with data users interested in synthesizing biodiversity data, such as ecological synthesis working groups supported by the US Long Term Ecological Research (LTER) Network Office, to make their data more Findable, Accessible, Interoperable, and Reusable (FAIR). To this end, we have developed a flexible intermediate data design pattern for ecological community data (L1 formatted data in Fig. 1, see Fig. 2 for design details) called "ecocomDP" (O'Brien et al. 2021), and we provide tools to work with data packages in which this design pattern has been implemented. The ecocomDP format provides a data pattern commonly used for reporting community level data, such as repeated observations of species-level measures of biomass, abundance, percent cover, or density across multiple locations. The ecocomDP library for R includes tools to search for data packages, download or import data packages into an R (programming language) session in a standard format, and visualization tools for data exploration steps that are recommended for data users prior to any cross-study synthesis work. To date, EDI has created 70 ecocomDP data packages derived from their holdings, which include data from the US Long Term Ecological Research (US LTER) program, Long Term Research in Environmental Biology (LTREB) program, and other projects, which are now discoverable and accessible using the ecocomDP library. Similarly, NEON data products for 12 taxonomic groups are discoverable using the ecocomDP search tool.
    Input from data users provided guidance for the ecocomDP developers in mapping the NEON data products to the ecocomDP format to facilitate interoperability with the ecocomDP data packages available from the EDI repository. The standardized data design pattern allows common data visualizations across data packages, and has the potential to facilitate the development of new tools and workflows for biodiversity synthesis. The broader impacts of this collaboration are intended to lower the barriers for researchers in ecology and the environmental sciences to access and work with long-term biodiversity data and provide a hub around which data providers and data users can develop best practices that will build a diverse and inclusive community of practice.