Title: From DNA sequences to microbial ecology: Wrangling NEON soil microbe data with the neonMicrobe R package
Abstract

Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package, a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products, which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced by neonMicrobe can be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporating neonMicrobe into reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology.
In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and the neonMicrobe R package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.

Award ID(s):
1638577 2012878 1926438 2026815
NSF-PAR ID:
10361906
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Ecosphere
Volume:
12
Issue:
11
ISSN:
2150-8925
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The National Ecological Observatory Network (NEON) is a multidecadal and continental-scale observatory with sites across the United States. Having entered its operational phase in 2018, NEON data products, software, and services become available to facilitate research on the impacts of climate change, land-use change, and invasive species. An essential component of NEON are its 47 tower sites, where eddy-covariance (EC) sensors are operated to determine the surface–atmosphere exchange of momentum, heat, water, and CO₂. EC tower networks such as AmeriFlux, the Integrated Carbon Observation System (ICOS), and NEON are vital for providing the distributed observations to address interactions at the soil–vegetation–atmosphere interface. NEON represents the largest single-provider EC network globally, with standardized observations and data processing explicitly designed for intersite comparability and analysis of feedbacks across multiple spatial and temporal scales. Furthermore, EC is tightly integrated with soil, meteorology, atmospheric chemistry, isotope, phenology, and rich contextual observations such as airborne remote sensing and in situ sampling bouts. Here, we present an overview of NEON’s observational design, field operation, and data processing that yield community resources for the study of surface–atmosphere interactions. Near-real-time data products become available from the NEON Data Portal, and EC and meteorological data are ingested into AmeriFlux and FLUXNET globally harmonized data releases. Open-source software for reproducible, extensible, and portable data analysis includes the eddy4R family of R packages underlying the EC data product generation. These resources strive to integrate with existing infrastructures and networks, to suggest novel systemic solutions, and to synergize ongoing research efforts across science communities.
  2. Abstract

    Understanding spatial and temporal variation in plant traits is needed to accurately predict how communities and ecosystems will respond to global change. The National Ecological Observatory Network’s (NEON’s) Airborne Observation Platform (AOP) provides hyperspectral images and associated data products at numerous field sites at 1 m spatial resolution, potentially allowing high‐resolution trait mapping. We tested the accuracy of readily available data products of NEON’s AOP, such as Leaf Area Index (LAI), Total Biomass, Ecosystem Structure (Canopy height model [CHM]), and Canopy Nitrogen, by comparing them to spatially extensive field measurements from a mesic tallgrass prairie. Correlations with AOP data products exhibited generally weak or no relationships with corresponding field measurements. The strongest relationships were between AOP LAI and ground‐measured LAI (r = 0.32) and AOP Total Biomass and ground‐measured biomass (r = 0.23). We also examined how well the full reflectance spectra (380–2,500 nm), as opposed to derived products, could predict vegetation traits using partial least‐squares regression (PLSR) models. Among all the eight traits examined, only Nitrogen had a validation of more than 0.25. For all vegetation traits, validation ranged from 0.08 to 0.29 and the range of the root mean square error of prediction (RMSEP) was 14–64%. Our results suggest that currently available AOP‐derived data products should not be used without extensive ground‐based validation. Relationships using the full reflectance spectra may be more promising, although careful consideration of field and AOP data mismatches in space and/or time, biases in field‐based measurements or AOP algorithms, and model uncertainty are needed. 
Finally, grassland sites may be especially challenging for airborne spectroscopy because of their high species diversity within a small area, mixed functional types of plant communities, and heterogeneous mosaics of disturbance and resource availability. Remote sensing observations are one of the most promising approaches to understanding ecological patterns across space and time. But the opportunity to engage a diverse community of NEON data users will depend on establishing rigorous links with in‐situ field measurements across a diversity of sites.

  3. Microorganisms are ubiquitous in the biosphere, playing a crucial role in both biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open access Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets were selected for comparison, all derived from surface soil samples that NEON investigators collected at the same locations in Colorado between April and July 2014 and prepared using standardized protocols. The dominant bacterial phyla detected across samples agreed between sequencing methodologies.
However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems. 
  4. Elizabeth Borer (Ed.)
    Understanding spatial and temporal variation in plant traits is needed to accurately predict how communities and ecosystems will respond to global change. The National Ecological Observatory Network’s (NEON’s) Airborne Observation Platform (AOP) provides hyperspectral images and associated data products at numerous field sites at 1 m spatial resolution, potentially allowing high-resolution trait mapping. We tested the accuracy of readily available data products of NEON’s AOP, such as Leaf Area Index (LAI), Total Biomass, Ecosystem Structure (Canopy height model [CHM]), and Canopy Nitrogen, by comparing them to spatially extensive field measurements from a mesic tallgrass prairie. Correlations with AOP data products exhibited generally weak or no relationships with corresponding field measurements. The strongest relationships were between AOP LAI and ground-measured LAI (r = 0.32) and AOP Total Biomass and ground-measured biomass (r = 0.23). We also examined how well the full reflectance spectra (380–2,500 nm), as opposed to derived products, could predict vegetation traits using partial least-squares regression (PLSR) models. Among all the eight traits examined, only Nitrogen had a validation of more than 0.25. For all vegetation traits, validation ranged from 0.08 to 0.29 and the range of the root mean square error of prediction (RMSEP) was 14–64%. Our results suggest that currently available AOP-derived data products should not be used without extensive ground-based validation. Relationships using the full reflectance spectra may be more promising, although careful consideration of field and AOP data mismatches in space and/or time, biases in field-based measurements or AOP algorithms, and model uncertainty are needed. 
Finally, grassland sites may be especially challenging for airborne spectroscopy because of their high species diversity within a small area, mixed functional types of plant communities, and heterogeneous mosaics of disturbance and resource availability. Remote sensing observations are one of the most promising approaches to understanding ecological patterns across space and time. But the opportunity to engage a diverse community of NEON data users will depend on establishing rigorous links with in-situ field measurements across a diversity of sites. 
  5. Abstract

    The National Ecological Observatory Network Terrestrial Observation System (NEON TOS) produces open‐access data products that allow data users to investigate the impact of change drivers on key “sentinel” taxa and soils. The spatial and temporal sampling strategy that coordinates implementation of these protocols enables integration across TOS products and with products generated by NEON aquatic, remote sensing, and terrestrial instrument subsystems. Here, we illustrate the plots and sampling units that make up the physical foundation of a NEON TOS site, and we describe the scales (subplot, plot, airshed, and site) at which sampling is spatially colocated across protocols and subsystems. We also describe how moderate resolution imaging spectroradiometer‐enhanced vegetation index (MODIS‐EVI) phenology data are used to temporally coordinate TOS sampling within and across years at the continental scale of the observatory. Individually, TOS protocols produce data products that provide insight into populations, communities, and ecosystem processes. Within the spatial and temporal framework that guides cross‐protocol implementation, the ability to draw inference across data products is enhanced. To illustrate this point, we develop an example using R software that links two TOS data products collected with different temporal frequencies at both plot and site spatial scales. A thorough understanding of how TOS protocols are integrated with each other in space and time, and with other NEON subsystems, is necessary to leverage NEON data products to maximum effect. For example, a researcher must understand the spatial and temporal scales at which soil biogeochemistry data, soil microbe biomass data, and plant litter production and chemistry data may be combined to quantify soil nutrient stocks and fluxes across NEON sites. 
We present clear links among TOS protocols and across NEON subsystems that will enhance the utility of NEON TOS data products for the data user community.
