Title: The National Ecological Observatory Network’s soil metagenomes: assembly and basic analysis
The largest dataset of soil metagenomes has recently been released by the National Ecological Observatory Network (NEON), which performs annual shotgun sequencing of soils at 47 sites across the United States. NEON serves as a valuable educational resource, thanks to its open data and programming tutorials, but there is currently no introductory tutorial for accessing and analyzing the soil shotgun metagenomic dataset. Here, we describe methods for processing raw soil metagenome sequencing reads using a bioinformatics pipeline tailored to the high complexity and diversity of the soil microbiome. We describe the rationale, necessary resources, and implementation of steps such as cleaning raw reads, taxonomic classification, assembly into contigs or genomes, annotation of predicted genes using custom protein databases, and exporting data for downstream analysis. The workflow presented here aims to increase the accessibility of NEON’s shotgun metagenome data, which can provide important clues about soil microbial communities and their ecological roles.
Award ID(s):
1638577
PAR ID:
10502082
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
F1000research
Date Published:
Journal Name:
F1000Research
Volume:
10
ISSN:
2046-1402
Page Range / eLocation ID:
299
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Microorganisms are ubiquitous in the biosphere, playing a crucial role in both biogeochemistry of the planet and human health. However, identifying these microorganisms and defining their function are challenging. Widely used approaches in comparative metagenomics, 16S amplicon sequencing and whole genome shotgun sequencing (WGS), have provided access to DNA sequencing analysis to identify microorganisms and evaluate diversity and abundance in various environments. However, advances in parallel high-throughput DNA sequencing in the past decade have introduced major hurdles, namely standardization of methods, data storage, reproducible interoperability of results, and data sharing. The National Ecological Observatory Network (NEON), established by the National Science Foundation, enables all researchers to address queries on a regional to continental scale around a variety of environmental challenges and provides high-quality, integrated, and standardized data from field sites across the U.S. As the amount of metagenomic data continues to grow, standardized procedures that allow results across projects to be assessed and compared are becoming increasingly important in the field of metagenomics. We demonstrate the feasibility of using publicly available NEON soil metagenomic sequencing datasets in combination with the open access Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) server to illustrate advantages of WGS compared to 16S amplicon sequencing. Four WGS and four 16S amplicon sequence datasets, from surface soil samples prepared by NEON investigators using standardized protocols and collected at the same locations in Colorado between April and July 2014, were selected for comparison. The dominant bacterial phyla detected across samples agreed between sequencing methodologies. However, WGS yielded greater microbial resolution, increased accuracy, and allowed identification of more genera of bacteria, archaea, viruses, and eukaryota, and putative functional genes that would have gone undetected using 16S amplicon sequencing. NEON open data will be useful for future studies characterizing and quantifying complex ecological processes associated with changing aquatic and terrestrial ecosystems.
  2. Using sequence reads from shotgun metagenomic analyses in both cattle and sheep, we describe how failures in mate pairing on Illumina sequencing can interact with bioinformatics pipelines to give spurious patterns among rare components of a metagenomic sample. We identified several different shotgun metagenomic datasets from different animals and different laboratories where the two members of the read pair matched a viral database at very different frequencies. We traced this bias to a set of poly-G reads of high quality that resulted from failures in generating read pairs during library preparation. These results reinforce the need to remove poly-G-rich reads when quality filtering shotgun metagenomic data. 
  3. Abstract Soil microbial communities play critical roles in various ecosystem processes, but studies at a large spatial and temporal scale have been challenging due to the difficulty in finding the relevant samples in available data sets as well as the lack of standardization in sample collection and processing. The National Ecological Observatory Network (NEON) has been collecting soil microbial community data multiple times per year for 47 terrestrial sites in 20 eco‐climatic domains, producing one of the most extensive standardized sampling efforts for soil microbial biodiversity to date. Here, we introduce the neonMicrobe R package—a suite of downloading, preprocessing, data set assembly, and sensitivity analysis tools for NEON’s newly published 16S and ITS amplicon sequencing data products which characterize soil bacterial and fungal communities, respectively. neonMicrobe is designed to make these data more accessible to ecologists without assuming prior experience with bioinformatic pipelines. We describe quality control steps used to remove quality‐flagged samples, report on sensitivity analyses used to determine appropriate quality filtering parameters for the DADA2 workflow, and demonstrate the immediate usability of the output data by conducting standard analyses of soil microbial diversity. The sequence abundance tables produced by neonMicrobe can be linked to NEON’s other data products (e.g., soil physical and chemical properties, plant community composition) and soil subsamples archived in the NEON Biorepository. We provide recommendations for incorporating neonMicrobe into reproducible scientific workflows, discuss technical considerations for large‐scale amplicon sequence analysis, and outline future directions for NEON‐enabled microbial ecology. In particular, we believe that NEON marker gene sequence data will allow researchers to answer outstanding questions about the spatial and temporal dynamics of soil microbial communities while explicitly accounting for scale dependence. We expect that the data produced by NEON and the neonMicrobe R package will act as a valuable ecological baseline to inform and contextualize future experimental and modeling endeavors.
  4. Abstract. Air–water gas exchange is essential to understanding and quantifying many biogeochemical processes in streams and rivers, including greenhouse gas emissions and metabolism. Gas exchange depends on two factors, which are often quantified separately: (1) the air–water concentration gradient of the gas and (2) the gas exchange velocity.  There are fewer measurements of gas exchange velocity compared to concentrations in streams and rivers, which limits accurate characterization of air–water gas exchange (i.e., flux rates). The National Ecological Observatory Network (NEON) conducts SF6 gas-loss experiments in 22 of their 24 wadeable streams using standardized methods across all experiments and sites, and publishes raw concentration data from these experiments on the NEON data portal. NEON also conducts NaCl injections that can be used to characterize hydraulic geometry at all 24 wadeable streams. These NaCl injections are conducted both as part of the gas-loss experiments and separately. Here, we use these data to estimate gas exchange and water velocity using the reaRate R package. The dataset presented includes estimates of hydraulic parameters, cleaned raw concentration SF6 tracer-gas data (including removing outliers and failed experiments), estimated SF6 gas-loss rates, normalized gas exchange velocities (k600; m d−1) and normalized depth-dependent gas exchange rates (K600; d−1). This dataset provides one of the largest compilations of gas-loss experiments (n=339) in streams to date. This dataset is unique in that it contains gas exchange estimates from repeated experiments in geographically diverse streams across a range of discharges. In addition, this dataset contains information on the hydraulic geometry of all 24 NEON wadeable streams, which will support future research using NEON aquatic data. 
This dataset is a valuable resource that can be used to explore both within- and across-reach variability in the hydraulic geometry and gas exchange velocity in streams. The data are available at https://doi.org/10.6073/pasta/18dcc1871ee71cf0b69f2ee4082839d0 (Aho et al., 2024), and the reaRate R package code is available at https://doi.org/10.5281/zenodo.12786089 (Cawley et al., 2024). 
  5. Abstract The National Ecological Observatory Network Terrestrial Observation System (NEON TOS) produces open‐access data products that allow data users to investigate the impact of change drivers on key “sentinel” taxa and soils. The spatial and temporal sampling strategy that coordinates implementation of these protocols enables integration across TOS products and with products generated by NEON aquatic, remote sensing, and terrestrial instrument subsystems. Here, we illustrate the plots and sampling units that make up the physical foundation of a NEON TOS site, and we describe the scales (subplot, plot, airshed, and site) at which sampling is spatially colocated across protocols and subsystems. We also describe how moderate resolution imaging spectroradiometer‐enhanced vegetation index (MODIS‐EVI) phenology data are used to temporally coordinate TOS sampling within and across years at the continental scale of the observatory. Individually, TOS protocols produce data products that provide insight into populations, communities, and ecosystem processes. Within the spatial and temporal framework that guides cross‐protocol implementation, the ability to draw inference across data products is enhanced. To illustrate this point, we develop an example using R software that links two TOS data products collected with different temporal frequencies at both plot and site spatial scales. A thorough understanding of how TOS protocols are integrated with each other in space and time, and with other NEON subsystems, is necessary to leverage NEON data products to maximum effect. For example, a researcher must understand the spatial and temporal scales at which soil biogeochemistry data, soil microbe biomass data, and plant litter production and chemistry data may be combined to quantify soil nutrient stocks and fluxes across NEON sites. 
We present clear links among TOS protocols and across NEON subsystems that will enhance the utility of NEON TOS data products for the data user community. 