Title: High spatial resolution global ocean metagenomes from Bio-GO-SHIP repeat hydrography transects
Abstract Detailed descriptions of microbial communities have lagged far behind physical and chemical measurements in the marine environment. Here, we present 971 globally distributed surface ocean metagenomes collected at high spatio-temporal resolution. Our low-cost metagenomic sequencing protocol produced 3.65 terabases of data, where the median number of base pairs per sample was 3.41 billion. The median distance between sampling stations was 26 km. The metagenomic libraries described here were collected as a part of a biological initiative for the Global Ocean Ship-based Hydrographic Investigations Program, or “Bio-GO-SHIP.” One of the primary aims of GO-SHIP is to produce high spatial and vertical resolution measurements of key state variables to directly quantify climate change impacts on ocean environments. By similarly collecting marine metagenomes at high spatiotemporal resolution, we expect that this dataset will help answer questions about the link between microbial communities and biogeochemical fluxes in a changing ocean.
Award ID(s):
1848576 1948842 1559002 1046297
PAR ID:
10360652
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Data
Volume:
8
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Establishing links between microbial diversity and environmental processes requires resolving the high degree of functional variation among closely related lineages or ecotypes. Here, we implement and validate an improved metagenomic approach that estimates the spatial biogeography and environmental regulation of ecotype-specific replication patterns (RObs) across ocean regions. A total of 719 metagenomes were analyzed from meridional Bio-GO-SHIP sections in the Atlantic and Indian Ocean. Accounting for sequencing bias and anchoring replication estimates in genome structure were critical for identifying physiologically relevant biological signals. For example, ecotypes within the dominant marine cyanobacteria Prochlorococcus exhibited distinct diel cycles in RObs that peaked between 19:00–22:00. Additionally, both Prochlorococcus ecotypes and ecotypes within the highly abundant heterotroph Pelagibacter (SAR11) demonstrated systematic biogeographies in RObs that differed from spatial patterns in relative abundance. Finally, RObs was significantly regulated by nutrient stress and temperature, and explained by differences in the genomic potential for nutrient transport, energy production, cell wall structure, and replication. Our results suggest that our new approach to estimating replication is reflective of gross population growth. Moreover, this work reveals that the interaction between adaptation and environmental change drives systematic variability in replication patterns across ocean basins that is ecotype-specific, adding an activity-based dimension to our understanding of microbial niche space. 
  2. Abstract Sequence classification facilitates a fundamental understanding of the structure of microbial communities. Binary metagenomic sequence classifiers are insufficient because environmental metagenomes are typically derived from multiple sequence sources. Here we introduce a deep-learning based sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e. viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. DeepMicroClass achieved high performance for all sequence classes at various tested sequence lengths ranging from 500 bp to 100 kbp. By benchmarking on a synthetic dataset with variable sequence class composition, we showed that DeepMicroClass obtained better performance for eukaryotic, plasmid and viral contig classification than other state-of-the-art predictors. DeepMicroClass achieved comparable performance on viral sequence classification with geNomad and VirSorter2 when benchmarked on the CAMI II marine dataset. Using a coastal daily time-series metagenomic dataset as a case study, we showed that microbial eukaryotes and prokaryotic viruses are integral to microbial communities. By analyzing monthly metagenomes collected at HOT and BATS, we found relatively higher viral read proportions in the subsurface layer in late summer, consistent with the seasonal viral infection patterns prevalent in these areas. We expect DeepMicroClass will promote metagenomic studies of under-appreciated sequence types.
  3. Abstract Historically, our understanding of bacterial ecology in the Indian Ocean has been limited to regional studies that place emphasis on community structure and function within oxygen‐minimum zones. Thus, bacterial community dynamics across the wider Indian Ocean are largely undescribed. As part of Bio‐GO‐SHIP, we sequenced the 16S rRNA gene from 465 samples collected on sections I07N and I09N. We found that (1) there were 23 distinct bioregions within the Indian Ocean, (2) the southeastern gyre had the largest gradient in bacterial alpha‐diversity, (3) the Indian Ocean surface microbiome was primarily composed of a core set of taxa, and (4) bioregions were characterized by transitions in physical and geochemical conditions. Overall, we showed that bacterial community structure spatially delineated the surface Indian Ocean and that these microbially defined regions were reflective of subtle ocean physical and geochemical gradients. Therefore, incorporating metrics of in situ microbial communities into marine ecological regions traditionally defined by remote sensing will improve our ability to delineate warm, oligotrophic regions. 
  4. Abstract The North Temperate Lakes Long-Term Ecological Research (NTL-LTER) program has been extensively used to improve understanding of how aquatic ecosystems respond to environmental stressors, climate fluctuations, and human activities. Here, we report on the metagenomes of samples collected between 2000 and 2019 from Lake Mendota, a freshwater eutrophic lake within the NTL-LTER site. We utilized the distributed metagenome assembler MetaHipMer to coassemble over 10 terabases (Tbp) of data from 471 individual Illumina-sequenced metagenomes. A total of 95,523,664 contigs were assembled and binned to generate 1,894 non-redundant metagenome-assembled genomes (MAGs) with ≥50% completeness and ≤10% contamination. Phylogenomic analysis revealed that the MAGs were nearly exclusively bacterial, dominated by Pseudomonadota (Proteobacteria, N = 623) and Bacteroidota (N = 321). Nine eukaryotic MAGs were identified by eukCC with six assigned to the phylum Chlorophyta. Additionally, 6,350 high-quality viral sequences were identified by geNomad with the majority classified in the phylum Uroviricota. This expansive coassembled metagenomic dataset provides an unprecedented foundation to advance understanding of microbial communities in freshwater ecosystems and explore temporal ecosystem dynamics. 
  5. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that result in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.
    IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.