

Title: Iterative subtractive binning of freshwater chronoseries metagenomes identifies over 400 novel species and their ecologic preferences
Summary

Recent advances in sequencing technology and bioinformatic pipelines have allowed unprecedented access to the genomes of yet‐uncultivated microorganisms from diverse environments. However, the catalogue of freshwater genomes remains limited, and most genome recovery attempts in freshwater ecosystems have only targeted specific taxa. Here, we present a genome recovery pipeline incorporating iterative subtractive binning, and apply it to a time series of 100 metagenomic datasets from seven connected lakes and estuaries along the Chattahoochee River (Southeastern USA). Our set of metagenome‐assembled genomes (MAGs) represents >400 yet‐unnamed genomospecies, substantially increasing the number of high‐quality MAGs from freshwater lakes. We propose names for two novel species: ‘Candidatus Elulimicrobium humile’ (‘Ca. Elulimicrobiota’, ‘Patescibacteria’) and ‘Candidatus Aquidulcis frankliniae’ (‘Chloroflexi’). Collectively, our MAGs represented about half of the total microbial community at any sampling point. To evaluate the prevalence of these genomospecies in the chronoseries, we introduce methodologies to estimate relative abundance and habitat preference that control for uneven genome quality and sample representation. We demonstrate high degrees of habitat‐specialization and endemicity for most genomospecies in the Chattahoochee lakes. Wider ecological ranges characterized smaller genomes with higher coding densities, indicating an overall advantage of smaller, more compact genomes for cosmopolitan distributions.

 
Award ID(s):
1759831 1241046
NSF-PAR ID:
10455183
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Environmental Microbiology
Volume:
22
Issue:
8
ISSN:
1462-2912
Page Range / eLocation ID:
p. 3394-3412
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Advances in sequencing technologies and bioinformatics tools have dramatically increased the recovery rate of microbial genomes from metagenomic data. Assessing the quality of metagenome-assembled genomes (MAGs) is a critical step before downstream analysis. Here, we present CheckM2, an improved method of predicting genome quality of MAGs using machine learning. Using synthetic and experimental data, we demonstrate that CheckM2 outperforms existing tools in both accuracy and computational speed. In addition, CheckM2’s database can be rapidly updated with new high-quality reference genomes, including taxa represented only by a single genome. We also show that CheckM2 accurately predicts genome quality for MAGs from novel lineages, even for those with reduced genome size (for example, Patescibacteria and the DPANN superphylum). CheckM2 provides accurate genome quality predictions across bacterial and archaeal lineages, giving increased confidence when inferring biological conclusions from MAGs. 
  2. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that result in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs. 
IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities. 
  3. McBain, Andrew J. (Ed.)
    ABSTRACT The recovery of metagenome-assembled genomes (MAGs) from metagenomic data has recently become a common task for microbial studies. The strengths and limitations of the underlying bioinformatics algorithms are well appreciated by now based on performance tests with mock data sets of known composition. However, these mock data sets do not capture the complexity and diversity often observed within natural populations, since their construction typically relies on only a single genome of a given organism. Further, it remains unclear if MAGs can recover population-variable genes (those shared by >10% but <90% of the members of the population) as efficiently as core genes (those shared by >90% of the members). To address these issues, we compared the gene variabilities of pathogenic Escherichia coli isolates from eight diarrheal samples, for which the isolate was the causative agent, against their corresponding MAGs recovered from the companion metagenomic data set. Our analysis revealed that MAGs with completeness estimates near 95% captured only 77% of the population core genes and 50% of the variable genes, on average. Further, about 5% of the genes of these MAGs were conservatively identified as missing in the isolate and were of different (non-Enterobacteriaceae) taxonomic origin, suggesting errors at the genome-binning step, even though contamination estimates based on commonly used pipelines were only 1.5%. Therefore, the quality of MAGs may often be worse than estimated, and we offer examples of how to recognize and improve such MAGs to sufficient quality by (for instance) employing only contigs longer than 1,000 bp for binning. 
IMPORTANCE Metagenome assembly and the recovery of metagenome-assembled genomes (MAGs) have recently become common tasks for microbiome studies across environmental and clinical settings. However, the extent to which MAGs can capture the genes of the population they represent remains speculative. Current approaches to evaluating MAG quality are limited to the recovery and copy number of universal housekeeping genes, which represent a small fraction of the total genome, leaving the majority of the genome essentially inaccessible. If MAG quality in reality is lower than these approaches would estimate, this could have dramatic consequences for all downstream analyses and interpretations. In this study, we evaluated this issue using an approach that employed comparisons of the gene contents of MAGs to the gene contents of isolate genomes derived from the same sample. Further, our samples originated from a diarrhea case-control study, and thus, our results are relevant for recovering the virulence factors of pathogens from metagenomic data sets. 
  4. Abstract

    Lake Tanganyika (LT) is the largest tropical freshwater lake, and the largest body of anoxic freshwater on Earth’s surface. LT’s mixed oxygenated surface waters float atop a permanently anoxic layer and host rich animal biodiversity. However, little is known about microorganisms inhabiting LT’s 1470 meter deep water column and their contributions to nutrient cycling, which affect ecosystem-level function and productivity. Here, we applied genome-resolved metagenomics and environmental analyses to link specific taxa to key biogeochemical processes across a vertical depth gradient in LT. We reconstructed 523 unique metagenome-assembled genomes (MAGs) from 34 bacterial and archaeal phyla, including many rarely observed in freshwater lakes. We identified sharp contrasts in community composition and metabolic potential with an abundance of typical freshwater taxa in oxygenated mixed upper layers, and Archaea and uncultured Candidate Phyla in deep anoxic waters. Genomic capacity for nitrogen and sulfur cycling was abundant in MAGs recovered from anoxic waters, highlighting microbial contributions to the productive surface layers via recycling of upwelled nutrients, and greenhouse gases such as nitrous oxide. Overall, our study provides a blueprint for incorporation of aquatic microbial genomics in the representation of tropical freshwater lakes, especially in the context of ongoing climate change, which is predicted to bring increased stratification and anoxia to freshwater lakes.

     
  5. Giovannoni, Stephen J. (Ed.)
    ABSTRACT Microbial nitrification is a critical process governing nitrogen availability in aquatic systems. Freshwater nitrifiers have received little attention, leaving many unanswered questions about their taxonomic distribution, functional potential, and ecological interactions. Here, we reconstructed genomes to infer the metabolism and ecology of free-living picoplanktonic nitrifiers across the Laurentian Great Lakes, a connected series of five of Earth’s largest lakes. Surprisingly, ammonia-oxidizing bacteria (AOB) related to Nitrosospira dominated over ammonia-oxidizing archaea (AOA) at nearly all stations, with distinct ecotypes prevailing in the transparent, oligotrophic upper lakes compared to Lakes Erie and Ontario. Unexpectedly, one ecotype of Nitrosospira encodes proteorhodopsin, which could enhance survival under conditions where ammonia oxidation is inhibited or substrate limited. Nitrite-oxidizing bacteria (NOB) “Candidatus Nitrotoga” and Nitrospira fluctuated in dominance, with the latter prevailing in deeper, less-productive basins. Genome reconstructions reveal highly reduced genomes and features consistent with genome streamlining, along with diverse adaptations to sunlight and oxidative stress and widespread capacity for organic nitrogen use. Our findings expand the known functional diversity of nitrifiers and establish their ecological genomics in large lake ecosystems. By elucidating links between microbial biodiversity and biogeochemical cycling, our work also informs ecosystem models of the Laurentian Great Lakes, a critical freshwater resource experiencing rapid environmental change. 
IMPORTANCE Microorganisms play critical roles in Earth’s nitrogen cycle. In lakes, microorganisms called nitrifiers derive energy from reduced nitrogen compounds. In doing so, they transform nitrogen into a form that can ultimately be lost to the atmosphere by a process called denitrification, which helps mitigate nitrogen pollution from fertilizer runoff and sewage. Despite their importance, freshwater nitrifiers are virtually unexplored. To understand their diversity and function, we reconstructed genomes of freshwater nitrifiers across some of Earth’s largest freshwater lakes, the Laurentian Great Lakes. We discovered several new species of nitrifiers specialized for clear low-nutrient waters and distinct species in comparatively turbid Lake Erie. Surprisingly, one species may be able to harness light energy by using a protein called proteorhodopsin, despite the fact that nitrifiers typically live in deep dark water. Our work reveals the unique biodiversity of the Great Lakes and fills key gaps in our knowledge of an important microbial group, the nitrifiers. 