skip to main content

Title: A genomic catalog of Earth’s microbiomes
Abstract The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes.
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; ; « less
Award ID(s):
1831952 1826734
Publication Date:
Journal Name:
Nature Biotechnology
Sponsoring Org:
National Science Foundation
More Like this
  1. Jansson, Janet K. (Ed.)
    ABSTRACT Soil ecosystems harbor diverse microorganisms and yet remain only partially characterized as neither single-cell sequencing nor whole-community sequencing offers a complete picture of these complex communities. Thus, the genetic and metabolic potential of this “uncultivated majority” remains underexplored. To address these challenges, we applied a pooled-cell-sorting-based mini-metagenomics approach and compared the results to bulk metagenomics. Informatic binning of these data produced 200 mini-metagenome assembled genomes (sorted-MAGs) and 29 bulk metagenome assembled genomes (MAGs). The sorted and bulk MAGs increased the known phylogenetic diversity of soil taxa by 7.2% with respect to the Joint Genome Institute IMG/M database and showed clade-specific sequence recruitment patterns across diverse terrestrial soil metagenomes. Additionally, sorted-MAGs expanded the rare biosphere not captured through MAGs from bulk sequences, exemplified through phylogenetic and functional analyses of members of the phylum Bacteroidetes . Analysis of 67 Bacteroidetes sorted-MAGs showed conserved patterns of carbon metabolism across four clades. These results indicate that mini-metagenomics enables genome-resolved investigation of predicted metabolism and demonstrates the utility of combining metagenomics methods to tap into the diversity of heterogeneous microbial assemblages. IMPORTANCE Microbial ecologists have historically used cultivation-based approaches as well as amplicon sequencing and shotgun metagenomics to characterize microbial diversity in soil. However,more »challenges persist in the study of microbial diversity, including the recalcitrance of the majority of microorganisms to laboratory cultivation and limited sequence assembly from highly complex samples. The uncultivated majority thus remains a reservoir of untapped genetic diversity. To address some of the challenges associated with bulk metagenomics as well as low throughput of single-cell genomics, we applied flow cytometry-enabled mini-metagenomics to capture expanded microbial diversity from forest soil and compare it to soil bulk metagenomics. Our resulting data from this pooled-cell sorting approach combined with bulk metagenomics revealed increased phylogenetic diversity through novel soil taxa and rare biosphere members. In-depth analysis of genomes within the highly represented Bacteroidetes phylum provided insights into conserved and clade-specific patterns of carbon metabolism.« less
  2. Maximal growth rate is a basic parameter of microbial lifestyle that varies over several orders of magnitude, with doubling times ranging from a matter of minutes to multiple days. Growth rates are typically measured using laboratory culture experiments. Yet, we lack sufficient understanding of the physiology of most microbes to design appropriate culture conditions for them, severely limiting our ability to assess the global diversity of microbial growth rates. Genomic estimators of maximal growth rate provide a practical solution to survey the distribution of microbial growth potential, regardless of cultivation status. We developed an improved maximal growth rate estimator and predicted maximal growth rates from over 200,000 genomes, metagenome-assembled genomes, and single-cell amplified genomes to survey growth potential across the range of prokaryotic diversity; extensions allow estimates from 16S rRNA sequences alone as well as weighted community estimates from metagenomes. We compared the growth rates of cultivated and uncultivated organisms to illustrate how culture collections are strongly biased toward organisms capable of rapid growth. Finally, we found that organisms naturally group into two growth classes and observed a bias in growth predictions for extremely slow-growing organisms. These observations ultimately led us to suggest evolutionary definitions of oligotrophy and copiotrophy basedmore »on the selective regime an organism occupies. We found that these growth classes are associated with distinct selective regimes and genomic functional potentials.« less
  3. Abstract Background Tropical members of the sponge genus Ircinia possess highly complex microbiomes that perform a broad spectrum of chemical processes that influence host fitness. Despite the pervasive role of microbiomes in Ircinia biology, it is still unknown how they remain in stable association across tropical species. To address this question, we performed a comparative analysis of the microbiomes of 11 Ircinia species using whole-metagenomic shotgun sequencing data to investigate three aspects of bacterial symbiont genomes—the redundancy in metabolic pathways across taxa, the evolution of genes involved in pathogenesis, and the nature of selection acting on genes relevant to secondary metabolism. Results A total of 424 new, high-quality bacterial metagenome-assembled genomes (MAGs) were produced for 10 Caribbean Ircinia species, which were evaluated alongside 113 publicly available MAGs sourced from the Pacific species Ircinia ramosa . Evidence of redundancy was discovered in that the core genes of several primary metabolic pathways could be found in the genomes of multiple bacterial taxa. Across hosts, the metagenomes were depleted in genes relevant to pathogenicity and enriched in eukaryotic-like proteins (ELPs) that likely mimic the hosts’ molecular patterning. Finally, clusters of steroid biosynthesis genes (CSGs), which appear to be under purifying selection and undergomore »horizontal gene transfer, were found to be a defining feature of Ircinia metagenomes. Conclusions These results illustrate patterns of genome evolution within highly complex microbiomes that illuminate how associations with hosts are maintained. The metabolic redundancy within the microbiomes could help buffer the hosts from changes in the ambient chemical and physical regimes and from fluctuations in the population sizes of the individual microbial strains that make up the microbiome. Additionally, the enrichment of ELPs and depletion of LPS and cellular motility genes provide a model for how alternative strategies to virulence can evolve in microbiomes undergoing mixed-mode transmission that do not ultimately result in higher levels of damage (i.e., pathogenicity) to the host. Our last set of results provides evidence that sterol biosynthesis in Ircinia -associated bacteria is widespread and that these molecules are important for the survival of bacteria in highly complex Ircinia microbiomes.« less
  4. Abstract Microbial and viral communities transform the chemistry of Earth's ecosystems, yet the specific reactions catalyzed by these biological engines are hard to decode due to the absence of a scalable, metabolically resolved, annotation software. Here, we present DRAM (Distilled and Refined Annotation of Metabolism), a framework to translate the deluge of microbiome-based genomic information into a catalog of microbial traits. To demonstrate the applicability of DRAM across metabolically diverse genomes, we evaluated DRAM performance on a defined, in silico soil community and previously published human gut metagenomes. We show that DRAM accurately assigned microbial contributions to geochemical cycles and automated the partitioning of gut microbial carbohydrate metabolism at substrate levels. DRAM-v, the viral mode of DRAM, established rules to identify virally-encoded auxiliary metabolic genes (AMGs), resulting in the metabolic categorization of thousands of putative AMGs from soils and guts. Together DRAM and DRAM-v provide critical metabolic profiling capabilities that decipher mechanisms underpinning microbiome function.
  5. Abstract Background

    Advances in microbiome science are being driven in large part due to our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.


    We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, ormore »single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~ 3 h with 40 CPU threads to process ~ 100 genomes and corresponding metagenomic reads within which the most compute-demanding part of hmmsearch takes ~ 45 min, while it takes ~ 5 h to complete hmmsearch for ~ 3600 genomes. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.


    METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at

    « less