Title: METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks
Abstract Background

Advances in microbiome science are being driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.


We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~3 h with 40 CPU threads to process ~100 genomes and their corresponding metagenomic reads, of which the most compute-demanding step, hmmsearch, takes ~45 min; hmmsearch takes ~5 h to complete for ~3600 genomes. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.


METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at

Publisher / Repository: Springer Science + Business Media
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Microbial communities are essential components of aquatic ecosystems through their contribution to food web dynamics and biogeochemical processes. Aquatic microbial diversity is immense and a general challenge is to understand how metabolism and interactions of single organisms shape microbial community dynamics and ecosystem‐scale biogeochemical transformations. Metagenomic approaches have developed rapidly, and proven to be powerful in linking microbial community dynamics to biogeochemical processes. In this review, we provide an overview of metagenomic approaches, followed by a discussion on some recent insights they have provided, including those in this special issue. These include the discovery of new taxa and metabolisms in aquatic microbiomes, insights into community assembly and functional ecology as well as evolutionary processes shaping microbial genomes and microbiomes, and the influence of human activities on aquatic microbiomes. Given that metagenomics can now be considered a mature technology where data generation and descriptive analyses are relatively routine and informative, we then discuss metagenomic‐enabled research avenues to further link microbial dynamics to biogeochemical processes. These include the integration of metagenomics into well‐designed ecological experiments, the use of metagenomics to inform and validate metabolic and biogeochemical models, and the pressing need for ecologically relevant model organisms and simple microbial systems to better interpret the taxonomic and functional information integrated in metagenomes. These research avenues will contribute to a more mechanistic and predictive understanding of links between microbial dynamics and biogeochemical cycles. Owing to rapid climate change and human impacts on aquatic ecosystems, the urgency of such an understanding has never been greater.

  2. Abstract Background

    Microbiomes are now recognized as the main drivers of ecosystem function ranging from the oceans and soils to humans and bioreactors. However, a grand challenge in microbiome science is to characterize and quantify the chemical currencies of organic matter (i.e., metabolites) that microbes respond to and alter. Critical to this has been the development of Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), which has drastically increased molecular characterization of complex organic matter samples, but challenges users with hundreds of millions of data points where readily available, user-friendly, and customizable software tools are lacking.


    Here, we build on years of analytical experience with diverse sample types to develop MetaboDirect, an open-source, command-line-based pipeline for the analysis (e.g., chemodiversity analysis, multivariate statistics), visualization (e.g., Van Krevelen diagrams, elemental and molecular class composition plots), and presentation of direct injection high-resolution FT-ICR MS data sets after molecular formula assignment has been performed. When compared to other available FT-ICR MS software, MetaboDirect is superior in that it requires a single line of code to launch a fully automated framework for the generation and visualization of a wide range of plots, with minimal coding experience required. Among the tools evaluated, MetaboDirect is also uniquely able to automatically generate biochemical transformation networks (ab initio) based on mass differences (mass difference network-based approach) that provide an experimental assessment of metabolite connections within a given sample or a complex metabolic system, thereby providing important information about the nature of the samples and the set of microbial reactions or pathways that gave rise to them. Finally, for more experienced users, MetaboDirect allows users to customize plots, outputs, and analyses.


    Application of MetaboDirect to FT-ICR MS-based metabolomic data sets from a marine phage-bacterial infection experiment and a Sphagnum leachate microbiome incubation experiment showcases the exploration capabilities of the pipeline that will enable the research community to evaluate and interpret their data in greater depth and in less time. It will further advance our knowledge of how microbial communities influence and are influenced by the chemical makeup of the surrounding system. The source code and User’s guide of MetaboDirect are freely available through ( and (, respectively.

  3. Giovannoni, Stephen J. (Ed.)
    ABSTRACT Archaea belonging to the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota) superphylum have been found in an expanding number of environments and perform a variety of biogeochemical roles, including contributing to carbon, sulfur, and nitrogen cycling. Generally characterized by ultrasmall cell sizes and reduced genomes, DPANN archaea may form mutualistic, commensal, or parasitic interactions with various archaeal and bacterial hosts, influencing the ecology and functioning of microbial communities. While DPANN archaea reportedly comprise a sizeable fraction of the archaeal community within marine oxygen-deficient zone (ODZ) water columns, little is known about their metabolic capabilities in these ecosystems. We report 33 novel metagenome-assembled genomes (MAGs) belonging to the DPANN phyla Nanoarchaeota, Pacearchaeota, Woesearchaeota, Undinarchaeota, Iainarchaeota, and SpSt-1190 from pelagic ODZs in the Eastern Tropical North Pacific and the Arabian Sea. We find these archaea to be permanent, stable residents of all three major ODZs only within anoxic depths, comprising up to 1% of the total microbial community and up to 25%–50% of archaea as estimated from read mapping to MAGs. ODZ DPANN appear to be capable of diverse metabolic functions, including fermentation, organic carbon scavenging, and the cycling of sulfur, hydrogen, and methane. Within a majority of ODZ DPANN, we identify a gene homologous to nitrous oxide reductase. Modeling analyses indicate the feasibility of a nitrous oxide reduction metabolism for host-attached symbionts, and the small genome sizes and reduced metabolic capabilities of most DPANN MAGs suggest host-associated lifestyles within ODZs.

    IMPORTANCE

    Archaea from the DPANN (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota) superphylum have diverse metabolic capabilities and participate in multiple biogeochemical cycles. While metagenomics and enrichments have revealed that many DPANN are characterized by ultrasmall genomes, few biosynthetic genes, and episymbiotic lifestyles, much remains unknown about their biology. We report 33 new DPANN metagenome-assembled genomes originating from the three global marine oxygen-deficient zones (ODZs), the first from these regions. We survey DPANN abundance and distribution within the ODZ water column, investigate their biosynthetic capabilities, and report potential roles in the cycling of organic carbon, methane, and nitrogen. We test the hypothesis that nitrous oxide reductases found within several ODZ DPANN genomes may enable ultrasmall episymbionts to serve as nitrous oxide consumers when attached to a host nitrous oxide producer. Our results indicate DPANN archaea as ubiquitous residents within the anoxic core of ODZs with the potential to produce or consume key compounds.

  4. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.

    IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
  5. Abstract Background

    Stable isotope probing (SIP) approaches are a critical tool in microbiome research to determine associations between species and substrates, as well as the activity of species. The application of these approaches ranges from studying microbial communities important for global biogeochemical cycling to host-microbiota interactions in the intestinal tract. Current SIP approaches, such as DNA-SIP or nanoSIMS, allow analysis of the incorporation of stable isotopes with high coverage of taxa in a community and at the single-cell level, respectively; however, they are limited in terms of sensitivity, resolution, or throughput.


    Here, we present an ultra-sensitive, high-throughput protein-based stable isotope probing approach (Protein-SIP), which cuts cost for labeled substrates by 50–99% as compared to other SIP and Protein-SIP approaches and thus enables isotope labeling experiments on much larger scales and with higher replication. The approach allows for the determination of isotope incorporation into microbiome members with species level resolution using standard metaproteomics liquid chromatography-tandem mass spectrometry (LC–MS/MS) measurements. At the core of the approach are new algorithms to analyze the data, which have been implemented in an open-source software ( We demonstrate sensitivity, precision and accuracy using bacterial cultures and mock communities with different labeling schemes. Furthermore, we benchmark our approach against two existing Protein-SIP approaches and show that in the low labeling range used our approach is the most sensitive and accurate. Finally, we measure translational activity using 18O heavy water labeling in a 63-species community derived from human fecal samples grown on media simulating two different diets. Activity could be quantified on average for 27 species per sample, with 9 species showing significantly higher activity on a high protein diet, as compared to a high fiber diet. Surprisingly, among the species with increased activity on high protein were several Bacteroides species known as fiber consumers. Apparently, protein supply is a critical consideration when assessing growth of intestinal microbes on fiber, including fiber-based prebiotics.


    We demonstrate that our Protein-SIP approach allows for the ultra-sensitive (0.01 to 10% label) detection of stable isotopes of elements found in proteins, using standard metaproteomics data.
