
Title: Reverse engineering environmental metatranscriptomes clarifies best practices for eukaryotic assembly
Abstract Background

Diverse communities of microbial eukaryotes in the global ocean provide a variety of essential ecosystem services, from primary production and carbon flow through trophic transfer to cooperation via symbioses. Increasingly, these communities are being understood through the lens of omics tools, which enable high-throughput processing of diverse communities. Metatranscriptomics offers an understanding of near real-time gene expression in microbial eukaryotic communities, providing a window into community metabolic activity.


Here we present a workflow for eukaryotic metatranscriptome assembly, and validate the ability of the pipeline to recapitulate real and manufactured eukaryotic community-level expression data. We also include an open-source tool for simulating environmental metatranscriptomes for testing and validation purposes. We reanalyze previously published metatranscriptomic datasets using our metatranscriptome analysis approach.


We determined that a multi-assembler approach improves eukaryotic metatranscriptome assembly based on recapitulated taxonomic and functional annotations from an in-silico mock community. The systematic validation of metatranscriptome assembly and annotation methods provided here is a necessary step to assess the fidelity of our community composition measurements and functional content assignments from eukaryotic metatranscriptomes.

Journal Name:
BMC Bioinformatics
Springer Science + Business Media
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Advances in microbiome science are being driven in large part due to our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.


    We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~ 3 h with 40 CPU threads to process ~ 100 genomes and corresponding metagenomic reads within which the most compute-demanding part of hmmsearch takes ~ 45 min, while it takes ~ 5 h to complete hmmsearch for ~ 3600 genomes. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.


    METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at

  2. Abstract Background

    Microbiomes are now recognized as the main drivers of ecosystem function ranging from the oceans and soils to humans and bioreactors. However, a grand challenge in microbiome science is to characterize and quantify the chemical currencies of organic matter (i.e., metabolites) that microbes respond to and alter. Critical to this has been the development of Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), which has drastically increased molecular characterization of complex organic matter samples, but challenges users with hundreds of millions of data points where readily available, user-friendly, and customizable software tools are lacking.


    Here, we build on years of analytical experience with diverse sample types to develop MetaboDirect, an open-source, command-line-based pipeline for the analysis (e.g., chemodiversity analysis, multivariate statistics), visualization (e.g., Van Krevelen diagrams, elemental and molecular class composition plots), and presentation of direct injection high-resolution FT-ICR MS data sets after molecular formula assignment has been performed. When compared to other available FT-ICR MS software, MetaboDirect is superior in that it requires a single line of code to launch a fully automated framework for the generation and visualization of a wide range of plots, with minimal coding experience required. Among the tools evaluated, MetaboDirect is also uniquely able to automatically generate biochemical transformation networks (ab initio) based on mass differences (mass difference network-based approach) that provide an experimental assessment of metabolite connections within a given sample or a complex metabolic system, thereby providing important information about the nature of the samples and the set of microbial reactions or pathways that gave rise to them. Finally, for more experienced users, MetaboDirect allows users to customize plots, outputs, and analyses.


    Application of MetaboDirect to FT-ICR MS-based metabolomic data sets from a marine phage-bacterial infection experiment and a Sphagnum leachate microbiome incubation experiment showcases the exploration capabilities of the pipeline that will enable the research community to evaluate and interpret their data in greater depth and in less time. It will further advance our knowledge of how microbial communities influence and are influenced by the chemical makeup of the surrounding system. The source code and User’s guide of MetaboDirect are freely available through ( and (, respectively.

  3. Abstract

    The trace metal iron (Fe) controls the diversity and activity of phytoplankton across the surface oceans, a paradigm established through decades of in situ and mesocosm experimental studies. Despite widespread Fe-limitation within high-nutrient, low-chlorophyll (HNLC) waters, significant contributions of the cyanobacterium Synechococcus to the phytoplankton stock can be found. Correlations among differing strains of Synechococcus across different Fe-regimes have suggested the existence of Fe-adapted ecotypes. However, experimental evidence of high- versus low-Fe adapted strains of Synechococcus is lacking, and so we investigated the transcriptional responses of microbial communities inhabiting the HNLC, sub-Antarctic region of the Southern Ocean during the Spring of 2018. Analysis of metatranscriptomes generated from on-deck incubation experiments reflecting a gradient of Fe-availabilities reveals transcriptomic signatures indicative of co-occurring Synechococcus ecotypes adapted to differing Fe-regimes. Functional analyses comparing low-Fe and high-Fe conditions point to various Fe-acquisition mechanisms that may allow persistence of low-Fe adapted Synechococcus under Fe-limitation. Comparison of in situ surface conditions to the Fe-titrations indicates ecological relevance of these mechanisms as well as persistence of both putative ecotypes within this region. This Fe-titration approach, combined with transcriptomics, highlights the short-term responses of the in situ phytoplankton community to Fe-availability that are often overlooked by examining genomic content or bulk physiological responses alone. These findings expand our knowledge about how phytoplankton in HNLC Southern Ocean waters adapt and respond to changing Fe supply.

  4. Abstract

    Corals and sponges harbor diverse microbial communities that are integral to the functioning of the host. While the taxonomic diversity of their microbiomes has been well-established for corals and sponges, their functional roles are less well-understood. It is unclear if the similarities of symbiosis in an invertebrate host would result in functionally similar microbiomes, or if differences in host phylogeny and environmentally driven microhabitats within each host would shape functionally distinct communities. Here we addressed this question, using metatranscriptomic and 16S rRNA gene profiling techniques to compare the microbiomes of two host organisms from different phyla. Our results indicate functional similarity in carbon, nitrogen, and sulfur assimilation, and aerobic nitrogen cycling. Additionally, there were few statistical differences in pathway coverage or abundance between the two hosts. For example, we observed higher coverage of phosphonate and siderophore metabolic pathways in the star coral, Montastraea cavernosa, while there was higher coverage of chloroalkane metabolism in the giant barrel sponge, Xestospongia muta. Higher abundance of genes associated with carbon fixation pathways was also observed in M. cavernosa, while in X. muta there was higher abundance of fatty acid metabolic pathways. Metagenomic predictions based on 16S rRNA gene profiling analysis were similar, and there was high correlation between the metatranscriptome and metagenome predictions for both hosts. Our results highlight several metabolic pathways that exhibit functional similarity in these coral and sponge microbiomes despite the taxonomic differences between the two microbiomes, as well as potential specialization of some microbially based metabolism within each host.

  5. Jansson, Janet K. (Ed.)
    ABSTRACT Soil ecosystems harbor diverse microorganisms and yet remain only partially characterized as neither single-cell sequencing nor whole-community sequencing offers a complete picture of these complex communities. Thus, the genetic and metabolic potential of this “uncultivated majority” remains underexplored. To address these challenges, we applied a pooled-cell-sorting-based mini-metagenomics approach and compared the results to bulk metagenomics. Informatic binning of these data produced 200 mini-metagenome assembled genomes (sorted-MAGs) and 29 bulk metagenome assembled genomes (MAGs). The sorted and bulk MAGs increased the known phylogenetic diversity of soil taxa by 7.2% with respect to the Joint Genome Institute IMG/M database and showed clade-specific sequence recruitment patterns across diverse terrestrial soil metagenomes. Additionally, sorted-MAGs expanded the rare biosphere not captured through MAGs from bulk sequences, exemplified through phylogenetic and functional analyses of members of the phylum Bacteroidetes. Analysis of 67 Bacteroidetes sorted-MAGs showed conserved patterns of carbon metabolism across four clades. These results indicate that mini-metagenomics enables genome-resolved investigation of predicted metabolism and demonstrates the utility of combining metagenomics methods to tap into the diversity of heterogeneous microbial assemblages.
    IMPORTANCE Microbial ecologists have historically used cultivation-based approaches as well as amplicon sequencing and shotgun metagenomics to characterize microbial diversity in soil. However, challenges persist in the study of microbial diversity, including the recalcitrance of the majority of microorganisms to laboratory cultivation and limited sequence assembly from highly complex samples. The uncultivated majority thus remains a reservoir of untapped genetic diversity. To address some of the challenges associated with bulk metagenomics as well as low throughput of single-cell genomics, we applied flow cytometry-enabled mini-metagenomics to capture expanded microbial diversity from forest soil and compare it to soil bulk metagenomics. Our resulting data from this pooled-cell sorting approach combined with bulk metagenomics revealed increased phylogenetic diversity through novel soil taxa and rare biosphere members. In-depth analysis of genomes within the highly represented Bacteroidetes phylum provided insights into conserved and clade-specific patterns of carbon metabolism.