Title: Reverse engineering environmental metatranscriptomes clarifies best practices for eukaryotic assembly
Abstract

Background

Diverse communities of microbial eukaryotes in the global ocean provide a variety of essential ecosystem services, from primary production and carbon flow through trophic transfer to cooperation via symbioses. Increasingly, these communities are being understood through the lens of omics tools, which enable high-throughput processing of diverse communities. Metatranscriptomics offers an understanding of near real-time gene expression in microbial eukaryotic communities, providing a window into community metabolic activity.

Results

Here we present a workflow for eukaryotic metatranscriptome assembly, and validate the ability of the pipeline to recapitulate real and manufactured eukaryotic community-level expression data. We also include an open-source tool for simulating environmental metatranscriptomes for testing and validation purposes. We reanalyze previously published metatranscriptomic datasets using our metatranscriptome analysis approach.

Conclusion

We determined that a multi-assembler approach improves eukaryotic metatranscriptome assembly based on recapitulated taxonomic and functional annotations from an in-silico mock community. The systematic validation of metatranscriptome assembly and annotation methods provided here is a necessary step to assess the fidelity of our community composition measurements and functional content assignments from eukaryotic metatranscriptomes.
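As a minimal sketch of what "recapitulated annotations" could mean in practice, the snippet below scores how many annotations designed into a simulated community are recovered by each assembler alone and by the union of all assemblers. This is not the authors' pipeline: the assembler names, annotation identifiers, and the `recall` helper are hypothetical and chosen only for illustration.

```python
def recall(recovered, truth):
    """Fraction of ground-truth annotations present in the recovered set."""
    if not truth:
        raise ValueError("truth set must not be empty")
    return len(set(recovered) & set(truth)) / len(truth)


# Hypothetical annotations designed into an in-silico mock community.
truth = {"K00001", "K00002", "K00003", "K00004", "K00005"}

# Hypothetical annotations recovered from each assembler's contigs.
recovered_by_assembler = {
    "assembler_a": {"K00001", "K00002", "K00003"},
    "assembler_b": {"K00002", "K00004"},
    "assembler_c": {"K00001", "K00005", "K09999"},  # K09999 is not in truth
}

# Recall for each assembler on its own.
per_assembler = {
    name: recall(hits, truth) for name, hits in recovered_by_assembler.items()
}

# Recall for the union of all assemblers' annotations.
combined = set().union(*recovered_by_assembler.values())
combined_recall = recall(combined, truth)

for name, score in per_assembler.items():
    print(f"{name}: {score:.2f}")
print(f"combined: {combined_recall:.2f}")
```

In this toy data the union recovers every designed annotation while no single assembler does. The union also carries along the one spurious annotation, so a fuller evaluation would likely track precision alongside recall.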

 
NSF-PAR ID:
10400040
Author(s) / Creator(s):
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
BMC Bioinformatics
Volume:
24
Issue:
1
ISSN:
1471-2105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Metatranscriptomics is a powerful method for studying the composition and function of complex microbial communities. The application of metatranscriptomics to multispecies parasite infections is of particular interest, as research on parasite evolution and diversification has been hampered by technical challenges to genome‐scale DNA sequencing. In particular, blood parasites of vertebrates are abundant and diverse although they often occur at low infection intensities and exist as multispecies infections, rendering the isolation of genomic sequence data challenging. Here, we use birds and their diverse haemosporidian parasites to illustrate the potential for metatranscriptome sequencing to generate large quantities of genome‐wide sequence data from multiple blood parasite species simultaneously. We used RNA‐sequencing of 24 blood samples from songbirds in North America to show that metatranscriptomes can yield large proportions of haemosporidian protein‐coding gene repertoires even when infections are of low intensity (<0.1% red blood cells infected) and consist of multiple parasite taxa. By bioinformatically separating host and parasite transcripts and assigning them to the haemosporidian genus of origin, we found that transcriptomes detected ~23% more total parasite infections across all samples than were identified using microscopy and DNA barcoding. For single‐species infections, we obtained data for >1,300 loci from samples with as low as 0.03% parasitaemia, with the number of loci increasing with infection intensity. In total, we provide data for 1,502 single‐copy orthologous loci from a phylogenetically diverse set of 33 haemosporidian mitochondrial lineages. The metatranscriptomic approach described here has the potential to accelerate ecological and evolutionary research on haemosporidians and other diverse parasites.

     
  2. Abstract

    Background

    Advances in microbiome science are driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.

    Results

    We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~3 h with 40 CPU threads to process ~100 genomes and their corresponding metagenomic reads; the most compute-demanding step, hmmsearch, accounts for ~45 min of that time, and completing hmmsearch for ~3,600 genomes takes ~5 h. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.

    Conclusion

    METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at https://github.com/AnantharamanLab/METABOLIC.

     
  3. Fraser, Claire M. (Ed.)
    ABSTRACT

    Metagenomics is a powerful method for interpreting the ecological roles and physiological capabilities of mixed microbial communities. Yet, many tools for processing metagenomic data are neither designed to consider eukaryotes nor are they built for an increasing amount of sequence data. EukHeist is an automated pipeline to retrieve eukaryotic and prokaryotic metagenome-assembled genomes (MAGs) from large-scale metagenomic sequence data sets. We developed the EukHeist workflow to specifically process large amounts of both metagenomic and/or metatranscriptomic sequence data in an automated and reproducible fashion. Here, we applied EukHeist to the large-size fraction data (0.8–2,000 µm) from Tara Oceans to recover both eukaryotic and prokaryotic MAGs, which we refer to as TOPAZ (Tara Oceans Particle-Associated MAGs). The TOPAZ MAGs consisted of >900 environmentally relevant eukaryotic MAGs and >4,000 bacterial and archaeal MAGs. The bacterial and archaeal TOPAZ MAGs expand upon the phylogenetic diversity of likely particle- and host-associated taxa. We use these MAGs to demonstrate an approach to infer the putative trophic mode of the recovered eukaryotic MAGs. We also identify ecological cohorts of co-occurring MAGs, which are driven by specific environmental factors and putative host-microbe associations. These data together add to a number of growing resources of environmentally relevant eukaryotic genomic information. Complementary and expanded databases of MAGs, such as those provided through scalable pipelines like EukHeist, stand to advance our understanding of eukaryotic diversity through increased coverage of genomic representatives across the tree of life.

    IMPORTANCE

    Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities, without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We show exciting insight into what protistan communities are present and their trophic roles in the ocean. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers’ efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.

     
  4. Abstract

    The trace metal iron (Fe) controls the diversity and activity of phytoplankton across the surface oceans, a paradigm established through decades of in situ and mesocosm experimental studies. Despite widespread Fe-limitation within high-nutrient, low-chlorophyll (HNLC) waters, significant contributions of the cyanobacterium Synechococcus to the phytoplankton stock can be found. Correlations among differing strains of Synechococcus across different Fe-regimes have suggested the existence of Fe-adapted ecotypes. However, experimental evidence of high- versus low-Fe adapted strains of Synechococcus is lacking, and so we investigated the transcriptional responses of microbial communities inhabiting the HNLC, sub-Antarctic region of the Southern Ocean during the Spring of 2018. Analysis of metatranscriptomes generated from on-deck incubation experiments reflecting a gradient of Fe-availabilities reveals transcriptomic signatures indicative of co-occurring Synechococcus ecotypes adapted to differing Fe-regimes. Functional analyses comparing low-Fe and high-Fe conditions point to various Fe-acquisition mechanisms that may allow persistence of low-Fe adapted Synechococcus under Fe-limitation. Comparison of in situ surface conditions to the Fe-titrations indicates ecological relevance of these mechanisms as well as persistence of both putative ecotypes within this region. This Fe-titration approach, combined with transcriptomics, highlights the short-term responses of the in situ phytoplankton community to Fe-availability that are often overlooked by examining genomic content or bulk physiological responses alone. These findings expand our knowledge about how phytoplankton in HNLC Southern Ocean waters adapt and respond to changing Fe supply.

     
  5. Abstract

    Corals and sponges harbor diverse microbial communities that are integral to the functioning of the host. While the taxonomic diversity of their microbiomes has been well-established for corals and sponges, their functional roles are less well-understood. It is unclear if the similarities of symbiosis in an invertebrate host would result in functionally similar microbiomes, or if differences in host phylogeny and environmentally driven microhabitats within each host would shape functionally distinct communities. Here we addressed this question, using metatranscriptomic and 16S rRNA gene profiling techniques to compare the microbiomes of two host organisms from different phyla. Our results indicate functional similarity in carbon, nitrogen, and sulfur assimilation, and aerobic nitrogen cycling. Additionally, there were few statistical differences in pathway coverage or abundance between the two hosts. For example, we observed higher coverage of phosphonate and siderophore metabolic pathways in the star coral, Montastraea cavernosa, while there was higher coverage of chloroalkane metabolism in the giant barrel sponge, Xestospongia muta. Higher abundance of genes associated with carbon fixation pathways was also observed in M. cavernosa, while in X. muta there was higher abundance of fatty acid metabolic pathways. Metagenomic predictions based on 16S rRNA gene profiling analysis were similar, and there was high correlation between the metatranscriptome and metagenome predictions for both hosts. Our results highlight several metabolic pathways that exhibit functional similarity in these coral and sponge microbiomes despite the taxonomic differences between the two microbiomes, as well as potential specialization of some microbially based metabolism within each host.

     