

Title: OGUs enable effective, phylogeny-aware analysis of even shallow metagenome community structures
We introduce the Operational Genomic Unit (OGU), a metagenome analysis strategy that directly uses sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, allowing maximal resolution of community composition, and it organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance, and supervised learning, while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets, and these patterns remain detectable at very low metagenomic sequencing depths. Compared with the taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and with the analysis of 16S rRNA gene amplicon sequence variants, this method yields more biologically relevant insights, including stronger correlation with body environment and host sex in the Human Microbiome Project dataset and more accurate prediction of human age from gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool that implements this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies.
Importance Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities. However, current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution compared to 16S rRNA amplicon sequence variant analysis. To address these challenges, we introduce Operational Genomic Units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition while (ii) permitting use of phylogeny-aware tools. Our analysis of real-world datasets shows several advantages over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGU as standard practice in metagenomic studies.
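The core idea of the abstract, counting alignment hits per reference genome rather than per taxon, can be illustrated with a minimal sketch. This is not Woltka's implementation: the function name `build_ogu_table`, the tuple layout of the input, and the sample, read, and genome identifiers are all hypothetical, and splitting a multi-mapped read evenly among the genomes it hits is an assumption made here for illustration only.

```python
from collections import defaultdict


def build_ogu_table(alignments):
    """Build a per-sample OGU count table from alignment hits.

    alignments: iterable of (sample_id, read_id, genome_id) tuples
    (a hypothetical, simplified stand-in for parsed alignment output).
    Returns {sample_id: {genome_id: count}}.
    """
    # Collect the distinct reference genomes hit by each read in each sample.
    hits = defaultdict(set)
    for sample_id, read_id, genome_id in alignments:
        hits[(sample_id, read_id)].add(genome_id)

    # Assumption: a read hitting k genomes contributes 1/k to each of them.
    table = defaultdict(lambda: defaultdict(float))
    for (sample_id, _read_id), genomes in hits.items():
        share = 1.0 / len(genomes)
        for genome_id in genomes:
            table[sample_id][genome_id] += share

    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(counts) for sample, counts in table.items()}


if __name__ == "__main__":
    # Toy input: read r2 in sample S1 hits two genomes.
    toy = [
        ("S1", "r1", "G1"),
        ("S1", "r2", "G1"),
        ("S1", "r2", "G2"),
        ("S2", "r1", "G3"),
    ]
    print(build_ogu_table(toy))
```

In such a table each column is a reference genome, so the tips of a phylogenomic tree of those genomes can be matched directly to the features, which is what makes phylogeny-aware methods such as UniFrac applicable.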
Award ID(s):
2038509
NSF-PAR ID:
10335825
Journal Name:
bioRxiv
ISSN:
2692-8205
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract With advances in DNA sequencing and miniaturized molecular biology workflows, rapid and affordable sequencing of single-cell genomes has become a reality. Compared to 16S rRNA gene surveys and shotgun metagenomics, large-scale application of single-cell genomics to whole microbial communities provides an integrated snapshot of community composition and function, directly links mobile elements to their hosts, and enables analysis of population heterogeneity of the dominant community members. To that end, we sequenced nearly 500 single-cell genomes from a low-diversity hot spring sediment sample from Dewar Creek, British Columbia, and compared this approach to 16S rRNA gene amplicon sequencing and shotgun metagenomics applied to the same sample. We found that the broad taxonomic profiles were similar across the three sequencing approaches, though several lineages were missing from the 16S rRNA gene amplicon dataset, likely the result of primer mismatches. At the functional level, we detected a large array of mobile genetic elements present in the single-cell genomes but absent from the corresponding same-species metagenome-assembled genomes. Moreover, we performed a single-cell population genomic analysis of the three most abundant community members, revealing differences in population structure based on mutation and recombination profiles. While the average pairwise nucleotide identities were similar across the dominant species-level lineages, we observed differences in the extent of recombination between these dominant populations. Most intriguingly, the creek’s Hydrogenobacter sp. population appeared to be so recombinogenic that it more closely resembled a sexual species than a clonally evolving microbe. Together, this work demonstrates that a randomized single-cell approach can be useful for the exploration of previously uncultivated microbes from community composition to population structure.
  2. Campbell, Barbara J. (Ed.)
    Marine invertebrate microbiomes play important roles in diverse host and ecological processes. However, a mechanistic understanding of host-microbe interactions is currently available for a small number of model organisms. Here, an integrated taxonomic and functional analysis of the microbiome of the eastern oyster, Crassostrea virginica, was performed using 16S rRNA gene-based amplicon profiling, shotgun metagenomics, and genome-scale metabolic reconstruction. Relatively high variability of the microbiome was observed across individual oysters and among different tissue types. Specifically, a significantly higher alpha diversity was observed in the inner shell than in the gut, gill, mantle, and pallial fluid samples, and a distinct microbiome composition was revealed in the gut compared to other tissues examined in this study. Targeted metagenomic sequencing of the gut microbiota led to further characterization of a dominant bacterial taxon, the class Mollicutes, which was captured by the reconstruction of a metagenome-assembled genome (MAG). Genome-scale metabolic reconstruction of the oyster Mollicutes MAG revealed a reduced set of metabolic functions and a high reliance on the uptake of host-derived nutrients. A chitin degradation and an arginine deiminase pathway were unique to the MAG compared to closely related genomes of Mollicutes isolates, indicating distinct mechanisms of carbon and energy acquisition by the oyster-associated Mollicutes. A systematic reanalysis of public eastern oyster-derived microbiome data revealed a high prevalence of the Mollicutes among adult oyster guts and a significantly lower relative abundance of the Mollicutes in oyster larvae and adult oyster biodeposits. 
  3. Gilbert, Jack A. (Ed.)
    ABSTRACT Small subunit rRNA (SSU rRNA) amplicon sequencing can quantitatively and comprehensively profile natural microbiomes, representing a critically important tool for studying diverse global ecosystems. However, results will only be accurate if PCR primers perfectly match the rRNA of all organisms present. To evaluate how well marine microorganisms across all 3 domains are detected by this method, we compared commonly used primers with >300 million rRNA gene sequences retrieved from globally distributed marine metagenomes. The best-performing primers compared to 16S rRNA of bacteria and archaea were 515Y/926R and 515Y/806RB, which perfectly matched over 96% of all sequences. Considering cyanobacterial and chloroplast 16S rRNA, 515Y/926R had the highest coverage (99%), making this set ideal for quantifying marine primary producers. For eukaryotic 18S rRNA sequences, 515Y/926R also performed best (88%), followed by V4R/V4RB (18S rRNA specific; 82%), demonstrating that the 515Y/926R combination performs best overall for all 3 domains. Using Atlantic and Pacific Ocean samples, we demonstrate high correspondence between 515Y/926R amplicon abundances (generated for this study) and metagenomic 16S rRNA (median R² = 0.98, n = 272), indicating amplicons can produce equally accurate community composition data compared with shotgun metagenomics. Our analysis also revealed that expected performance of all primer sets could be improved with minor modifications, pointing toward a nearly completely universal primer set that could accurately quantify biogeochemically important taxa in ecosystems ranging from the deep sea to the surface. In addition, our reproducible bioinformatic workflow can guide microbiome researchers studying different ecosystems or human health to similarly improve existing primers and generate more accurate quantitative amplicon data.
IMPORTANCE PCR amplification and sequencing of marker genes is a low-cost technique for monitoring prokaryotic and eukaryotic microbial communities across space and time but will work optimally only if environmental organisms match PCR primer sequences exactly. In this study, we evaluated how well primers match globally distributed short-read oceanic metagenomes. Our results demonstrate that primer sets vary widely in performance, and that at least for marine systems, rRNA amplicon data from some primers lack significant biases compared to metagenomes. We also show that it is theoretically possible to create a nearly universal primer set for diverse saline environments by defining a specific mixture of a few dozen oligonucleotides, and present a software pipeline that can guide rational design of primers for any environment with available meta’omic data. 
  4. Prokaryotic Nostoc, one of the world's most conspicuous and widespread algal genera (similar to eukaryotic algae, plants, and animals) is known to support a microbiome that influences host ecological roles. Past taxonomic characterizations of surface microbiota (epimicrobiota) of free-living Nostoc sampled from freshwater systems employed 16S rRNA genes, typically amplicons. We compared taxa identified from 16S, 18S, 23S, and 28S rRNA gene sequences filtered from shotgun metagenomic sequence and used microscopy to illuminate epimicrobiota diversity for Nostoc sampled from a wetland in the northern Chilean Altiplano. Phylogenetic analysis and rRNA gene sequence abundance estimates indicated that the host was related to Nostoc punctiforme PCC 73102. Epimicrobiota were inferred to include 18 epicyanobacterial genera or uncultured taxa, six epieukaryotic algal genera, and 66 anoxygenic bacterial genera, all having average genomic coverage ≥90X. The epicyanobacteria Geitlerinemia, Oscillatoria, Phormidium, and an uncultured taxon were detected only by 16S rRNA gene; Gloeobacter and Pseudanabaena were detected using 16S and 23S; and Phormididesmis, Neosynechococcus, Symphothece, Aphanizomenon, Nodularia, Spirulina, Nodosilinea, Synechococcus, Cyanobium, and Anabaena (the latter corroborated by microscopy), plus two uncultured cyanobacterial taxa (JSC12, O77) were detected only by 23S rRNA gene sequences. Three chlamydomonad and two heterotrophic stramenopile genera were inferred from 18S; the streptophyte green alga Chaetosphaeridium globosum was detected by microscopy and 28S rRNA genes, but not 18S rRNA genes. Overall, >60% of epimicrobial taxa were detected by markers other than 16S rRNA genes. Some algal taxa observed microscopically were not detected from sequence data. Results indicate that multiple taxonomic markers derived from metagenomic sequence data and microscopy increase epimicrobiota detection.
  5. Microbiome research is a thriving field focused on characterizing the composition and functionality of microbial populations or microbiomes from a wide array of ecological niches. Microbiomes occupy living organisms, soil, the atmosphere, and bodies of water and exist in moderate and extreme climates. Understanding the intractable microbial universes in various environments is challenging and potentially rewarding to humankind. Historically, elucidating pathogenic microbes and their impact on host species has dominated microbiome-based studies. Moreover, only a tiny percentage of microbes can be cultured using classical culturing methods. With advancements in high-throughput experimentation and computational tools derived from microbial ecology, there is a driving force to gain insight into the entire microbial consortium from various environmental and biological locations. Metagenomics, the study of all the microbial genomes in a sample using sequencing techniques (e.g., 16S rRNA amplicon sequencing and shotgun sequencing), has so far dominated the types of investigations conducted in the field of microbiome research. More recently, however, researchers are becoming increasingly interested in better understanding the complex microbe-associated molecular network and specific protein and metabolite functions associated with microbial genetic potential. Metaproteomics, metatranscriptomics, and metabolomics are three potent methods to accumulate information about microbial proteins, messenger RNA, and metabolites in a microbial community. These methods are currently being applied in laboratory settings to address our general lack of understanding of microbe-microbe interactions and microbe-environment interactions.