This content will become publicly available on December 1, 2026

Title: Analysis of metagenomic data
Metagenomics has revolutionized our understanding of microbial communities, offering unprecedented insights into their genetic and functional diversity across Earth’s diverse ecosystems. Beyond their roles as environmental constituents, microbiomes act as symbionts, profoundly influencing the health and function of their host organisms. Given the inherent complexity of these communities and the diverse environments where they reside, the components of a metagenomics study must be carefully tailored to yield accurate results that are representative of the populations of interest. This Primer examines the methodological advancements and current practices that have shaped the field, from initial stages of sample collection and DNA extraction to the advanced bioinformatics tools employed for data analysis, with a particular focus on the profound impact of next-generation sequencing on the scale and accuracy of metagenomics studies. We critically assess the challenges and limitations inherent in metagenomics experimentation, available technologies and computational analysis methods. Beyond technical methodologies, we explore the application of metagenomics across various domains, including human health, agriculture and environmental monitoring. Looking ahead, we advocate for the development of more robust computational frameworks and enhanced interdisciplinary collaborations. This Primer serves as a comprehensive guide for advancing the precision and applicability of metagenomic studies, positioning them to address the complexities of microbial ecology and their broader implications for human health and environmental sustainability.
Award ID(s):
2316223
PAR ID:
10599242
Author(s) / Creator(s):
Publisher / Repository:
Nature
Date Published:
Journal Name:
Nature Reviews Methods Primers
Volume:
5
Issue:
1
ISSN:
2662-8449
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gilbert, Jack A. (Ed.)
    ABSTRACT Small subunit rRNA (SSU rRNA) amplicon sequencing can quantitatively and comprehensively profile natural microbiomes, representing a critically important tool for studying diverse global ecosystems. However, results will only be accurate if PCR primers perfectly match the rRNA of all organisms present. To evaluate how well marine microorganisms across all 3 domains are detected by this method, we compared commonly used primers with >300 million rRNA gene sequences retrieved from globally distributed marine metagenomes. The best-performing primers compared to 16S rRNA of bacteria and archaea were 515Y/926R and 515Y/806RB, which perfectly matched over 96% of all sequences. Considering cyanobacterial and chloroplast 16S rRNA, 515Y/926R had the highest coverage (99%), making this set ideal for quantifying marine primary producers. For eukaryotic 18S rRNA sequences, 515Y/926R also performed best (88%), followed by V4R/V4RB (18S rRNA specific; 82%), demonstrating that the 515Y/926R combination performs best overall for all 3 domains. Using Atlantic and Pacific Ocean samples, we demonstrate high correspondence between 515Y/926R amplicon abundances (generated for this study) and metagenomic 16S rRNA (median R² = 0.98, n = 272), indicating amplicons can produce equally accurate community composition data compared with shotgun metagenomics. Our analysis also revealed that expected performance of all primer sets could be improved with minor modifications, pointing toward a nearly completely universal primer set that could accurately quantify biogeochemically important taxa in ecosystems ranging from the deep sea to the surface. In addition, our reproducible bioinformatic workflow can guide microbiome researchers studying different ecosystems or human health to similarly improve existing primers and generate more accurate quantitative amplicon data.
IMPORTANCE PCR amplification and sequencing of marker genes is a low-cost technique for monitoring prokaryotic and eukaryotic microbial communities across space and time but will work optimally only if environmental organisms match PCR primer sequences exactly. In this study, we evaluated how well primers match globally distributed short-read oceanic metagenomes. Our results demonstrate that primer sets vary widely in performance, and that at least for marine systems, rRNA amplicon data from some primers lack significant biases compared to metagenomes. We also show that it is theoretically possible to create a nearly universal primer set for diverse saline environments by defining a specific mixture of a few dozen oligonucleotides, and present a software pipeline that can guide rational design of primers for any environment with available meta’omic data. 
  2. We introduce Operational Genomic Unit (OGU), a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent from taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in one synthetic and two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome datasets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project dataset, and more accurate prediction of human age by the gut microbiomes in the Finnish population. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate OGU adoption in future metagenomics studies. IMPORTANCE Shotgun metagenomics is a powerful, yet computationally challenging, technique compared to 16S rRNA gene amplicon sequencing for decoding the composition and structure of microbial communities.
However, current analyses of metagenomic data are primarily based on taxonomic classification, which is limited in feature resolution compared to 16S rRNA amplicon sequence variant analysis. To solve these challenges, we introduce Operational Genomic Units (OGUs), which are the individual reference genomes derived from sequence alignment results, without further assigning them taxonomy. The OGU method advances current read-based metagenomics in two dimensions: (i) providing maximal resolution of community composition while (ii) permitting use of phylogeny-aware tools. Our analysis of real-world datasets shows several advantages over currently adopted metagenomic analysis methods and the finest-grained 16S rRNA analysis methods in predicting biological traits. We thus propose the adoption of OGU as standard practice in metagenomic studies. 
  3. Kormas, Konstantinos Aristomenis (Ed.)
    ABSTRACT The study of the mammalian microbiome serves as a critical tool for understanding host-microbial diversity and coevolution and the impact of bacterial communities on host health. While studies of specific microbial systems (e.g., in the human gut) have rapidly increased, large knowledge gaps remain, hindering our understanding of the determinants and levels of variation in microbiomes across multiple body sites and host species. Here, we compare microbiome community compositions from eight distinct body sites among 17 phylogenetically diverse species of nonhuman primates (NHPs), representing the largest comparative study of microbial diversity across primate host species and body sites. Analysis of 898 samples predominantly acquired in the wild demonstrated that oral microbiomes were unique in their clustering, with distinctive divergence from all other body site microbiomes. In contrast, all other body site microbiomes clustered principally by host species and differentiated by body site within host species. These results highlight two key findings: (i) the oral microbiome is unique compared to all other body site microbiomes and conserved among diverse nonhuman primates, despite their considerable dietary and phylogenetic differences, and (ii) assessments of the determinants of host-microbial diversity are relative to the level of the comparison (i.e., intra-/inter-body site, -host species, and -individual), emphasizing the need for broader comparative microbial analyses across diverse hosts to further elucidate host-microbial dynamics, evolutionary and biological patterns of variation, and implications for human-microbial coevolution. IMPORTANCE The microbiome is critical to host health and disease, but much remains unknown about the determinants, levels, and evolution of host-microbial diversity. The relationship between hosts and their associated microbes is complex. 
Most studies to date have focused on the gut microbiome; however, large gaps remain in our understanding of host-microbial diversity, coevolution, and levels of variation in microbiomes across multiple body sites and host species. To better understand the patterns of variation and evolutionary context of host-microbial communities, we conducted one of the largest comparative studies to date, which indicated that the oral microbiome was distinct from the microbiomes of all other body sites and convergent across host species, suggesting conserved niche specialization within the Primates order. We also show the importance of host species differences in shaping the microbiome within specific body sites. This large, comparative study contributes valuable information on key patterns of variation among hosts and body sites, with implications for understanding host-microbial dynamics and human-microbial coevolution. 
  4. ABSTRACT Little is known about the public health risks associated with natural creek sediments that are affected by runoff and fecal pollution from agricultural and livestock practices. For instance, the persistence of foodborne pathogens such as Shiga toxin-producing Escherichia coli (STEC) originating from these practices remains poorly quantified. Towards closing these knowledge gaps, the water-sediment interface of two creeks in the Salinas River Valley of California was sampled over a 9-month period using metagenomics and traditional culture-based tests for STEC. Our results revealed that these sediment communities are extremely diverse and have functional and taxonomic diversity comparable to that observed in soils. With our sequencing effort (∼4 Gbp per library), we were unable to detect any pathogenic E. coli in the metagenomes of 11 samples that had tested positive using culture-based methods, apparently due to relatively low abundance. Furthermore, there were no significant differences in the abundance of human- or cow-specific gut microbiome sequences in the downstream impacted sites compared to that in upstream more pristine (control) sites, indicating natural dilution of anthropogenic inputs. Notably, the high number of metagenomic reads carrying antibiotic resistance genes (ARGs) found in all samples was significantly higher than ARG reads in other available freshwater and soil metagenomes, suggesting that these communities may be natural reservoirs of ARGs. The work presented here should serve as a guide for sampling volumes, amount of sequencing to apply, and what bioinformatics analyses to perform when using metagenomics for public health risk studies of environmental samples such as sediments. IMPORTANCE Current agricultural and livestock practices contribute to fecal contamination in the environment and the spread of food- and waterborne disease and antibiotic resistance genes (ARGs). 
Traditionally, the level of pollution and risk to public health are assessed by culture-based tests for the intestinal bacterium Escherichia coli. However, the accuracy of these traditional methods (e.g., low accuracy in quantification, and false-positive signal when PCR based) and their suitability for sediments remain unclear. We collected sediments for a time series metagenomics study from one of the most highly productive agricultural regions in the United States in order to assess how agricultural runoff affects the native microbial communities and if the presence of Shiga toxin-producing Escherichia coli (STEC) in sediment samples can be detected directly by sequencing. Our study provided important information on the potential for using metagenomics as a tool for assessment of public health risk in natural environments.
  5. The explosion of microbiome analyses has helped identify individual microorganisms and microbial communities driving human health and disease, but how these communities function is still an open question. For example, the role for the incredibly complex metabolic interactions among microbial species cannot easily be resolved by current experimental approaches such as 16S rRNA gene sequencing, metagenomics and/or metabolomics. Resolving such metabolic interactions is particularly challenging in the context of polymicrobial communities where metabolite exchange has been reported to impact key bacterial traits such as virulence and antibiotic treatment efficacy. As novel approaches are needed to pinpoint microbial determinants responsible for impacting community function in the context of human health and to facilitate the development of novel anti-infective and antimicrobial drugs, here we review, from the viewpoint of experimentalists, the latest advances in metabolic modeling, a computational method capable of predicting metabolic capabilities and interactions from individual microorganisms to complex ecological systems. We use selected examples from the literature to illustrate how metabolic modeling has been utilized, in combination with experiments, to better understand microbial community function. Finally, we propose how such combined, cross-disciplinary efforts can be utilized to drive laboratory work and drug discovery moving forward.
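
Illustrative sketch (not part of the record): item 1 above evaluates primers by the fraction of metagenome-derived rRNA sequences they match perfectly. The Python below shows one way such a perfect-match fraction could be computed for a degenerate primer. It is a minimal sketch under stated assumptions, not the authors' workflow: the function names, the toy primer "ACGYN" and the toy binding sites are all hypothetical, and the sites are assumed to be already extracted, oriented and the same length as the primer.

```python
# IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer, site):
    """Return True if every base of `site` is allowed by `primer`.

    Hypothetical helper. An ambiguous base in the site (e.g. N)
    is counted as a mismatch, which is a simplifying assumption.
    """
    if len(primer) != len(site):
        return False
    site = site.upper().replace("U", "T")  # treat RNA and DNA alike
    return all(base in IUPAC[code] for code, base in zip(primer.upper(), site))


def perfect_match_fraction(primer, sites):
    """Fraction of binding sites that the primer matches with zero mismatches."""
    if not sites:
        return 0.0
    hits = sum(primer_matches(primer, s) for s in sites)
    return hits / len(sites)


# Toy data (hypothetical): a 5-base degenerate primer and four binding sites.
toy_primer = "ACGYN"
toy_sites = ["ACGCA", "ACGTG", "ACGAA", "TCGCA"]
coverage = perfect_match_fraction(toy_primer, toy_sites)
```

Here the first two toy sites match (Y allows C or T, N allows any base) and the last two do not, so `coverage` is 0.5. A real evaluation would also need to locate primer binding regions in the reads and handle reverse primers as reverse complements, which this sketch leaves out.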