
This content will become publicly available on December 1, 2023

Title: MetaPop: a pipeline for macro- and microdiversity analyses and visualization of microbial and viral metagenome-derived populations
Abstract Background Microbes and their viruses are hidden engines driving Earth’s ecosystems from the oceans and soils to humans and bioreactors. Though gene marker approaches can now be complemented by genome-resolved studies of inter- (macrodiversity) and intra- (microdiversity) population variation, analytical tools to do so remain scattered or under-developed. Results Here, we introduce MetaPop, an open-source bioinformatic pipeline that provides a single interface to analyze and visualize microbial and viral community metagenomes at both the macro- and microdiversity levels. Macrodiversity estimates include population abundances and α- and β-diversity. Microdiversity calculations include identification of single nucleotide polymorphisms (SNPs), novel codon-constrained linkage of SNPs, nucleotide diversity (π and θ), and selective pressures (pN/pS and Tajima’s D) within populations and fixation indices (F_ST) between populations. MetaPop will also identify genes with distinct codon usage. Following rigorous validation, we applied MetaPop to the gut viromes of autistic children who underwent fecal microbiota transfers (FMT) and their neurotypical peers. The macrodiversity results confirmed our prior findings for viral populations (microbial shotgun metagenomes were not available) that diversity did not significantly differ between autistic and neurotypical children. However, by also quantifying microdiversity, MetaPop revealed lower average viral nucleotide diversity (π) in autistic children. The percentage of genomes detected under positive selection was also lower among autistic children, suggesting that higher viral π in neurotypical children may be beneficial because it allows populations to better “bet hedge” in changing environments.
Further, comparisons of microdiversity pre- and post-FMT in autistic children revealed that the FMT delivery method (oral versus rectal) may influence viral activity and engraftment of microdiverse viral populations, with children who received their FMT rectally having higher microdiversity post-FMT. Overall, these results show that analyses at the macro level alone can miss important biological differences. Conclusions These findings suggest that standardized population and genetic variation analyses will be invaluable for maximizing biological inference, and MetaPop provides a convenient tool package to explore the dual impact of macro- and microdiversity across microbial communities.
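As a rough illustration of the two nucleotide diversity measures named in the abstract (π and θ), the sketch below computes them from per-site allele read counts using the standard textbook estimators. It is not MetaPop's implementation: the function names, the input layout (one dictionary of allele counts per variable site), and the use of a single sample size for θ are all assumptions made for this example.

```python
from typing import Dict, List


def site_pi(counts: Dict[str, int]) -> float:
    """Unbiased per-site nucleotide diversity from allele read counts.

    Equals the probability that two reads drawn without replacement
    at this site carry different alleles.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0  # diversity is undefined with fewer than two reads
    same = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return 1.0 - same


def mean_pi(variable_sites: List[Dict[str, int]], genome_length: int) -> float:
    """Average pi over the genome; invariant sites contribute zero."""
    return sum(site_pi(s) for s in variable_sites) / genome_length


def watterson_theta(num_segregating: int, sample_size: int,
                    genome_length: int) -> float:
    """Watterson's theta per site: S / a_n / L, with a_n = sum_{i=1}^{n-1} 1/i.

    Hypothetical simplification: one sample size for all sites.
    """
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return num_segregating / a_n / genome_length
```

In real metagenomes the read depth differs from site to site, so a practical pipeline would need to decide how to handle that (for example by subsampling to a common depth); this sketch leaves that choice open.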
Award ID(s):
1759831 1759874
NSF-PAR ID:
10354254
Journal Name:
Microbiome
Volume:
10
Issue:
1
ISSN:
2049-2618
Sponsoring Org:
National Science Foundation
More Like this
  1. Coexisting microbial cells of the same species often exhibit genetic variation that can affect phenotypes ranging from nutrient preference to pathogenicity. Here we present inStrain, a program that uses metagenomic paired reads to profile intra-population genetic diversity (microdiversity) across whole genomes and compares microbial populations in a microdiversity-aware manner, greatly increasing the accuracy of genomic comparisons when benchmarked against existing methods. We use inStrain to profile >1,000 fecal metagenomes from newborn premature infants and find that siblings share significantly more strains than unrelated infants, although identical twins share no more strains than fraternal siblings. Infants born by cesarean section harbor Klebsiella with significantly higher nucleotide diversity than infants delivered vaginally, potentially reflecting acquisition from hospital rather than maternal microbiomes. Genomic loci that show diversity in individual infants include variants found between other infants, possibly reflecting inoculation from diverse hospital-associated sources. inStrain can be applied to any metagenomic dataset for microdiversity analysis and rigorous strain comparison.
  2. The extent and ecological significance of intraspecific diversity within marine microbial populations is still poorly understood, and it remains unclear if such strain-level microdiversity will affect fitness and persistence in a rapidly changing ocean environment. In this study, we cultured 11 sympatric strains of the ubiquitous marine picocyanobacterium Synechococcus isolated from a Narragansett Bay (Rhode Island, USA) phytoplankton community thermal selection experiment. Despite all 11 isolates being highly similar (with average nucleotide identities of >99.9%, with 98.6-100% of the genome aligning), thermal performance curves revealed selection at warm and cool temperatures had subdivided the initial population into thermotypes with pronounced differences in maximum growth temperatures. Within the fine-scale genetic diversity that did exist within this population, the two divergent thermal ecotypes differed at a locus containing genes for the phycobilisome antenna complex. Our study demonstrates that present-day marine microbial populations can contain microdiversity in the form of cryptic but environmentally-relevant thermotypes that may increase their resilience to future rising temperatures.
  3. Hydrogenotrophic methanogens are ubiquitous chemoautotrophic archaea inhabiting globally distributed deep-sea hydrothermal vent ecosystems and associated subseafloor niches within the rocky subseafloor, yet little is known about how they adapt and diversify in these habitats. To determine genomic variation and selection pressure within methanogenic populations at vents, we examined five Methanothermococcus single cell amplified genomes (SAGs) in conjunction with 15 metagenomes and 10 metatranscriptomes from venting fluids at two geochemically distinct hydrothermal vent fields on the Mid-Cayman Rise in the Caribbean Sea. We observed that some Methanothermococcus lineages and their transcripts were more abundant than others in individual vent sites, indicating differential fitness among lineages. The relative abundances of lineages represented by SAGs in each of the samples matched phylogenetic relationships based on single-copy universal genes, and genes related to nitrogen fixation and the CRISPR/Cas immune system were among those differentiating the clades. Lineages possessing these genes were less abundant than those missing that genomic region. Overall, patterns in nucleotide variation indicated that the population dynamics of Methanothermococcus were not governed by clonal expansions or selective sweeps, at least in the habitats and sampling times included in this study. Together, our results show that although specific lineages of Methanothermococcus co-exist in these habitats, some outcompete others, and possession of accessory metabolic functions does not necessarily provide a fitness advantage in these habitats in all conditions. This work highlights the power of combining single-cell, metagenomic, and metatranscriptomic datasets to determine how evolution shapes microbial abundance and diversity in hydrothermal vent ecosystems.
  4. Background Viruses influence global patterns of microbial diversity and nutrient cycles. Though viral metagenomics (viromics), specifically targeting dsDNA viruses, has been critical for revealing viral roles across diverse ecosystems, its analyses differ in many ways from those used for microbes. To date, viromics benchmarking has covered read pre-processing, assembly, relative abundance, read mapping thresholds and diversity estimation, but other steps would benefit from benchmarking and standardization. Here we use in silico-generated datasets and an extensive literature survey to evaluate and highlight how dataset composition (i.e., viromes vs bulk metagenomes) and assembly fragmentation impact (i) viral contig identification tool, (ii) virus taxonomic classification, and (iii) identification and curation of auxiliary metabolic genes (AMGs). Results The in silico benchmarking of five commonly used virus identification tools shows that gene-content-based tools consistently performed well for long (≥3 kbp) contigs, while k-mer- and blast-based tools were uniquely able to detect viruses from short (≤3 kbp) contigs. Notably, however, the performance increase of k-mer- and blast-based tools for short contigs was obtained at the cost of increased false positives (sometimes up to ∼5% for virome and ∼75% for bulk samples), particularly when eukaryotic or mobile genetic element sequences were included in the test datasets. For viral classification, variously sized genome fragments were assessed using gene-sharing network analytics to quantify drop-offs in taxonomic assignments, which revealed correct assignations ranging from ∼95% (whole genomes) down to ∼80% (3 kbp sized genome fragments). A similar trend was also observed for other viral classification tools such as VPF-class, ViPTree and VIRIDIC, suggesting that caution is warranted when classifying short genome fragments and not full genomes.
Finally, we highlight how fragmented assemblies can lead to erroneous identification of AMGs and outline a best-practices workflow to curate candidate AMGs in viral genomes assembled from metagenomes. Conclusion Together, these benchmarking experiments and annotation guidelines should aid researchers seeking to best detect, classify, and characterize the myriad viruses ‘hidden’ in diverse sequence datasets.
  5. ABSTRACT Viral infection exerts selection pressure on marine microbes, as virus-induced cell lysis causes 20 to 50% of cell mortality, resulting in fluxes of biomass into oceanic dissolved organic matter. Archaeal and bacterial populations can defend against viral infection using the clustered regularly interspaced short palindromic repeat (CRISPR)-associated (Cas) system, which relies on specific matching between a spacer sequence and a viral gene. If a CRISPR spacer match to any gene within a viral genome is equally effective in preventing lysis, no viral genes should be preferentially matched by CRISPR spacers. However, if there are differences in effectiveness, certain viral genes may demonstrate a greater frequency of CRISPR spacer matches. Indeed, homology search analyses of bacterioplankton CRISPR spacer sequences against virioplankton sequences revealed preferential matching of replication proteins, nucleic acid binding proteins, and viral structural proteins. Positive selection pressure for effective viral defense is one parsimonious explanation for these observations. CRISPR spacers from virioplankton metagenomes preferentially matched methyltransferase and phage integrase genes within virioplankton sequences. These virioplankton CRISPR spacers may assist infected host cells in defending against competing phage. Analyses also revealed that half of the spacer-matched viral genes were unknown, some genes matched several spacers, and some spacers matched multiple genes, a many-to-many relationship. Thus, CRISPR spacer matching may be an evolutionary algorithm, agnostically identifying those genes under stringent selection pressure for sustaining viral infection and lysis. Investigating this subset of viral genes could reveal those genetic mechanisms essential to virus-host interactions and provide new technologies for optimizing CRISPR defense in beneficial microbes.
IMPORTANCE The CRISPR-Cas system is one means by which bacterial and archaeal populations defend against viral infection, which causes 20 to 50% of cell mortality in the ocean. We tested the hypothesis that certain viral genes are preferentially targeted for the initial attack of the CRISPR-Cas system on a viral genome. Using CASC, a pipeline for CRISPR spacer discovery, and metagenome data from oceanic microbes and viruses, we found a clear subset of viral genes with high match frequencies to CRISPR spacers. Moreover, we observed a many-to-many relationship of spacers and viral genes. These high-match viral genes were involved in nucleotide metabolism, DNA methylation, and viral structure. It is possible that CRISPR spacer matching is an evolutionary algorithm pointing to those viral genes most important to sustaining infection and lysis. Studying these genes may advance the understanding of virus-host interactions in nature and provide new technologies for leveraging CRISPR-Cas systems in beneficial microbes.