
Title: inStrain profiles population microdiversity from metagenomic data and sensitively detects shared microbial strains
Coexisting microbial cells of the same species often exhibit genetic variation that can affect phenotypes ranging from nutrient preference to pathogenicity. Here we present inStrain, a program that uses metagenomic paired reads to profile intra-population genetic diversity (microdiversity) across whole genomes and compares microbial populations in a microdiversity-aware manner, greatly increasing the accuracy of genomic comparisons when benchmarked against existing methods. We use inStrain to profile >1,000 fecal metagenomes from newborn premature infants and find that siblings share significantly more strains than unrelated infants, although identical twins share no more strains than fraternal siblings. Infants born by cesarean section harbor Klebsiella with significantly higher nucleotide diversity than infants delivered vaginally, potentially reflecting acquisition from hospital rather than maternal microbiomes. Genomic loci that show diversity in individual infants include variants found between other infants, possibly reflecting inoculation from diverse hospital-associated sources. inStrain can be applied to any metagenomic dataset for microdiversity analysis and rigorous strain comparison.
Journal Name:
Nature Biotechnology
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Diversification can generate genomic and phenotypic strain-level diversity within microbial species. This microdiversity is widely recognized in populations, but the community-level consequences of microbial strain-level diversity are poorly characterized. Using the cheese rind model system, we tested whether strain diversity across microbiomes from distinct geographic regions impacts assembly dynamics and functional outputs. We first isolated the same three bacterial species (Staphylococcus equorum, Brevibacterium auranticum, and Brachybacterium alimentarium) from nine cheeses produced in different regions of the United States and Europe to construct nine synthetic microbial communities consisting of distinct strains of the same three bacterial species. Comparative genomics identified distinct phylogenetic clusters and significant variation in genome content across the nine synthetic communities. When we assembled each synthetic community with initially identical compositions, community structure diverged over time, resulting in communities with different dominant taxa. The taxonomically identical communities showed differing responses to abiotic (high salt) and biotic (the fungus Penicillium) perturbations, with some communities showing no response and others substantially shifting in composition. Functional differences were also observed across the nine communities, with significant variation in pigment production (light yellow to orange) and in composition of volatile organic compound profiles emitted from the rinds (nutty to sulfury). IMPORTANCE Our work demonstrated that the specific microbial strains used to construct a microbiome could impact the species composition, perturbation responses, and functional outputs of that system. These findings suggest that 16S rRNA gene taxonomic profiles alone may have limited potential to predict the dynamics of microbial communities because they usually do not capture strain-level diversity. Observations from our synthetic communities also suggest that strain-level diversity has the potential to drive variability in the aesthetics and quality of surface-ripened cheeses.
  2. Pettigrew, Melinda M. (Ed.)
    ABSTRACT Viral genome sequencing has guided our understanding of the spread and extent of genetic diversity of SARS-CoV-2 during the COVID-19 pandemic. SARS-CoV-2 viral genomes are usually sequenced from nasopharyngeal swabs of individual patients to track viral spread. Recently, RT-qPCR of municipal wastewater has been used to quantify the abundance of SARS-CoV-2 in several regions globally. However, metatranscriptomic sequencing of wastewater can be used to profile the viral genetic diversity across infected communities. Here, we sequenced RNA directly from sewage collected by municipal utility districts in the San Francisco Bay Area to generate complete and nearly complete SARS-CoV-2 genomes. The major consensus SARS-CoV-2 genotypes detected in the sewage were identical to clinical genomes from the region. Using a pipeline for single nucleotide variant calling in a metagenomic context, we characterized minor SARS-CoV-2 alleles in the wastewater and detected viral genotypes which were also found within clinical genomes throughout California. Observed wastewater variants were more similar to local California patient-derived genotypes than they were to those from other regions within the United States or globally. Additional variants detected in wastewater have only been identified in genomes from patients sampled outside California, indicating that wastewater sequencing can provide evidence for recent introductions of viral lineages before they are detected by local clinical sequencing. These results demonstrate that epidemiological surveillance through wastewater sequencing can aid in tracking exact viral strains in an epidemic context.
  3. Abstract Background Metagenomic data can be used to profile high-importance genes within microbiomes. However, current metagenomic workflows produce data that suffer from low sensitivity and an inability to accurately reconstruct partial or full genomes, particularly those in low abundance. These limitations preclude colocalization analysis, i.e., characterizing the genomic context of genes and functions within a metagenomic sample. Genomic context is especially crucial for functions associated with horizontal gene transfer (HGT) via mobile genetic elements (MGEs), for example antimicrobial resistance (AMR). To overcome this current limitation of metagenomics, we present a method for comprehensive and accurate reconstruction of antimicrobial resistance genes (ARGs) and MGEs from metagenomic DNA, termed target-enriched long-read sequencing (TELSeq). Results Using technical replicates of diverse sample types, we compared TELSeq performance to that of non-enriched PacBio and short-read Illumina sequencing. TELSeq achieved much higher ARG recovery (>1,000-fold) and sensitivity than the other methods across diverse metagenomes, revealing an extensive resistome profile comprising many low-abundance ARGs, including some with public health importance. Using the long reads generated by TELSeq, we identified numerous MGEs and cargo genes flanking the low-abundance ARGs, indicating that these ARGs could be transferred across bacterial taxa via HGT. Conclusions TELSeq can provide a nuanced view of the genomic context of microbial resistomes and thus has wide-ranging applications in public, animal, and human health, as well as environmental surveillance and monitoring of AMR. Thus, this technique represents a fundamental advancement for microbiome research and application.
  4. The extent and ecological significance of intraspecific diversity within marine microbial populations is still poorly understood, and it remains unclear if such strain-level microdiversity will affect fitness and persistence in a rapidly changing ocean environment. In this study, we cultured 11 sympatric strains of the ubiquitous marine picocyanobacterium Synechococcus isolated from a Narragansett Bay (Rhode Island, USA) phytoplankton community thermal selection experiment. Despite all 11 isolates being highly similar (with average nucleotide identities of >99.9%, with 98.6-100% of the genome aligning), thermal performance curves revealed selection at warm and cool temperatures had subdivided the initial population into thermotypes with pronounced differences in maximum growth temperatures. Within the fine-scale genetic diversity that did exist within this population, the two divergent thermal ecotypes differed at a locus containing genes for the phycobilisome antenna complex. Our study demonstrates that present-day marine microbial populations can contain microdiversity in the form of cryptic but environmentally-relevant thermotypes that may increase their resilience to future rising temperatures.
  5. Abstract Background Microbes and their viruses are hidden engines driving Earth’s ecosystems from the oceans and soils to humans and bioreactors. Though gene marker approaches can now be complemented by genome-resolved studies of inter- (macrodiversity) and intra- (microdiversity) population variation, analytical tools to do so remain scattered or under-developed. Results Here, we introduce MetaPop, an open-source bioinformatic pipeline that provides a single interface to analyze and visualize microbial and viral community metagenomes at both the macro- and microdiversity levels. Macrodiversity estimates include population abundances and α- and β-diversity. Microdiversity calculations include identification of single nucleotide polymorphisms, novel codon-constrained linkage of SNPs, nucleotide diversity (π and θ), and selective pressures (pN/pS and Tajima’s D) within and fixation indices (FST) between populations. MetaPop will also identify genes with distinct codon usage. Following rigorous validation, we applied MetaPop to the gut viromes of autistic children that underwent fecal microbiota transfers and their neurotypical peers. The macrodiversity results confirmed our prior findings for viral populations (microbial shotgun metagenomes were not available) that diversity did not significantly differ between autistic and neurotypical children. However, by also quantifying microdiversity, MetaPop revealed lower average viral nucleotide diversity (π) in autistic children. Analysis of the percentage of genomes detected under positive selection was also lower among autistic children, suggesting that higher viral π in neurotypical children may be beneficial because it allows populations to better “bet hedge” in changing environments. Further, comparisons of microdiversity pre- and post-FMT in autistic children revealed that the delivery FMT method (oral versus rectal) may influence viral activity and engraftment of microdiverse viral populations, with children who received their FMT rectally having higher microdiversity post-FMT. Overall, these results show that analyses at the macro level alone can miss important biological differences. Conclusions These findings suggest that standardized population and genetic variation analyses will be invaluable for maximizing biological inference, and MetaPop provides a convenient tool package to explore the dual impact of macro- and microdiversity across microbial communities.
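
Several of the abstracts above refer to nucleotide diversity (π) as a measure of microdiversity. As a rough illustration only, the sketch below computes a per-site diversity value from read base counts using the common definition 1 minus the sum of squared allele frequencies, then averages it across sites. This formula choice, the function names (`site_diversity`, `mean_diversity`), and the input layout (one dict of base counts per genome position) are assumptions made for the example; they are not taken from inStrain or MetaPop, whose actual calculations may differ (for instance in coverage filtering or sample-size correction).

```python
from typing import Dict, Iterable


def site_diversity(counts: Dict[str, int]) -> float:
    """Per-site diversity: 1 - sum of squared allele frequencies.

    `counts` maps a base (e.g. "A") to the number of reads supporting it
    at one genome position. Hypothetical helper for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        # No coverage at this position: report zero diversity.
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def mean_diversity(sites: Iterable[Dict[str, int]]) -> float:
    """Average the per-site values over all supplied positions."""
    values = [site_diversity(s) for s in sites]
    if not values:
        return 0.0
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy data: one fixed site and one site split evenly between two bases.
    toy_sites = [{"A": 10}, {"A": 5, "C": 5}]
    print(mean_diversity(toy_sites))
```

A site where every read carries the same base contributes zero, while a site split evenly between two bases contributes 0.5, so populations with more segregating positions receive higher averages.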