
Title: Ecology and molecular targets of hypermutation in the global microbiome

Changes in the sequence of an organism’s genome, i.e., mutations, are the raw material of evolution. The frequency and location of mutations can be constrained by specific molecular mechanisms, such as diversity-generating retroelements (DGRs). DGRs have been characterized from cultivated bacteria and bacteriophages, and perform error-prone reverse transcription leading to mutations being introduced in specific target genes. DGR loci were also identified in several metagenomes, but the ecological roles and evolutionary drivers of these DGRs remain poorly understood. Here, we analyze a dataset of >30,000 DGRs from public metagenomes, establish six major lineages of DGRs including three primarily encoded by phages and seemingly used to diversify host attachment proteins, and demonstrate that DGRs are broadly active and responsible for >10% of all amino acid changes in some organisms. Overall, these results highlight the constraints under which DGRs evolve, and elucidate several distinct roles these elements play in natural communities.

Journal Name:
Nature Communications
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Reverse transcriptases (RTs) are found in different systems including group II introns, Diversity Generating Retroelements (DGRs), retrons, CRISPR-Cas systems, and Abortive Infection (Abi) systems in prokaryotes. Different classes of RTs can play different roles, such as template switching and mobility in group II introns, spacer acquisition in CRISPR-Cas systems, mutagenic retrohoming in DGRs, programmed cell suicide in Abi systems, and recently discovered phage defense in retrons. While some classes of RTs have been studied extensively, others remain to be characterized. There is a lack of computational tools for identifying and characterizing various classes of RTs. In this study, we built a tool (called myRT) for identification and classification of prokaryotic RTs. In addition, our tool provides information about the genomic neighborhood of each RT, providing potential functional clues. We applied our tool to predict RTs in all complete and draft bacterial genomes, and created a collection that can be used for exploration of putative RTs and their associated protein domains. Application of myRT to metagenomes showed that gut metagenomes encode proportionally more RTs related to DGRs, outnumbering retron-related RTs, as compared to the collection of reference genomes. MyRT is available both as standalone software and through a website.

  2. Background

    Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes.


    Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes.


    Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10 to 100-fold for low input metagenomes.


    PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to improved de novo genome assembly from PCR-amplified datasets, and enables better genome recovery from low input metagenomes.

  3. Abstract Background

    With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth’s natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes.


    In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA’s capabilities are demonstrated by reanalyzing 3 existing public datasets, which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral) including several uncharacterized organisms and organisms with no public genome representatives.


    The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting-edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. The contributions of VEBA to the metagenomics community include seamless end-to-end metagenomics analysis as well as the flexibility for users to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.

  4. Bonomo, Robert A. (Ed.)
    ABSTRACT Microbial diversity is reduced in the gut microbiota of animals and humans treated with selective serotonin reuptake inhibitors (SSRIs) and tricyclic antidepressants (TCAs). The mechanisms driving the changes in microbial composition, while largely unknown, are critical to understand considering that the gut microbiota plays important roles in drug metabolism and brain function. Using Escherichia coli, we show that the SSRI fluoxetine and the TCA amitriptyline exert strong selection pressure for enhanced efflux activity of the AcrAB-TolC pump, a member of the resistance-nodulation-cell division (RND) superfamily of transporters. Sequencing spontaneous fluoxetine- and amitriptyline-resistant mutants revealed mutations in marR and lon, negative regulators of AcrAB-TolC expression. In line with the broad specificity of AcrAB-TolC pumps, these mutations conferred resistance to several classes of antibiotics. We show that the converse also occurs, as spontaneous chloramphenicol-resistant mutants displayed cross-resistance to SSRIs and TCAs. Chemical-genomic screens identified deletions in marR and lon, confirming the results observed for the spontaneous resistant mutants. In addition, deletions in 35 genes with no known role in drug resistance were identified that conferred cross-resistance to antibiotics, and several displayed enhanced efflux activities. These results indicate that combinations of specific antidepressants and antibiotics may have important effects when both are used simultaneously or successively, as they can impose selection for common mechanisms of resistance. Our work suggests that selection for enhanced efflux activities is an important factor to consider in understanding the microbial diversity changes associated with antidepressant treatments.
    IMPORTANCE Antidepressants are prescribed broadly for psychiatric conditions to alter neuronal levels of synaptic neurotransmitters such as serotonin and norepinephrine. Two categories of antidepressants are selective serotonin reuptake inhibitors (SSRIs) and tricyclic antidepressants (TCAs); both are among the most prescribed drugs in the United States. While it is well established that antidepressants inhibit reuptake of neurotransmitters, there is evidence that they also impact microbial diversity in the gastrointestinal tract. However, the mechanisms and therefore biological and clinical effects remain obscure. We demonstrate that antidepressants may influence microbial diversity through strong selection for mutant bacteria with increased activity of AcrAB-TolC, an efflux pump that removes antibiotics from cells. Furthermore, we identify a new group of genes that contribute to cross-resistance between antidepressants and antibiotics, several of which act by regulating efflux activity, underscoring overlapping mechanisms. Overall, this work provides new insights into bacterial responses to antidepressants that are important for understanding antidepressant treatment effects.
  5. Abstract Motivation

    Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses.


    We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities.

    Availability and implementation

    Web-server/standalone program, source code and datasets are freely available.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

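The two accuracy figures quoted for SSIPe in item 5 (a Pearson correlation coefficient and a root-mean-square error between estimated and experimental ΔΔGbind) can be illustrated with a minimal sketch. This is not the SSIPe code: the function names `pearson_r` and `rmse` are hypothetical, and the numbers in the usage lines are made-up placeholders rather than data from the study.

```python
import math


def pearson_r(estimated, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(estimated)
    if n != len(experimental) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(estimated) / n
    mean_y = sum(experimental) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, experimental))
    var_x = sum((x - mean_x) ** 2 for x in estimated)
    var_y = sum((y - mean_y) ** 2 for y in experimental)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(estimated, experimental):
    """Root-mean-square error, in the same units as the inputs (e.g. kcal/mol)."""
    if len(estimated) != len(experimental) or not estimated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(x - y) ** 2 for x, y in zip(estimated, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder ddG values (kcal/mol), invented purely for illustration.
estimated_ddg = [0.5, 1.2, -0.3, 2.1]
experimental_ddg = [0.8, 1.0, 0.1, 2.9]
print(pearson_r(estimated_ddg, experimental_ddg))
print(rmse(estimated_ddg, experimental_ddg))
```

A higher correlation together with a lower error indicates closer agreement with experiment, which is how the comparison between methods in item 5 should be read.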