
Title: Evolutionary stasis of a deep subsurface microbial lineage

Sulfate-reducing bacteria Candidatus Desulforudis audaxviator (CDA) were originally discovered in deep fracture fluids accessed via South African gold mines and have since been found in geographically widespread deep subsurface locations. To constrain models of subsurface microbial evolution, we compared CDA genomes from Africa, North America and Eurasia using single cell genomics. Unexpectedly, 126 partial single amplified genomes from the three continents, a complete genome of an isolate from Eurasia, and metagenome-assembled genomes from Africa and Eurasia shared >99.2% average nucleotide identity, a low frequency of SNPs, and near-perfectly conserved prophages and CRISPRs. Our analyses reject sample cross-contamination, recent natural dispersal, and unusually strong purifying selection as likely explanations for these unexpected results. We therefore conclude that the analyzed CDA populations underwent only minimal evolution since their physical separation, potentially as far back as the breakup of Pangea between 165 and 55 Ma. High-fidelity DNA replication and repair mechanisms are the most plausible explanation for the highly conserved genome of CDA. CDA presents a stark contrast to the current model organisms in microbial evolutionary studies, which often develop adaptive traits over far shorter periods of time.
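
As a rough illustration of what a nucleotide identity figure such as ">99.2%" measures, the sketch below computes percent identity between two pre-aligned sequences. It is a simplification under stated assumptions: the function name `percent_identity` is hypothetical, gapped columns are simply skipped, and the abstract does not say which tool or procedure produced the reported values. Dedicated average nucleotide identity tools work on fragments of whole genomes rather than on a single alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching bases over columns where neither sequence has a gap.

    Assumes both inputs come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. This is an illustrative simplification,
    not the method used in the study.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a base in both sequences
    matches = 0   # columns where the two bases agree
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip gapped columns
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy example: one mismatch in four compared columns.
    print(percent_identity("ACGT", "ACGA"))
```

At this scale, 99.2% identity corresponds to at most 8 differing positions per 1,000 aligned bases.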

Journal Name:
The ISME Journal
Page Range or eLocation-ID:
p. 2830-2842
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Advances in microbiome science are being driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.


    We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), a scalable software to advance microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or single-cell genomes. Results are presented in the form of tables for metabolism and a variety of visualizations including biogeochemical cycling potential, representation of sequential metabolic transformations, community-scale microbial functional networks using a newly defined metric “MW-score” (metabolic weight score), and metabolic Sankey diagrams. METABOLIC takes ~3 h with 40 CPU threads to process ~100 genomes and corresponding metagenomic reads, within which the most compute-demanding part, hmmsearch, takes ~45 min, while it takes ~5 h to complete hmmsearch for ~3600 genomes. Tests of accuracy, robustness, and consistency suggest METABOLIC provides better performance compared to other software and online servers. To highlight the utility and versatility of METABOLIC, we demonstrate its capabilities on diverse metagenomic datasets from the marine subsurface, terrestrial subsurface, meadow soil, deep sea, freshwater lakes, wastewater, and the human gut.


    METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at

  2. Abstract Background

    How vascular systems and their respiratory pigments evolved is still debated. While many animals present a vascular system, hemoglobin exists as a blood pigment only in a few groups (vertebrates, annelids, a few arthropod and mollusk species). Hemoglobins are formed of globin sub-units, belonging to multigene families, in various multimeric assemblages. It was so far unclear whether hemoglobin families from different bilaterian groups had a common origin.


    To unravel globin evolution in bilaterians, we studied the marine annelid Platynereis dumerilii, a species with a slow-evolving genome. Platynereis exhibits a closed vascular system filled with extracellular hemoglobin. Platynereis genome and transcriptomes reveal a family of 19 globins, nine of which are predicted to be extracellular. Extracellular globins are produced by specialized cells lining the vessels of the segmental appendages of the worm, serving as gills, and thus likely participate in the assembly of a previously characterized annelid-specific giant hemoglobin. Extracellular globin mRNAs are absent in smaller juveniles, accumulate considerably in growing and more active worms and peak in swarming adults, as the need for O2 culminates. Next, we conducted a metazoan-wide phylogenetic analysis of globins using data from complete genomes. We establish that five globin genes (stem globins) were present in the last common ancestor of bilaterians. Based on these results, we propose a new nomenclature of globins, with five clades. All five ancestral stem-globin clades are retained in some spiralians, while some clades disappeared early in deuterostome and ecdysozoan evolution. All known bilaterian blood globin families are grouped in a single clade (clade I) together with intracellular globins of bilaterians devoid of red blood.


    We uncover a complex “pre-blood” evolution of globins, with an early gene radiation in ancestral bilaterians. Circulating hemoglobins in various bilaterian groups evolved convergently, presumably in correlation with animal size and activity. However, all hemoglobins derive from a clade I globin, or cytoglobin, probably involved in intracellular O2 transit and regulation. The annelid Platynereis is remarkable in having a large family of extracellular blood globins, while retaining all clades of ancestral bilaterian globins.

  3. Abstract

    The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10–25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample data sets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

  4. Abstract

    The supergroup Amoebozoa unites a wide diversity of amoeboid organisms and encompasses enigmatic lineages that have been recalcitrant to modern phylogenetics. Deep divergences, taxonomic placement of some key taxa and character evolution in the group largely remain poorly elucidated or controversial. We surveyed available Amoebozoa genomes and transcriptomes to mine conserved putative single copy genes, which were used to enrich gene sampling and generate the largest supermatrix in the group to date, encompassing 824 genes, including gene sequences not previously analyzed. We recovered a well-resolved and supported tree of Amoebozoa, revealing novel deep level relationships and resolving placement of enigmatic lineages congruent with morphological data. In our analysis the deepest branching group is Tubulinea. A recently proposed major clade, Tevosa, uniting Evosea and Tubulinea, is not supported. Based on the new phylogenetic tree, paleoecological and paleontological data as well as data on the biology of presently living amoebozoans, we hypothesize that the evolution of Amoebozoa probably was driven by adaptive responses to a changing environment, where successful survival and predation resulted from a capacity to disrupt and graze on microbial mats, a dominant ecosystem of the mid-Proterozoic period of Earth's history.

  5. Background

    Metagenomics has transformed our understanding of microbial diversity across ecosystems, with recent advances enabling de novo assembly of genomes from metagenomes. These metagenome-assembled genomes are critical to provide ecological, evolutionary, and metabolic context for all the microbes and viruses yet to be cultivated. Metagenomes can now be generated from nanogram to subnanogram amounts of DNA. However, these libraries require several rounds of PCR amplification before sequencing, and recent data suggest these typically yield smaller and more fragmented assemblies than regular metagenomes.


    Here we evaluate de novo assembly methods of 169 PCR-amplified metagenomes, including 25 for which an unamplified counterpart is available, to optimize specific assembly approaches for PCR-amplified libraries. We first evaluated coverage bias by mapping reads from PCR-amplified metagenomes onto reference contigs obtained from unamplified metagenomes of the same samples. Then, we compared different assembly pipelines in terms of assembly size (number of bp in contigs ≥ 10 kb) and error rates to evaluate which are the best suited for PCR-amplified metagenomes.


    Read mapping analyses revealed that the depth of coverage within individual genomes is significantly more uneven in PCR-amplified datasets versus unamplified metagenomes, with regions of high depth of coverage enriched in short inserts. This enrichment scales with the number of PCR cycles performed, and is presumably due to preferential amplification of short inserts. Standard assembly pipelines are confounded by this type of coverage unevenness, so we evaluated other assembly options to mitigate these issues. We found that a pipeline combining read deduplication and an assembly algorithm originally designed to recover genomes from libraries generated after whole genome amplification (single-cell SPAdes) frequently improved assembly of contigs ≥ 10 kb by 10- to 100-fold for low input metagenomes.


    PCR-amplified metagenomes have enabled scientists to explore communities traditionally challenging to describe, including some with extremely low biomass or from which DNA is particularly difficult to extract. Here we show that a modified assembly pipeline can lead to an improved de novo genome assembly from PCR-amplified datasets, and enables a better genome recovery from low input metagenomes.

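
The assembly-size metric used in item 5 above (number of bp in contigs ≥ 10 kb) and the reported 10- to 100-fold improvement can be made concrete with a short sketch. The function names `assembly_size` and `fold_improvement` and the toy contig lengths are hypothetical and chosen only for illustration; the abstract does not describe how the authors computed these values in practice.

```python
from typing import Iterable


def assembly_size(contig_lengths: Iterable[int], min_length: int = 10_000) -> int:
    """Total number of bp in contigs at least `min_length` long.

    The 10 kb default mirrors the threshold quoted in the abstract; the
    implementation itself is an illustrative assumption.
    """
    return sum(length for length in contig_lengths if length >= min_length)


def fold_improvement(baseline_size: int, modified_size: int) -> float:
    """Ratio of assembly size from a modified pipeline to a baseline pipeline."""
    if baseline_size <= 0:
        raise ValueError("baseline assembly size must be positive")
    return modified_size / baseline_size


if __name__ == "__main__":
    # Toy contig lengths (bp), invented for this example.
    standard = [5_000, 10_000, 25_000]
    modified = [12_000, 88_000, 250_000, 4_000]

    base = assembly_size(standard)   # only the 10 kb and 25 kb contigs count
    new = assembly_size(modified)    # the 4 kb contig is excluded
    print(base, new, fold_improvement(base, new))
```

Contigs shorter than the threshold are ignored entirely, so a pipeline that yields many short fragments scores low on this metric even when its total assembled length is large.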