Title: Improving Phylogenies Based on Average Nucleotide Identity, Incorporating Saturation Correction and Nonparametric Bootstrap Support
Abstract Whole-genome comparisons based on average nucleotide identities (ANI) and the genome-to-genome distance calculator have risen to prominence in rapidly classifying prokaryotic taxa using whole-genome sequences. Some implementations have even been proposed as a new standard in species classification and have become a common technique for papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have had difficulty matching the results produced by more complex phylogenetic methods. We present a novel method for generating statistically supported phylogenies of archaeal and bacterial groups using a combined ANI and alignment fraction-based metric. For the test cases to which we applied the developed approach, we obtained results comparable with other methodologies up to at least the family level. The developed method uses nonparametric bootstrapping to gauge support for inferred groups. This method offers the opportunity to make use of whole-genome comparison data that are already being generated to quickly produce phylogenies, including support for inferred groups. Additionally, the developed ANI methodology can assist the classification of higher taxonomic groups. [Average nucleotide identity (ANI); genome evolution; prokaryotic species delineation; taxonomy.]
Award ID(s):
1716046
PAR ID:
10323784
Author(s) / Creator(s):
; ; ;
Editor(s):
Ho, Simon
Date Published:
Journal Name:
Systematic Biology
Volume:
71
Issue:
2
ISSN:
1063-5157
Page Range / eLocation ID:
396 to 409
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Surveys of microbial communities (metagenomics) or isolate genomes have revealed sequence-discrete species. That is, members of the same species show >95% average nucleotide identity (ANI) of shared genes among themselves vs. <83% ANI to members of other species while genome pairs showing between 83% and 95% ANI are comparatively rare. In these surveys, aquatic bacteria of the ubiquitous SAR11 clade (Class Alphaproteobacteria) are an outlier and often do not exhibit discrete species boundaries, suggesting the potential for alternate modes of genetic differentiation. To explore evolution in SAR11, we analyzed high-quality, single-cell amplified genomes, and companion metagenomes from an oxygen minimum zone in the Eastern Tropical Pacific Ocean, where the SAR11 make up ~20% of the total microbial community. Our results show that SAR11 do form several sequence-discrete species, but their ANI range of discreteness is shifted to lower identities between 86% and 91%, with intra-species ANI ranging between 91% and 100%. Measuring recent gene exchange among these genomes based on a recently developed methodology revealed higher frequency of homologous recombination within compared to between species that affects sequence evolution at least twice as much as diversifying point mutation across the genome. Recombination in SAR11 appears to be more promiscuous compared to other prokaryotic species, likely due to the deletion of universal genes involved in the mismatch repair, and has facilitated the spread of adaptive mutations within the species (gene sweeps), further promoting the high intraspecies diversity observed. Collectively, these results implicate rampant, genome-wide homologous recombination as the mechanism of cohesion for distinct SAR11 species. 
  2. Jouline, Igor B (Ed.)
    ABSTRACT Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI 90%–95% are comparatively rare. However, it remains unclear if such “discontinuities” or gaps in ANI values can be observed within species and thus used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each and available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of evolutionary and gene-content relatedness of the grouped genomes, while strains should be consequently defined at higher ANI values (>99.99% proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity. IMPORTANCE Bacterial strains and clonal complexes are two cornerstone concepts for microbiology that remain loosely defined, which confuses communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to more accurately and precisely define these and related concepts compared to current methods. These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings.
  3. Genomics has put prokaryotic rank-based taxonomy on a solid phylogenetic foundation. However, most taxonomic ranks were set long before the advent of DNA sequencing and genomics. In this concept paper, we thus ask the following question: should prokaryotic classification schemes besides the current phylum-to-species ranks be explored, developed, and incorporated into scientific discourse? Could such alternative schemes provide better solutions to the basic need of science and society for which taxonomy was developed, namely, precise and meaningful identification? A neutral genome-similarity based framework is then described that could allow alternative classification schemes to be explored, compared, and translated into each other without having to choose only one as the gold standard. Classification schemes could thus continue to evolve and be selected according to their benefits and based on how well they fulfill the need for prokaryotic identification. 
  4. The history of lamprey evolution has been contentious due to limited morphological differentiation and limited genetic data. Available data have produced inconsistent results, including in the relationship among northern and southern species and the monophyly of putative clades. Here we use whole genome sequence data sourced from a public database to identify orthologs for 11 lamprey species from across the globe and build phylogenies. The resulting phylogeny shows a clear separation between northern and southern lamprey species, which contrasts with some prior work. We also find that the phylogenetic relationships of our samples of two genera, Lethenteron and Eudontomyzon, deviate from the taxonomic classification of these species, suggesting that they require reclassification.
  5. Abstract The typical owl family (Strigidae) comprises 194 species in 28 genera, 14 of which are monotypic. Relationships within and among genera in the typical owls have been challenging to discern because mitochondrial data have produced equivocal results and because many monotypic genera have been omitted from previous molecular analyses. Here, we collected and analyzed DNA sequences of ultraconserved elements (UCEs) from 43 species of typical owls to produce concatenated and multispecies coalescent-based phylogenetic hypotheses for all but one genus in the typical owl family. Our results reveal extensive paraphyly of taxonomic groups across phylogenies inferred using different analytical approaches and suggest the genera Athene, Otus, Asio, Megascops, Bubo, and Strix are paraphyletic, whereas Ninox and Glaucidium are polyphyletic. Secondary analyses of protein-coding mitochondrial genes harvested from off-target sequencing reads and mitochondrial genomes downloaded from GenBank generally support the extent of paraphyly we observe, although some disagreements exist at higher taxonomic levels between our nuclear and mitochondrial phylogenetic hypotheses. Overall, our results demonstrate the importance of taxon sampling for understanding and describing evolutionary relationships in this group, as well as the need for additional sampling, study, and taxonomic revision of typical owl species. Additionally, our findings highlight how both divergence and convergence in morphological characters have obscured our understanding of the evolutionary history of typical owls, particularly those with insular distributions. 