Title: An ANI gap within bacterial species that advances the definitions of intra-species units
ABSTRACT Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI 90%–95% are comparatively rare. However, it remains unclear if such “discontinuities” or gaps in ANI values can be observed within species and thus used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each and available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of evolutionary and gene-content relatedness of the grouped genomes, while strains should be consequently defined at higher ANI values (>99.99% proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity.
IMPORTANCE Bacterial strains and clonal complexes are two cornerstone concepts for microbiology that remain loosely defined, which confuses communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to more accurately and precisely define these and related concepts compared to current methods. These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings.
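The thresholds quoted in the abstract can be illustrated with a minimal sketch. It assumes pairwise ANI values (in percent) have already been computed by some external tool and are held in a dictionary keyed by genome pair. The function name `cluster_at_threshold`, the input layout, and the choice of single-linkage grouping are assumptions made for illustration, not the procedure used in the paper.

```python
from itertools import combinations


def cluster_at_threshold(genomes, ani, threshold=99.5):
    """Group genomes by single linkage: a pair is joined when its ANI >= threshold.

    `ani` maps frozenset({a, b}) -> ANI in percent (hypothetical input layout).
    Pairs missing from `ani` are treated as unrelated (ANI 0.0).
    """
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if ani.get(frozenset((a, b)), 0.0) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Made-up ANI values, for illustration only.
genomes = ["g1", "g2", "g3", "g4"]
ani = {
    frozenset(("g1", "g2")): 99.9,
    frozenset(("g2", "g3")): 99.6,
    frozenset(("g1", "g3")): 99.1,
    frozenset(("g3", "g4")): 97.0,
}
units = cluster_at_threshold(genomes, ani, threshold=99.5)
```

With single linkage, g1 and g3 end up in the same unit through g2 even though their direct ANI falls below 99.5%; a stricter linkage rule would split them. Passing a threshold just above 99.99 would approximate the proposed strain-level cutoff.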
Award ID(s):
2129823 1759831
PAR ID:
10526684
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Editor(s):
Jouline, Igor B
Publisher / Repository:
mBio journal, published by the American Society for Microbiology
Date Published:
Journal Name:
mBio
Volume:
15
Issue:
1
ISSN:
2150-7511
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cooper, Vaughn S (Ed.)
    ABSTRACT Despite the importance of intra-species variants of viruses for causing disease and/or disrupting ecosystem functioning, there is no universally applicable standard to define these. A (natural) gap in whole-genome average nucleotide identity (ANI) values around 95% is commonly used to define species, especially for bacteriophages, but whether a similar gap exists within species that can be used to define intra-species units has not been evaluated yet. Whole-genome comparisons among members of 1,016 bacteriophage (Caudoviricetes) species revealed a region of low frequency of ANI values around 99.2%–99.8%, showing threefold or fewer pairs than expected for an even distribution. This second gap is prevalent in viruses infecting various cultured or uncultured hosts from a variety of environments, although a few exceptions to this pattern were also observed (3.7% of total species) and are likely attributed to cultivation biases or other factors. Similar results were observed for a limited set of eukaryotic viruses that are adequately sampled, including SARS-CoV-2, whose ANI-based clusters matched well with the WHO-defined variants of concern, indicating that our findings from bacteriophages might be more broadly applicable and the ANI-based clusters may represent functionally and/or ecologically distinct units. These units appear to be predominantly driven by (high) ecological cohesiveness coupled to either frequent recombination for bacteriophages or selection and clonal evolution for other viruses such as SARS-CoV-2, indicating that fundamentally different underlying mechanisms could lead to similar diversity patterns. Accordingly, we propose the ANI gap approach outlined above for defining viral intra-species units, for which we propose the term genomovars.
    IMPORTANCE Viral species are composed of an ensemble of intra-species variants whose individual dynamics may have major implications for human and animal health and/or ecosystem functioning. However, the lack of universally accepted standards to define these intra-species variants has led researchers to use different approaches for this task, creating inconsistent intra-species units across different viral families and confusion in communication. By comparing hundreds of mostly bacteriophage genomes, we show that there is a widely distributed natural gap in whole-genome average nucleotide identity values in most, but not all, of these species that can be used to define intra-species units. Therefore, these results advance the molecular toolbox for tracking viral intra-species units and should facilitate future epidemiological and environmental studies.
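    The comparison against "an even distribution" can be sketched as a ratio of observed to expected pair counts. The function name `gap_ratio` and the reference window of 98.0% to 100.0% ANI are assumptions for illustration only; the abstract does not state which window the authors used.

```python
def gap_ratio(ani_values, gap=(99.2, 99.8), window=(98.0, 100.0)):
    """Observed / expected number of genome pairs whose ANI falls inside `gap`.

    The expected count assumes pairs are spread evenly across `window`
    (a hypothetical reference range, not taken from the source).
    A ratio well below 1 indicates a gap.
    """
    lo, hi = window
    in_window = [v for v in ani_values if lo <= v <= hi]
    if not in_window:
        raise ValueError("no ANI values inside the reference window")
    observed = sum(gap[0] <= v <= gap[1] for v in in_window)
    expected = len(in_window) * (gap[1] - gap[0]) / (hi - lo)
    return observed / expected


# Made-up pairwise ANI values for one species, for illustration only.
values = [98.1, 98.5, 98.9, 99.0, 99.1, 99.5, 99.85, 99.9, 99.95, 100.0]
ratio = gap_ratio(values)
```

    Here one of the ten pairs falls inside the gap while three would be expected under an even spread, so the ratio is about one third, i.e. roughly threefold fewer pairs than expected.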
  2. Abstract What a strain is and how many strains make up a natural bacterial population remain elusive concepts despite their apparent importance for assessing the role of intra-population diversity in disease emergence or response to environmental perturbations. To advance these concepts, we sequenced 138 randomly selected Salinibacter ruber isolates from two solar salterns and assessed these genomes against companion short-read metagenomes from the same samples. The distribution of genome-aggregate average nucleotide identity (ANI) values among these isolates was bimodal, with four-fold lower occurrence of values between 99.2% and 99.8% relative to ANI >99.8% or <99.2%, revealing a natural “gap” in the sequence space within species. Accordingly, we used this ANI gap to define genomovars and a higher ANI value of >99.99% and shared gene-content >99.0% to define strains. Using these thresholds and extrapolating from how many metagenomic reads each genomovar uniquely recruited, we estimated that, although our 138 isolates represented about 80% of the Sal. ruber population, the total population in one saltern pond is composed of 5,500 to 11,000 genomovars, the great majority of which appear to be rare in situ. These data also revealed that the most frequently recovered isolate in lab media was often not the most abundant genomovar in situ, suggesting that cultivation biases are significant, even in cases where cultivation procedures are thought to be robust. The methodology and ANI thresholds outlined here should represent a useful guide for future microdiversity surveys of additional microbial species.
  3. Ho, Simon (Ed.)
    Abstract Whole-genome comparisons based on average nucleotide identities (ANI) and the genome-to-genome distance calculator have risen to prominence in rapidly classifying prokaryotic taxa using whole-genome sequences. Some implementations have even been proposed as a new standard in species classification and have become a common technique for papers describing newly sequenced genomes. However, attempts to apply whole-genome divergence data to the delineation of higher taxonomic units and to phylogenetic inference have had difficulty matching the results produced by more complex phylogenetic methods. We present a novel method for generating statistically supported phylogenies of archaeal and bacterial groups using a combined ANI and alignment fraction-based metric. For the test cases to which we applied the developed approach, we obtained results comparable with other methodologies up to at least the family level. The developed method uses nonparametric bootstrapping to gauge support for inferred groups. This method offers the opportunity to make use of whole-genome comparison data that are already being generated to quickly produce phylogenies, including support for inferred groups. Additionally, the developed ANI methodology can assist the classification of higher taxonomic groups.
    [Average nucleotide identity (ANI); genome evolution; prokaryotic species delineation; taxonomy.]
  4. Abstract Whether prokaryotes, and other microorganisms, form distinct clusters that can be recognized as species remains an issue of paramount theoretical as well as practical consequence in identifying, regulating, and communicating about these organisms. In the past decade, comparisons of thousands of genomes of isolates and hundreds of metagenomes have shown that prokaryotic diversity may be predominantly organized in such sequence‐discrete clusters, albeit organisms of intermediate relatedness between the identified clusters are also frequently found. Accumulating evidence suggests, however, that the latter “intermediate” organisms show enough ecological and/or functional distinctiveness to be considered different species. Notably, the area of discontinuity between clusters often—but not always—appears to be around 85%–95% genome‐average nucleotide identity, consistently among different taxa. More recent studies have revealed remarkably similar diversity patterns for viruses and microbial eukaryotes as well. This high consistency across taxa implies a specific mechanistic process that underlies the maintenance of the clusters. The underlying mechanism may be a substantial reduction in the efficiency of homologous recombination, which mediates (successful) horizontal gene transfer, around 95% nucleotide identity. Deviations from the 95% threshold (e.g., species showing lower intraspecies diversity) may be caused by ecological differentiation that imposes barriers to otherwise frequent gene transfer. While this hypothesis that clusters are driven by ecological differentiation coupled to recombination frequency (i.e., higher recombination within vs. between groups) is appealing, the supporting evidence remains anecdotal. The data needed to rigorously test the hypothesis toward advancing the species concept are also outlined. 
  5. Abstract
    Background: Antibiotic-producing Streptomyces bacteria are ubiquitous in nature, yet most studies of their diversity have focused on free-living strains inhabiting diverse soil environments and those in symbiotic relationships with invertebrates.
    Results: We studied the draft genomes of 73 Streptomyces isolates sampled from the skin (wing and tail membranes) and fur surfaces of bats collected in Arizona and New Mexico. We uncovered large genomic variation and biosynthetic potential, even among closely related strains. The isolates, which were initially identified as three distinct species based on sequence variation in the 16S rRNA locus, could be distinguished as 41 different species based on genome-wide average nucleotide identity. Of the 32 biosynthetic gene cluster (BGC) classes detected, non-ribosomal peptide synthetases, siderophores, and terpenes were present in all genomes. On average, Streptomyces genomes carried 14 distinct classes of BGCs (range = 9–20). Results also revealed large inter- and intra-species variation in gene content (single nucleotide polymorphisms, accessory genes and singletons) and BGCs, further contributing to the overall genetic diversity present in bat-associated Streptomyces. Finally, we show that genome-wide recombination has partly contributed to the large genomic variation among strains of the same species.
    Conclusions: Our study provides an initial genomic assessment of bat-associated Streptomyces that will be critical to prioritizing those strains with the greatest ability to produce novel antibiotics. It also highlights the need to recognize within-species variation as an important factor in genetic manipulation studies, diversity estimates and drug discovery efforts in Streptomyces.