Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI 90%–95% are comparatively rare. However, it remains unclear whether such “discontinuities” or gaps in ANI values can also be observed within species and thus used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each, together with available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies, but it provides clusters with ~20% higher accuracy in terms of evolutionary and gene-content relatedness of the grouped genomes. Strains should consequently be defined at higher ANI values (>99.99% proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity.
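As a minimal illustrative sketch, and not the analysis pipeline used in this study, the ANI thresholds stated above can be read as a simple decision rule for a pair of genomes. The function name `classify_pair`, the label strings, and the choice to treat 99.5% as an inclusive boundary are assumptions made for illustration only. The ANI value is assumed to have been computed beforehand with an external tool.

```python
# Thresholds (in percent ANI) taken from the text above.
# Label strings and boundary inclusivity are illustrative assumptions.
SPECIES_GAP_LOW = 90.0     # below this: different species
SPECIES_ANI = 95.0         # above this: same species
INTRA_SPECIES_ANI = 99.5   # midpoint of the 99.2%-99.8% discontinuity
STRAIN_ANI = 99.99         # proposed lower bound for "same strain"


def classify_pair(ani: float) -> str:
    """Map a pairwise ANI value (0-100) to a unit of diversity."""
    if not 0.0 <= ani <= 100.0:
        raise ValueError(f"ANI must be a percentage in [0, 100], got {ani}")
    if ani > STRAIN_ANI:
        return "same strain"
    if ani >= INTRA_SPECIES_ANI:
        return "same intra-species unit"
    if ani > SPECIES_ANI:
        return "same species"
    if ani >= SPECIES_GAP_LOW:
        return "intermediate (rare)"
    return "different species"


if __name__ == "__main__":
    # Hypothetical ANI values, not data from the study.
    for value in (99.995, 99.7, 97.0, 92.0, 85.0):
        print(f"{value:7.3f} -> {classify_pair(value)}")
```

Pairs whose ANI falls inside the 99.2%–99.8% range are expected to be uncommon in most species, so in practice the exact placement of the cutoff within that range should change few assignments.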
Bacterial strains and clonal complexes are two cornerstone concepts for microbiology that remain loosely defined, which hampers communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to define these and related concepts more accurately and precisely than current methods do. These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings.