Title: Towards estimating the number of strains that make up a natural bacterial population
Abstract What a strain is and how many strains make up a natural bacterial population remain elusive concepts despite their apparent importance for assessing the role of intra-population diversity in disease emergence or response to environmental perturbations. To advance these concepts, we sequenced 138 randomly selected Salinibacter ruber isolates from two solar salterns and assessed these genomes against companion short-read metagenomes from the same samples. The genome-aggregate average nucleotide identity (ANI) values among these isolates showed a bimodal distribution, with four-fold lower occurrence of values between 99.2% and 99.8% relative to ANI >99.8% or <99.2%, revealing a natural “gap” in the sequence space within species. Accordingly, we used this ANI gap to define genomovars and a higher ANI value of >99.99% and shared gene-content >99.0% to define strains. Using these thresholds and extrapolating from how many metagenomic reads each genomovar uniquely recruited, we estimated that, although our 138 isolates represented about 80% of the Sal. ruber population, the total population in one saltern pond is composed of 5,500 to 11,000 genomovars, the great majority of which appear to be rare in situ. These data also revealed that the most frequently recovered isolate in lab media was often not the most abundant genomovar in situ, suggesting that cultivation biases are significant, even in cases where cultivation procedures are thought to be robust. The methodology and ANI thresholds outlined here should represent a useful guide for future microdiversity surveys of additional microbial species.
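The thresholds quoted in the abstract can be read as a simple decision rule for a pair of genomes from the same species. The sketch below is illustrative only: the function name `classify_pair`, the category labels, and the handling of values that fall inside the 99.2%–99.8% gap are assumptions, since the abstract does not state an exact genomovar cutoff or how boundary values are treated.

```python
# Thresholds as quoted in the abstract (percent values).
GAP_LOW = 99.2              # lower edge of the ANI gap
GAP_HIGH = 99.8             # upper edge of the ANI gap
STRAIN_ANI = 99.99          # ANI required to call two genomes the same strain
STRAIN_GENE_CONTENT = 99.0  # shared gene content required for the same strain


def classify_pair(ani: float, shared_gene_content: float) -> str:
    """Label a pair of same-species genomes by ANI and shared gene content.

    Assumption: pairs inside the gap are reported separately rather than
    forced into a genomovar call, because the abstract gives no exact cutoff.
    """
    if ani > STRAIN_ANI and shared_gene_content > STRAIN_GENE_CONTENT:
        return "same strain"
    if ani > GAP_HIGH:
        return "same genomovar"
    if ani >= GAP_LOW:
        return "within ANI gap"
    return "different genomovar"


if __name__ == "__main__":
    # Hypothetical (ANI, shared gene content) pairs, not data from the study.
    for ani, genes in [(99.995, 99.5), (99.9, 97.0), (99.5, 95.0), (98.7, 90.0)]:
        print(ani, genes, "->", classify_pair(ani, genes))
```

Reporting in-gap pairs as their own category keeps the rule honest about the region where, per the abstract, few pairs are expected to fall.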
Award ID(s):
1831582 2129823
PAR ID:
10486120
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
15
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cooper, Vaughn S (Ed.)
    ABSTRACT Despite the importance of intra-species variants of viruses for causing disease and/or disrupting ecosystem functioning, there is no universally applicable standard to define these. A (natural) gap in whole-genome average nucleotide identity (ANI) values around 95% is commonly used to define species, especially for bacteriophages, but whether a similar gap exists within species that can be used to define intra-species units has not been evaluated yet. Whole-genome comparisons among members of 1,016 bacteriophage (Caudoviricetes) species revealed a region of low frequency of ANI values around 99.2%–99.8%, showing threefold or fewer pairs than expected for an even distribution. This second gap is prevalent in viruses infecting various cultured or uncultured hosts from a variety of environments, although a few exceptions to this pattern were also observed (3.7% of total species) and are likely attributed to cultivation biases or other factors. Similar results were observed for a limited set of eukaryotic viruses that are adequately sampled, including SARS-CoV-2, whose ANI-based clusters matched well with the WHO-defined variants of concern, indicating that our findings from bacteriophages might be more broadly applicable and the ANI-based clusters may represent functionally and/or ecologically distinct units. These units appear to be predominantly driven by (high) ecological cohesiveness coupled to either frequent recombination for bacteriophages or selection and clonal evolution for other viruses such as SARS-CoV-2, indicating that fundamentally different underlying mechanisms could lead to similar diversity patterns. Accordingly, we propose the ANI gap approach outlined above for defining viral intra-species units, for which we propose the term genomovars.
    IMPORTANCE Viral species are composed of an ensemble of intra-species variants whose individual dynamics may have major implications for human and animal health and/or ecosystem functioning. However, the lack of universally accepted standards to define these intra-species variants has led researchers to use different approaches for this task, creating inconsistent intra-species units across different viral families and confusion in communication. By comparing hundreds of mostly bacteriophage genomes, we show that there is a widely distributed natural gap in whole-genome average nucleotide identity values in most, but not all, of these species that can be used to define intra-species units. Therefore, these results advance the molecular toolbox for tracking viral intra-species units and should facilitate future epidemiological and environmental studies.
  2. Jouline, Igor B (Ed.)
    ABSTRACT Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI 90%–95% are comparatively rare. However, it remains unclear if such “discontinuities” or gaps in ANI values can be observed within species and thus used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each and available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of evolutionary and gene-content relatedness of the grouped genomes, while strains should be consequently defined at higher ANI values (>99.99% proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity.
    IMPORTANCE Bacterial strains and clonal complexes are two cornerstone concepts for microbiology that remain loosely defined, which confuses communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to more accurately and precisely define these and related concepts compared to current methods. These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings.
  3. Abstract Metagenomic surveys have revealed that natural microbial communities are predominantly composed of sequence-discrete, species-like populations but the genetic and/or ecological processes that maintain such populations remain speculative, limiting our understanding of population speciation and adaptation to perturbations. To address this knowledge gap, we sequenced 112 Salinibacter ruber isolates and 12 companion metagenomes from four adjacent saltern ponds in Mallorca, Spain that were experimentally manipulated to dramatically alter salinity and light intensity, the two major drivers of this ecosystem. Our analyses showed that the pangenome of the local Sal. ruber population is open and similar in size (~15,000 genes) to that of randomly sampled Escherichia coli genomes. While most of the accessory (noncore) genes were isolate-specific and showed low in situ abundances based on the metagenomes compared to the core genes, indicating that they were functionally unimportant and/or transient, 3.5% of them became abundant when salinity (but not light) conditions changed and encoded for functions related to osmoregulation. Nonetheless, the ecological advantage of these genes, while significant, was apparently not strong enough to purge diversity within the population. Collectively, our results provide an explanation for how this immense intrapopulation gene diversity is maintained, which has implications for the prokaryotic species concept. 
  4. Three novel carbon monoxide-oxidizing Halobacteria were isolated from Bonneville Salt Flats (Utah, USA) salt crusts and nearby saline soils. Phylogenetic analysis of 16S rRNA gene sequences revealed that strains PCN9ᵀ, WSA2ᵀ and WSH3ᵀ belong to the genera Halobacterium, Halobaculum and Halovenus, respectively. Strains PCN9ᵀ, WSA2ᵀ and WSH3ᵀ grew optimally at 40 °C (PCN9ᵀ) or 50 °C (WSA2ᵀ, WSH3ᵀ). NaCl optima were 3 M (PCN9ᵀ, WSA2ᵀ) or 4 M (WSH3ᵀ). Carbon monoxide was oxidized by all isolates, each of which contained a molybdenum-dependent CO dehydrogenase. G+C contents for the three respective isolates were 66.75, 67.62 and 63.97 mol% as derived from genome analyses. The closest phylogenetic relatives of PCN9ᵀ, WSA2ᵀ and WSH3ᵀ were Halobacterium noricense A1ᵀ, Halobaculum roseum D90ᵀ and Halovenus aranensis EB27ᵀ, with 98.71%, 98.19% and 95.95% 16S rRNA gene sequence similarities, respectively. Genome comparisons of PCN9ᵀ with Halobacterium noricense A1ᵀ yielded an average nucleotide identity (ANI) of 82.0% and a digital DNA–DNA hybridization (dDDH) value of 25.7%; comparisons of WSA2ᵀ with Halobaculum roseum D90ᵀ yielded ANI and dDDH values of 86.34% and 31.1%, respectively. The ANI value for a comparison of WSH3ᵀ with Halovenus aranensis EB27ᵀ was 75.2%. Physiological, biochemical, genetic and genomic characteristics of PCN9ᵀ, WSA2ᵀ and WSH3ᵀ differentiated them from their closest phylogenetic neighbours and indicated that they represent novel species, for which the names Halobacterium bonnevillei, Halobaculum saliterrae and Halovenus carboxidivorans are proposed, respectively. The type strains are PCN9ᵀ (=JCM 32472=LMG 31022=ATCC TSD-126), WSA2ᵀ (=JCM 32473=ATCC TSD-127) and WSH3ᵀ (=JCM 32474=ATCC TSD-128).
  5. Abstract The relative importance of separation by distance and by environment to population genetic diversity can be conveniently tested in river networks, where these two drivers are often independently distributed over space. To evaluate the importance of dispersal and environmental conditions in shaping microbial population structures, we performed genome‐resolved metagenomic analyses of benthic Microcoleus‐dominated cyanobacterial mats collected in the Eel and Russian River networks (California, USA). The 64 Microcoleus genomes were clustered into three species that shared >96.5% average nucleotide identity (ANI). Most mats were dominated by one strain, but minor alleles within mats were often shared, even over large spatial distances (>300 km). Within the most common Microcoleus species, the ANI between the dominant strains within mats decreased with increasing spatial separation. However, over shorter spatial distances (tens of kilometres), mats from different subwatersheds had lower ANI than mats from the same subwatershed, suggesting that at shorter spatial distances environmental differences between subwatersheds in factors like canopy cover, conductivity, and mean annual temperature decrease ANI. Since mats in smaller creeks had similar levels of nucleotide diversity (π) as mats in larger downstream subwatersheds, within‐mat genetic diversity does not appear to depend on the downstream accumulation of upstream‐derived strains. The four‐gamete test and sequence length bias suggest recombination occurs between almost all strains within each species, even between populations separated by large distances or living in different habitats. Overall, our results show that, despite some isolation by distance and environmental conditions, sufficient gene‐flow occurs among cyanobacterial strains to prevent either driver from producing distinctive population structures across the watershed.
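
Items 1 and 2 in the list above describe the intra-species ANI gap as a region (99.2%–99.8%) containing fewer genome pairs than expected for an even distribution. A minimal sketch of that comparison follows. The function name `gap_depletion` and the reference range of 98.0%–100% over which values are assumed to be evenly spread are hypothetical choices for illustration; the abstracts do not state which range was used.

```python
from typing import Iterable, Tuple


def gap_depletion(
    ani_values: Iterable[float],
    gap: Tuple[float, float] = (99.2, 99.8),
    span: Tuple[float, float] = (98.0, 100.0),  # assumed reference range
) -> Tuple[int, float]:
    """Return (observed, expected) counts of pairwise ANI values in the gap.

    `expected` is the count the gap would hold if the values inside `span`
    were spread evenly across it.
    """
    in_span = [a for a in ani_values if span[0] <= a <= span[1]]
    if not in_span:
        raise ValueError("no ANI values fall inside the reference span")
    observed = sum(1 for a in in_span if gap[0] <= a < gap[1])
    expected = len(in_span) * (gap[1] - gap[0]) / (span[1] - span[0])
    return observed, expected


if __name__ == "__main__":
    # Hypothetical pairwise ANI values, not data from any of the studies.
    values = [99.9] * 7 + [99.5] + [98.5] * 2
    observed, expected = gap_depletion(values)
    print(f"observed={observed}, expected={expected:.2f}")
```

An observed count well below the expected count indicates a depleted region. The function returns the two counts rather than their ratio so that a gap containing no pairs does not cause a division by zero.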