Title: A revisit to universal single-copy genes in bacterial genomes
Abstract: Universal single-copy genes (USCGs) are widely used for species classification and taxonomic profiling. Despite many studies on USCGs, our understanding of them in bacterial genomes may be out of date. Open questions include how much the USCG sets used in different studies differ, how well a set of USCGs can distinguish two bacterial species, and whether USCGs can separate different strains of the same bacterial species. To fill this gap, we studied USCGs in the most up-to-date collection of complete bacterial genomes. We showed that different USCG sets are quite different even though they come from highly similar functional categories. We also found that although USCGs occur once in almost all bacterial genomes, each USCG occurs multiple times in certain genomes. We demonstrated that USCGs are reliable markers for distinguishing different species, but they cannot distinguish different strains of most bacterial species. Our study sheds new light on the usage and limitations of USCGs, which will facilitate their applications in evolutionary, phylogenomic, and metagenomic studies.
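The abstract notes that a USCG can be single-copy in almost all genomes yet occur multiple times in a few. Below is a minimal Python sketch of how such copy-number patterns can be tallied. It is not the authors' pipeline. It assumes marker hits have already been called, for example by a profile-HMM search, and are supplied as a hypothetical dictionary that maps each genome ID to a list of detected marker names, one entry per copy. The function names and the marker names used in any example are illustrative only.

```python
from collections import Counter
from typing import Dict, List


def copy_number_table(hits: Dict[str, List[str]],
                      markers: List[str]) -> Dict[str, Dict[str, int]]:
    """For each marker, count genomes where it is absent, single-copy, or multi-copy.

    `hits` maps a genome ID to the marker names detected in that genome,
    with one list entry per detected copy (hypothetical input format).
    """
    table = {m: {"absent": 0, "single": 0, "multi": 0} for m in markers}
    for genes in hits.values():
        counts = Counter(genes)  # copies of each marker in this genome
        for m in markers:
            n = counts[m]  # Counter returns 0 for markers not detected
            if n == 0:
                table[m]["absent"] += 1
            elif n == 1:
                table[m]["single"] += 1
            else:
                table[m]["multi"] += 1
    return table


def multi_copy_genomes(hits: Dict[str, List[str]], marker: str) -> List[str]:
    """Return the sorted IDs of genomes in which `marker` occurs more than once."""
    return sorted(g for g, genes in hits.items() if Counter(genes)[marker] > 1)
```

A marker whose "multi" count is small but nonzero matches the pattern the abstract describes. It is single-copy almost everywhere, so any analysis that assumes exactly one copy per genome needs a rule for the exceptional genomes.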
Award ID(s): 2015838
PAR ID: 10640687
Publisher / Repository: Nature
Journal Name: Scientific Reports
Volume: 12
Issue: 1
ISSN: 2045-2322
Sponsoring Org: National Science Foundation
More like this
  1. Jouline, Igor B (Ed.)
    ABSTRACT Large-scale surveys of prokaryotic communities (metagenomes), as well as isolate genomes, have revealed that their diversity is predominantly organized in sequence-discrete units that may be equated to species. Specifically, genomes of the same species commonly show genome-aggregate average nucleotide identity (ANI) >95% among themselves and ANI <90% to members of other species, while genomes showing ANI of 90%–95% are comparatively rare. However, it remains unclear whether such "discontinuities" or gaps in ANI values can be observed within species and thus used to advance and standardize intra-species units. By analyzing 18,123 complete isolate genomes from 330 bacterial species with at least 10 genome representatives each, together with available long-read metagenomes, we show that another discontinuity exists between 99.2% and 99.8% (midpoint 99.5%) ANI in most of these species. The 99.5% ANI threshold is largely consistent with how sequence types have been defined in previous epidemiological studies but provides clusters with ~20% higher accuracy in terms of the evolutionary and gene-content relatedness of the grouped genomes, while strains should consequently be defined at higher ANI values (>99.99% proposed). Collectively, our results should facilitate future micro-diversity studies across clinical or environmental settings because they provide a more natural definition of intra-species units of diversity. IMPORTANCE Bacterial strains and clonal complexes are two cornerstone concepts in microbiology that remain loosely defined, which confuses communication and research. Here we identify a natural gap in genome sequence comparisons among isolate genomes of all well-sequenced species that has gone unnoticed so far and could be used to define these and related concepts more accurately and precisely than current methods.
These findings advance the molecular toolbox for accurately delineating and following the important units of diversity within prokaryotic species and thus should greatly facilitate future epidemiological and micro-diversity studies across clinical and environmental settings. 
  2. Giraud, Tatiana (Ed.)
    Abstract The Global Panzootic Lineage (GPL) of Batrachochytrium dendrobatidis (Bd) has been described as a main driver of amphibian extinctions. Pathogen studies have benefited from three Bd-GPL strain genomes, but identifying the genetic and molecular features that distinguish the B. dendrobatidis lineages requires additional high-quality genomes from diverse lineages. We sequenced and assembled genomes with Oxford Nanopore Technologies to produce assemblies of three Bd-BRAZIL isolates and one non-pathogenic outgroup species, Polyrhizophydium stewartii. The Bd-BRAZIL assembly sizes ranged between 22.0 and 26.1 Mb with 8,495 to 8,620 predicted protein-coding genes. We sought to categorize the pangenome of the species by classifying homologous genes across the sampled genomes as either core (present in all strains) or accessory (shared among strains in a lineage), an analysis that has not yet been conducted on B. dendrobatidis and its lineages. We identified a core genome consisting of 6,278 gene families, and an accessory genome of 202 Bd-BRAZIL-specific and 172 Bd-GPL-specific gene families. We discovered copy number differences between Bd-BRAZIL and Bd-GPL strains in pathogenicity gene families: M36 Peptidases, Crinkler Necrosis genes, Aspartyl Peptidases, Carbohydrate-Binding Module-18 (CBM18) genes, and S41 Proteases. Comparison of B. dendrobatidis with two closely related saprophytic species identified differences in protein sequence for the M36 family and in domain counts for the CBM18 family. Our pangenome analysis of lineage-specific gene content led us to explore how the choice of reference genome affects recovery of RNA-seq transcripts when comparing different strains. We tested the hypothesis that genomic variation among Bd-GPL and Bd-BRAZIL lineages can affect transcript count data by comparing results obtained with our new Bd-BRAZIL genomes as the reference genomes.
    Our analysis examines the genomic variation between strains in Bd-BRAZIL and Bd-GPL and offers insights into the application of these high-quality reference genome resources in future studies.
  3. Komeili, Arash (Ed.)
    ABSTRACT Multipartite bacterial genome organization can confer advantages, including coordinated gene regulation and faster genome replication, but is challenging to maintain. Agrobacterium tumefaciens lineages often contain a circular chromosome (Ch1), a linear chromosome (Ch2), and multiple plasmids. We previously observed that in some stocks of the C58 lab model, Ch1 and Ch2 were fused into a linear dicentric chromosome. Here we analyzed Agrobacterium natural isolates from the French Collection for Plant-Associated Bacteria and identified two strains distinct from C58 with fused chromosomes. Chromosome conformation capture identified integration junctions that differed from those of the C58 fusion strain. Genome-wide DNA replication profiling showed that both replication origins remained active. Transposon sequencing revealed that the partitioning systems of both chromosome centromeres were essential. Importantly, the site-specific recombinase XerCD is required for the survival of the strains containing the fusion chromosome. Our findings show that replicon fusion occurs in natural environments and that balanced replication arm sizes and proper resolution systems enable the survival of such strains. IMPORTANCE Most bacterial genomes are monopartite, with a single circular chromosome. However, some species, like Agrobacterium tumefaciens, carry multiple chromosomes. The emergence of multipartite genomes is often related to adaptation to specific niches, including pathogenesis or symbiosis. Multipartite genomes confer certain advantages; however, maintaining this complex structure can present significant challenges. We previously reported a laboratory-propagated lineage of A. tumefaciens strain C58 in which the circular and linear chromosomes fused to form a single dicentric chromosome.
    Here we discovered two geographically separated environmental isolates of A. tumefaciens containing fused chromosomes with integration junctions different from those of the C58 fusion chromosome, revealing the constraints and diversification of this process. We found that balanced replication arm sizes and the repurposing of multimer resolution systems enable the survival and stable maintenance of dicentric chromosomes. These findings reveal how multipartite genomes function across different bacterial species and the role of genomic plasticity in bacterial genetic diversification.
  4. El_Allali, Achraf (Ed.)
    With mutations constantly accumulating in bacterial genomes, it is unclear whether previously identified bacterial strains are actually present in an extant sample. To address this question, we conducted a case study on the known strains of the bacterial species S. aureus and S. epidermidis in 68 atopic dermatitis shotgun metagenomic samples. We evaluated the likelihood of the presence of all sixteen known strains, as predicted in the original study and by two popular tools in this study. We found that, even with the same tool, only two known strains were predicted by both the original study and this study. Moreover, none of the sixteen known strains was likely present in these 68 samples. Our study thus indicates the limitations of known-strain-based studies, especially those on rapidly evolving bacterial species. It implies that previously identified known strains are unlikely to be present in a current environmental sample. It also calls for de novo bacterial strain identification directly from shotgun metagenomic reads.
  5. Parkhill, Julian (Ed.)
    ABSTRACT RNA transcripts are potential therapeutic targets, yet the biodiversity of bacterial transcripts remains uncharacterized. We developed an algorithm for transcript prediction called tp.py and used it to predict transcripts (mRNA and other RNAs) in Escherichia coli strains K12 and E2348/69 (Bacteria: gamma-Proteobacteria), Listeria monocytogenes strains Scott A and RO15 (Bacteria: Firmicutes), Pseudomonas aeruginosa strains SG17M and NN2 (Bacteria: gamma-Proteobacteria), and Haloferax volcanii (Archaea: Halobacteria). From >5 million E. coli K12 and >3 million E. coli E2348/69 newly generated Oxford Nanopore Technologies direct RNA sequencing reads, 2,487 K12 mRNAs and 1,844 E2348/69 mRNAs were predicted, with the K12 mRNAs containing more than half of the predicted E. coli K12 proteins. While the number of predicted transcripts varied by strain based on the amount of sequence data used, across all strains examined the predicted average size of the mRNAs was 1.6–1.7 kbp, and the median size of the 5′- and 3′-untranslated regions (UTRs) was 30–90 bp. Given the lack of bacterial and archaeal transcript annotation, most predictions were of novel transcripts, but we also predicted many previously characterized mRNAs and ncRNAs, including post-transcriptionally generated transcripts and small RNAs associated with pathogenesis in the E. coli E2348/69 LEE pathogenicity islands. We predicted small transcripts in the 100–200 bp range as well as >10 kbp transcripts for all strains; the longest transcript was the nuo operon transcript in two of the seven strains and a phage/prophage transcript in another two.
    This quick, easy, and reproducible method will facilitate the presentation of transcript and UTR predictions alongside coding sequences and protein predictions in bacterial genome annotation, as important resources for the research community. IMPORTANCE Our understanding of bacterial and archaeal genes and genomes is largely focused on proteins, since there have been only limited efforts to describe bacterial/archaeal RNA diversity. This contrasts with studies on the human genome, where transcripts were sequenced before the release of the human genome over two decades ago. We developed software for the quick, easy, and reproducible prediction of bacterial and archaeal transcripts from Oxford Nanopore Technologies direct RNA sequencing data. These predictions are urgently needed for more accurate studies of bacterial/archaeal gene regulation, including the regulation of virulence factors, and for the development of novel RNA-based therapeutics and diagnostics to combat bacterial pathogens, such as those with extreme antimicrobial resistance.
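The ANI thresholds in the first related record (Jouline, Ed.) are applied to pairwise ANI values between genomes. Those values are 95% between species, the proposed 99.5% for intra-species units, and the proposed 99.99% for strains. Below is a minimal Python sketch that groups genomes by single linkage using a union-find structure, assuming the pairwise ANI values (in percent) have already been computed. Single linkage, the `>=` comparison at the threshold, and the function name `cluster_by_ani` are illustrative assumptions. They are not the clustering procedure reported in that study.

```python
from typing import Dict, List, Tuple


def cluster_by_ani(genomes: List[str],
                   ani: Dict[Tuple[str, str], float],
                   threshold: float) -> List[List[str]]:
    """Single-linkage clusters: two genomes share a cluster if a chain of
    pairs with ANI >= threshold connects them.

    `ani` maps (genome_a, genome_b) to percent ANI; both IDs must appear in
    `genomes`. Pairs missing from `ani` are treated as below the threshold.
    """
    parent = {g: g for g in genomes}  # union-find forest, one root per cluster

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), value in ani.items():
        if value >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(c) for c in clusters.values())
```

Calling it with thresholds of 95.0, 99.5, and 99.99 on the same ANI table gives nested groupings at roughly the species, intra-species-unit, and strain levels. Single linkage can chain: genomes A and C can land in one cluster through B even if their own ANI is below the threshold. A stricter rule, such as requiring every pair in a cluster to pass, is an equally plausible design choice.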
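In the Bd pangenome record (Giraud, Ed.), gene families are split into core families and lineage-specific accessory families. That split can be expressed as set comparisons over a family-by-strain presence table. Below is a minimal Python sketch under an assumed rule, since the abstract does not give the exact criteria. A family is "core" if every sampled strain carries it. It is "<lineage>-specific" if it is present in every strain of exactly one lineage and absent from all other strains. Anything else is labeled "other". The input format and function name are hypothetical.

```python
from typing import Dict, Set


def classify_families(presence: Dict[str, Set[str]],
                      lineage_of: Dict[str, str]) -> Dict[str, str]:
    """Label each gene family 'core', '<lineage>-specific', or 'other'.

    presence:   family ID -> set of strain IDs carrying that family
    lineage_of: strain ID -> lineage name (defines the sampled strains)
    """
    all_strains = set(lineage_of)
    members: Dict[str, Set[str]] = {}
    for strain, lineage in lineage_of.items():
        members.setdefault(lineage, set()).add(strain)

    labels: Dict[str, str] = {}
    for family, strains in presence.items():
        if strains == all_strains:
            labels[family] = "core"  # carried by every sampled strain
            continue
        label = "other"
        for lineage, lineage_strains in members.items():
            # Assumed rule: in all strains of this lineage and nowhere else.
            if strains == lineage_strains:
                label = f"{lineage}-specific"
                break
        labels[family] = label
    return labels
```

Under this rule, a family found in only some strains of a lineage falls into "other". Relaxing the rule to "present in at least one strain of one lineage and absent from the rest" would change the accessory counts, which is why the assumption is stated explicitly.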
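The 5′ and 3′ UTR sizes reported in the transcript-prediction record (Parkhill, Ed.) follow from coordinate arithmetic once a transcript and the coding sequences inside it are known. Below is a minimal Python sketch. It is not tp.py, whose internals the abstract does not describe. It assumes 1-based inclusive genomic coordinates and a hypothetical `Transcript` record. UTRs are measured from the first and last CDS, so a multi-gene operon transcript gets one 5′ and one 3′ UTR. On the minus strand the 5′ end of the RNA lies at the higher genomic coordinate, so the two flanks swap.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple


@dataclass
class Transcript:
    start: int                  # 1-based, inclusive genomic start
    end: int                    # 1-based, inclusive genomic end
    strand: str                 # "+" or "-"
    cds: List[Tuple[int, int]]  # (start, end) of each CDS inside the transcript


def utr_lengths(tx: Transcript) -> Tuple[int, int]:
    """Return (5' UTR, 3' UTR) lengths in bp for one transcript with >= 1 CDS."""
    first = min(s for s, _ in tx.cds)  # leftmost coding base
    last = max(e for _, e in tx.cds)   # rightmost coding base
    left = first - tx.start            # untranslated bases left of the CDS span
    right = tx.end - last              # untranslated bases right of the CDS span
    # On the minus strand, the RNA's 5' end is the right-hand genomic end.
    return (left, right) if tx.strand == "+" else (right, left)


def median_utrs(txs: List[Transcript]) -> Tuple[float, float]:
    """Median 5' and 3' UTR lengths over transcripts that contain a CDS.

    Raises statistics.StatisticsError if no transcript has a CDS.
    """
    pairs = [utr_lengths(t) for t in txs if t.cds]
    return median(p[0] for p in pairs), median(p[1] for p in pairs)
```

For example, a plus-strand transcript spanning 100 to 1000 with one CDS at 150 to 900 has a 50 bp 5′ UTR (bases 100 to 149) and a 100 bp 3′ UTR (bases 901 to 1000). Transcripts without a CDS, such as many ncRNAs, are skipped by `median_utrs` rather than assigned UTRs.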