This content will become publicly available on October 16, 2025

Title: Deciphering transcript architectural complexity in bacteria and archaea
ABSTRACT RNA transcripts are potential therapeutic targets, yet bacterial transcripts have uncharacterized biodiversity. We developed an algorithm for transcript prediction called tp.py, using it to predict transcripts (mRNA and other RNAs) in Escherichia coli K12 and E2348/69 strains (Bacteria:gamma-Proteobacteria), Listeria monocytogenes strains Scott A and RO15 (Bacteria:Firmicute), Pseudomonas aeruginosa strains SG17M and NN2 (Bacteria:gamma-Proteobacteria), and Haloferax volcanii (Archaea:Halobacteria). From >5 million E. coli K12 and >3 million E. coli E2348/69 newly generated Oxford Nanopore Technologies direct RNA sequencing reads, 2,487 K12 mRNAs and 1,844 E2348/69 mRNAs were predicted, with the K12 mRNAs containing more than half of the predicted E. coli K12 proteins. While the number of predicted transcripts varied by strain based on the amount of sequence data used, across all strains examined, the predicted average size of the mRNAs was 1.6–1.7 kbp, while the median sizes of the 5′- and 3′-untranslated regions (UTRs) were 30–90 bp. Given the lack of bacterial and archaeal transcript annotation, most predictions were of novel transcripts, but we also predicted many previously characterized mRNAs and ncRNAs, including post-transcriptionally generated transcripts and small RNAs associated with pathogenesis in the E. coli E2348/69 LEE pathogenicity islands. We predicted small transcripts in the 100–200 bp range as well as >10 kbp transcripts for all strains, with the longest transcript for two of the seven strains being the nuo operon transcript, and for another two strains it was a phage/prophage transcript. This quick, easy, and reproducible method will facilitate the presentation of transcript and UTR predictions alongside coding sequences and protein predictions in bacterial genome annotation as important resources for the research community.
IMPORTANCE Our understanding of bacterial and archaeal genes and genomes is largely focused on proteins since there have only been limited efforts to describe bacterial/archaeal RNA diversity. This contrasts with studies on the human genome, where transcripts were sequenced prior to the release of the human genome over two decades ago. We developed software for the quick, easy, and reproducible prediction of bacterial and archaeal transcripts from Oxford Nanopore Technologies direct RNA sequencing data. These predictions are urgently needed for more accurate studies examining bacterial/archaeal gene regulation, including regulation of virulence factors, and for the development of novel RNA-based therapeutics and diagnostics to combat bacterial pathogens, like those with extreme antimicrobial resistance.
Award ID(s):
2025384
PAR ID:
10569467
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Editor(s):
Parkhill, Julian
Publisher / Repository:
ASM
Date Published:
Journal Name:
mBio
Volume:
15
Issue:
10
ISSN:
2150-7511
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Bacterial mRNAs have short life cycles, in which transcription is rapidly followed by translation and degradation within seconds to minutes. The resulting diversity of mRNA molecules across different life-cycle stages impacts their functionality but has remained unresolved. Here we quantitatively map the 3’ status of cellular RNAs in Escherichia coli during steady-state growth and report a large fraction of molecules (median >60%) that are fragments of canonical full-length mRNAs. The majority of RNA fragments are decay intermediates, whereas nascent RNAs contribute a smaller fraction. Despite the prevalence of decay intermediates in total cellular RNA, these intermediates are underrepresented in the pool of ribosome-associated transcripts and can thus distort quantifications and differential expression analyses for the abundance of full-length, functional mRNAs. The large heterogeneity within mRNA molecules in vivo highlights the importance of discerning functional transcripts and provides a lens for studying the dynamic life cycle of mRNAs.
  2. Transformer is an algorithm that adopts self‐attention architecture in the neural networks and has been widely used in natural language processing. In the current study, we apply Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. We evaluated this idea using real data sets (Escherichia coli data and the human genome NA12878 sequenced by Simpson et al.) and demonstrated the ability of Transformers to detect methylation on ionic signal data. Background: Oxford Nanopore long‐read sequencing technology addresses current limitations for DNA methylation detection that are inherent in short‐read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo and DeepMod, have been developed to detect DNA methylation on Nanopore data. However, additional improvements can be made in computational efficiency, prediction accuracy, and contextual interpretation on complex genomics regions (such as repetitive regions, low GC density regions). Method: In the current study, we apply Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. Transformer is an algorithm that adopts self‐attention architecture in the neural networks and has been widely used in natural language processing. Results: Compared to traditional deep‐learning methods such as convolutional neural network (CNN) and recurrent neural network (RNN), Transformer may have specific advantages in DNA methylation detection, because the self‐attention mechanism can assist the relationship detection between bases that are far from each other and pay more attention to important bases that carry characteristic methylation‐specific signals within a specific sequence context. Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.
  3. Next-generation sequencing (NGS) technologies - Illumina RNA-seq, Pacific Biosciences isoform sequencing (PacBio Iso-seq), and Oxford Nanopore direct RNA sequencing (DRS) - have revealed the complexity of plant transcriptomes and their regulation at the co-/post-transcriptional level. Global analysis of mature mRNAs, transcripts from nuclear run-on assays, and nascent chromatin-bound mRNAs using short as well as full-length and single-molecule DRS reads have uncovered potential roles of different forms of RNA polymerase II during the transcription process, and the extent of co-transcriptional pre-mRNA splicing and polyadenylation. These tools have also allowed mapping of transcriptome-wide start sites in cap-containing RNAs, poly(A) site choice, poly(A) tail length, and RNA base modifications. The emerging theme from recent studies is that reprogramming of gene expression in response to developmental cues and stresses at the co-/post-transcriptional level likely plays a crucial role in eliciting appropriate responses for optimal growth and plant survival under adverse conditions. Although the mechanisms by which developmental cues and different stresses regulate co-/post-transcriptional splicing are largely unknown, a few recent studies indicate that the external cues target spliceosomal and splicing regulatory proteins to modulate alternative splicing. In this review, we provide an overview of recent discoveries on the dynamics and complexities of plant transcriptomes, mechanistic insights into splicing regulation, and discuss critical gaps in co-/post-transcriptional research that need to be addressed using diverse genomic and biochemical approaches.
  4. Many molluscan genomes have been published to date; however, only three are from representatives of the subphylum Aculifera (Polyplacophora, Caudofoveata, and Solenogastres), the sister taxon to all other molluscs. Currently, genomic resources are completely lacking for Solenogastres. This gap in knowledge hinders comparative and evolutionary studies. Here, we sequenced the genomes of the solenogaster aplacophorans Epimenia babai Salvini-Plawen, 1997 and Neomenia megatrapezata Salvini-Plawen & Paar-Gausch, 2004 using a hybrid approach combining Oxford Nanopore and Illumina reads. For E. babai, we produced a 628 Mbp haploid assembly (N50 = 413 Kbp, L50 = 370) that is rather complete with a BUSCO completeness score of 90.1% (82.0% single, 8.1% duplicated, 6.0% fragmented, and 3.9% missing). For N. megatrapezata, we produced a 412 Mbp haploid assembly (N50 = 132 Kbp, L50 = 881) that is also rather complete with a BUSCO completeness score of 85.1% (81.7% single, 3.4% duplicated, 8.1% fragmented, and 6.8% missing). Our annotation pipeline predicted 25,393 gene models for E. babai with a BUSCO score of 92.4% (80.5% single, 11.9% duplicated, 4.9% fragmented, and 2.7% missing) and 22,463 gene models for N. megatrapezata with a BUSCO score of 90.2% (81.0% single, 9.2% duplicated, 4.7% fragmented, and 5.1% missing). Phylogenomic analysis recovered Solenogastres as the sister taxon to Polyplacophora and Aculifera as the sister taxon to all other sampled molluscs with maximal support. These represent the first whole-genome resources for Solenogastres and will be valuable for future studies investigating this understudied group and molluscan evolution as a whole.
  5. Introduction: The rise in extended-spectrum beta-lactamase (ESBL)-producing Enterobacteriaceae in dairy cattle farms poses a risk to human health as they can spread to humans through the food chain, including raw milk. This study was designed to determine the status, antimicrobial resistance, and pathogenic potential of ESBL-producing E. coli and Klebsiella spp. isolates from bulk tank milk (BTM). Methods: Thirty-three BTM samples were collected from 17 dairy farms and screened for ESBL-E. coli and -Klebsiella spp. on CHROMagar ESBL plates. All isolates were confirmed by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) and subjected to antimicrobial susceptibility testing and whole genome sequencing (WGS). Results: Ten presumptive ESBL-producing bacteria, eight E. coli and two K. pneumoniae, were isolated. The prevalence of ESBL-E. coli and -K. pneumoniae in BTM was 21.2% and 6.1%, respectively. ESBL-E. coli were detected in 41.2% of the study farms. Seven of the ESBL-E. coli isolates were multidrug resistant (MDR). The two ESBL-producing K. pneumoniae isolates were resistant to ceftriaxone. Seven ESBL-E. coli strains carry the blaCTX-M gene, and five of them co-harbored blaTEM-1. ESBL-E. coli co-harbored blaCTX-M with other resistance genes, including qnrB19, tet(A), aadA1, aph(3’’)-Ib, aph(6)-Id, floR, sul2, and chromosomal mutations (gyrA, gyrB, parC, parE, and pmrB). Most E. coli resistance genes were associated with mobile genetic elements, mainly plasmids. Six sequence types (STs) of E. coli were detected. All ESBL-E. coli were predicted to be pathogenic to humans. Four STs (three ST10 and ST69) were high-risk clones of E. coli. Up to 40 virulence markers were detected in all E. coli isolates. One of the K. pneumoniae was ST867; the other was a novel strain. K. pneumoniae isolates carried three types of beta-lactamase genes (blaCTX-M, blaTEM-1 and blaSHV). The novel K. pneumoniae ST also carried a novel IncFII(K) plasmid ST. Conclusion: Detection of high-risk clones of MDR ESBL-E. coli and ESBL-K. pneumoniae in BTM indicates that raw milk could be a reservoir of potentially zoonotic ESBL-E. coli and -K. pneumoniae.