Title: RODAN: a fully convolutional architecture for basecalling nanopore RNA sequencing data
Abstract Background: Despite recent progress in basecalling of Oxford Nanopore DNA sequencing data, its wide adoption is still being hampered by its relatively low accuracy compared to short-read technologies. Furthermore, very little of the recent research has focused on basecalling of RNA data, which has different characteristics than its DNA counterpart. Results: We fill this gap by benchmarking a fully convolutional deep learning basecalling architecture with improved performance compared to Oxford Nanopore’s RNA basecallers. Availability: The source code for our basecaller is available at https://github.com/biodlab/RODAN.
Award ID(s):
1450032
PAR ID:
10347171
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
BMC Bioinformatics
Volume:
23
Issue:
1
ISSN:
1471-2105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Transformer is an algorithm that adopts self‐attention architecture in the neural networks and has been widely used in natural language processing. In the current study, we apply Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. We evaluated this idea using real data sets (Escherichia coli data and the human genome NA12878 sequenced by Simpson et al.) and demonstrated the ability of Transformers to detect methylation on ionic signal data. Background: Oxford Nanopore long‐read sequencing technology addresses current limitations for DNA methylation detection that are inherent in short‐read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo and DeepMod, have been developed to detect DNA methylation on Nanopore data. However, additional improvements can be made in computational efficiency, prediction accuracy, and contextual interpretation on complex genomics regions (such as repetitive regions, low GC density regions). Method: In the current study, we apply Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. Transformer is an algorithm that adopts self‐attention architecture in the neural networks and has been widely used in natural language processing. Results: Compared to traditional deep‐learning method such as convolutional neural network (CNN) and recurrent neural network (RNN), Transformer may have specific advantages in DNA methylation detection, because the self‐attention mechanism can assist the relationship detection between bases that are far from each other and pay more attention to important bases that carry characteristic methylation‐specific signals within a specific sequence context. Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.
  2. Flap endonuclease 1 (FEN1) is an essential enzyme that removes RNA primers and base lesions during DNA lagging strand maturation and long-patch base excision repair (BER). It plays a crucial role in maintaining genome stability and integrity. FEN1 is also implicated in RNA processing and biogenesis. A recent study from our group has shown that FEN1 is involved in trinucleotide repeat deletion by processing the RNA strand in R-loops through BER, further suggesting that the enzyme can modulate genome stability by facilitating the resolution of R-loops. However, it remains unknown how FEN1 can process RNA to resolve an R-loop. In this study, we examined the FEN1 cleavage activity on the RNA:DNA hybrid intermediates generated during DNA lagging strand processing and BER in R-loops. We found that both human and yeast FEN1 efficiently cleaved an RNA flap in the intermediates using its endonuclease activity. We further demonstrated that FEN1 was recruited to R-loops in normal human fibroblasts and senataxin-deficient (AOA2) fibroblasts, and its R-loop recruitment was significantly increased by oxidative DNA damage. We showed that FEN1 specifically employed its endonucleolytic cleavage activity to remove the RNA strand in an R-loop during BER. We found that FEN1 coordinated its DNA and RNA endonucleolytic cleavage activity with the 3′-5′ exonuclease of APE1 to resolve the R-loop. Our results further suggest that FEN1 employed its unique tracking mechanism to endonucleolytically cleave the RNA strand in an R-loop by coordinating with other BER enzymes and cofactors during BER. Our study provides the first evidence that FEN1 endonucleolytic cleavage can result in the resolution of R-loops via the BER pathway, thereby maintaining genome integrity. 
  3. Genetic information is encoded in the DNA double helix, which, in its physiological milieu, is characterized by the iconical Watson-Crick nucleo-base pairing. Recent NMR relaxation experiments revealed the transient presence of an alternative, Hoogsteen (HG) base pairing pattern in naked DNA duplexes, and estimated its relative stability and lifetime. In contrast with DNA, such structures were not observed in RNA duplexes. Understanding HG base pairing is important because the underlying "breathing" motion between the two conformations can significantly modulate protein binding. However, a detailed mechanistic insight into the transition pathways and kinetics is still missing. We performed enhanced sampling simulation (with combined metadynamics and adaptive force-bias method) and Markov state modeling to obtain accurate free energy, kinetics, and the intermediates in the transition pathway between Watson-Crick and HG base pairs for both naked B-DNA and A-RNA duplexes. The Markov state model constructed from our unbiased MD simulation data revealed previously unknown complex extrahelical intermediates in the seemingly simple process of base flipping in B-DNA. Extending our calculation to A-RNA, for which HG base pairing is not observed experimentally, resulted in relatively unstable, single-hydrogen-bonded, distorted Hoogsteen-like bases. Unlike B-DNA, the transition pathway primarily involved base paired and intrahelical intermediates with transition timescales much longer than that of B-DNA. The seemingly obvious flip-over reaction coordinate (i.e., the glycosidic torsion angle) is unable to resolve the intermediates. Instead, a multidimensional picture involving backbone dihedral angles and distance between hydrogen bond donor and acceptor atoms is required to gain insight into the molecular mechanism. 
  4. Estrogen receptor alpha (ERα) is a ligand-responsive transcription factor critical for sex determination and development. Recent reports challenge the canonical view of ERα function by suggesting an activity beyond binding dsDNA at estrogen-responsive promotor elements: association with RNAs in vivo. Whether these interactions are direct or indirect remains unknown, which limits the ability to understand the extent, specificity, and biological role of ERα-RNA binding. Here we demonstrate that an extended DNA-binding domain of ERα directly binds a wide range of RNAs in vitro with structural specificity. ERα binds RNAs that adopt a range of hairpin-derived structures independent of sequence, while interacting poorly with single- and double-stranded RNA. RNA affinities are only 4-fold weaker than consensus dsDNA and significantly tighter than nonconsensus dsDNA sequences. Moreover, RNA binding is competitive with DNA binding. Together, these data show that ERα utilizes an extended DNA-binding domain to achieve a high-affinity/low-specificity mode for interacting with RNA. 
  5. Abstract Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies’ state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source at github.com/skovaka/uncalled4.