Abstract Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies’ state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source at github.com/skovaka/uncalled4.
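As a rough illustration of what "banded" signal alignment means, the sketch below computes a dynamic time warping (DTW) cost between an observed signal and the expected current levels of a reference, filling only cells within a fixed distance of the diagonal. This is a generic fixed-width (Sakoe-Chiba style) band written for illustration; it is not taken from Uncalled4, and the function name `banded_dtw` and the `band` parameter are assumptions.

```python
import math

def banded_dtw(signal, expected, band=3):
    """DTW cost between two numeric sequences, restricted to |i - j| <= band.

    Generic illustration only: a fixed-width diagonal band, not the
    banding scheme of any particular tool.
    """
    n, m = len(signal), len(expected)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the end of both sequences")
    # cost[i][j] = best cost of aligning signal[:i] with expected[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns inside the band around the diagonal.
        for j in range(max(1, i - band), min(m, i + band) + 1):
            step = abs(signal[i - 1] - expected[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # signal sample repeats a level
                                    cost[i][j - 1],      # level skipped by the signal
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Restricting the fill to a band reduces the work from every cell of the n × m matrix to roughly n × (2·band + 1) cells, at the price of missing alignments that stray outside the band.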
Sequoia: an interactive visual analytics platform for interpretation and feature extraction from nanopore sequencing datasets
Abstract Background Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Result Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~ 500k reads from direct RNA sequencing data of human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia .
- PAR ID: 10311597
- Date Published:
- Journal Name: BMC Genomics
- Volume: 22
- Issue: 1
- ISSN: 1471-2164
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- ABSTRACT Direct RNA nanopore sequencing allows for the identification of full-length RNAs with a ∼10% error rate consisting of mismatches and small deletions. These errors are thought to be randomly distributed and structure-independent since RNA/cDNA duplexes are generated to prevent RNA structure formation prior to sequencing. When analyzing citrus yellow vein associated virus (CY1) reads during infection of Nicotiana benthamiana, viral (+/-)foldback RNAs (i.e., viral plus [+]-strands joined to [-]-strands) showed significantly higher error rates (mismatches and deletions) in the 5ʹ (+)RNA portion with errors that were relatively evenly distributed, while errors in the attached (-)RNA portion were less frequent and unevenly distributed. Non-foldback CY1 (+)RNAs from infected plants also showed an uneven distribution of errors, which correlated with errors in in vitro transcribed CY1 (+)RNA reads in both position and frequency. Hotspot errors in non-foldback CY1 (+)RNA and (-)RNA reads only weakly correlated, and hotspots were frequently located 5ʹ of known structural elements. Since nanopore sequencing is also used to identify RNA modifications, which depend on base-specific sequencing errors, algorithms for RNA modification detection were also examined for bias. We found that multiple programs predicted RNA modifications in in vitro transcribed CY1 RNA at the same positions and with similar confidence levels as with in planta CY1 RNA. These data suggest that direct RNA sequencing contains inherent error biases that may be associated with post-translocation RNA folding and low sequence complexity, and therefore extrapolations based on sequencing error require special consideration.
- Roux, Simon (Ed.) ABSTRACT Oxford Nanopore Technologies provides multiplexing options for DNA and cDNA sequencing, but not for direct RNA sequencing. Here we describe a duplexing approach and validate it by simultaneously sequencing the Saccharomyces cerevisiae rRNA from wild type and knockout that have differential rRNA modifications, successfully demultiplexing the data using bioinformatics approaches.
- Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs
  RNA modifications, such as methylation, can be detected with Oxford Nanopore Technologies direct RNA sequencing. One commonly used tool for detecting 5-methylcytosine (m5C) modifications is Tombo, which uses an “Alternative Model” to detect putative modifications from a single sample. We examined direct RNA sequencing data from diverse taxa including viruses, bacteria, fungi, and animals. The algorithm consistently identified an m5C at the central position of a GCU motif. However, it also identified an m5C in the same motif in fully unmodified in vitro transcribed RNA, suggesting that this is a frequent false prediction. In the absence of further validation, several published predictions of m5C in a GCU context should be reconsidered, including those from human coronavirus and human cerebral organoid samples.
- Abstract Motivation Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. Results In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F1-score (0.9354 versus 0.8660). Availability and implementation Sigmap code is accessible at https://github.com/haowenz/sigmap. Supplementary information Supplementary data are available at Bioinformatics online.
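The idea in the last abstract above, converting the reference to expected signal and searching in signal space, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Sigmap's implementation: the 2-mer `PORE_MODEL` values are made up, the helper names are hypothetical, and a brute-force scan over reference windows stands in for the k-d tree index, seed selection and chaining described in the abstract.

```python
import math

# Hypothetical 2-mer pore model: expected current level per k-mer.
# Real pore models use longer k-mers and measured values; these are invented.
K = 2
BASES = "ACGT"
PORE_MODEL = {a + b: 60.0 + 10.0 * i + 2.5 * j
              for i, a in enumerate(BASES) for j, b in enumerate(BASES)}

def reference_to_signal(seq):
    """Convert a nucleotide sequence to one expected current level per k-mer."""
    return [PORE_MODEL[seq[i:i + K]] for i in range(len(seq) - K + 1)]

def zscore(values):
    """Normalize a window to zero mean and unit variance (guards against zero spread)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / sd for v in values]

def map_signal(query, ref_signal):
    """Return (start, distance) of the reference window closest to the query.

    Brute-force nearest-neighbour search over all windows; an index such as
    a k-d tree would replace this loop in a scalable mapper.
    """
    q = zscore(query)
    best_start, best_dist = None, math.inf
    for start in range(len(ref_signal) - len(query) + 1):
        window = zscore(ref_signal[start:start + len(query)])
        dist = math.dist(q, window)  # Euclidean distance between normalized windows
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist
```

For example, taking `ref = reference_to_signal("ACGTTGCAAGCTTAGC")` and a query built from `ref[5:11]` with a different scale and offset, `map_signal` recovers start position 5, because z-score normalization removes the scale and offset before the comparison.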

