skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Ghost in the Machine: Evidence for Non-Random Errors During Direct RNA Nanopore Sequencing Due to Post-Translocated RNA Folding
ABSTRACT Direct RNA nanopore sequencing allows for the identification of full-length RNAs with a ∼10% error rate consisting of mismatches and small deletions. These errors are thought to be randomly distributed and structure-independent since RNA/cDNA duplexes are generated to prevent RNA structure formation prior to sequencing. When analyzing citrus yellow vein associated virus (CY1) reads during infection ofNicotiana benthamiana,viral (+/-)foldback RNAs (i.e., viral plus [+]-strands joined to [-]-strands) showed significantly higher error rates (mismatches and deletions) in the 5ʹ (+)RNA portion with errors that were relatively evenly distributed, while errors in the attached (-)RNA portion were less frequent and unevenly distributed. Non-foldback CY1 (+)RNAs from infected plants also showed an uneven distribution of errors, which correlated with errors inin vitrotranscribed CY1 (+)RNA reads in both position and frequency. Hotspot errors in non-foldback CY1 (+)RNA and (-)RNA reads only weakly correlated, and hotspots were frequently located 5ʹ of known structural elements. Since nanopore sequencing is also used to identify RNA modifications, which depend on base-specific sequencing errors, algorithms for RNA modification detection were also examined for bias. We found that multiple programs predicted RNA modifications inin vitrotranscribed CY1 RNA at the same positions and with similar confidence levels as within plantaCY1 RNA. These data suggest that direct RNA sequencing contains inherent error biases that may be associated with post-translocation RNA folding and low sequence complexity, and therefore extrapolations based on sequencing error require special consideration.  more » « less
Award ID(s):
2330663
PAR ID:
10661285
Author(s) / Creator(s):
; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. RNA modifications, such as methylation, can be detected with Oxford Nanopore Technologies direct RNA sequencing. One commonly used tool for detecting 5-methylcytosine (m5C) modifications is Tombo, which uses an “Alternative Model” to detect putative modifications from a single sample. We examined direct RNA sequencing data from diverse taxa including viruses, bacteria, fungi, and animals. The algorithm consistently identified a m5C at the central position of a GCU motif. However, it also identified a m5C in the same motif in fully unmodified in vitro transcribed RNA, suggesting that this is a frequent false prediction. In the absence of further validation, several published predictions of m5C in a GCU context should be reconsidered, including those from human coronavirus and human cerebral organoid samples. 
    more » « less
  2. Roux, Simon (Ed.)
    ABSTRACT Oxford Nanopore Technologies provides multiplexing options for DNA and cDNA sequencing, but not for direct RNA sequencing. Here we describe a duplexing approach and validate it by simultaneously sequencing theSaccharomyces cerevisiaerRNA from wild type and knockout that have differential rRNA modifications, successfully demultiplexing the data using bioinformatics approaches. 
    more » « less
  3. Abstract Background Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Result Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill-down onto signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~ 500k reads from direct RNA sequencing data of human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia . 
    more » « less
  4. Abstract Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method fork-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies’ state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source atgithub.com/skovaka/uncalled4. 
    more » « less
  5. Abstract BackgroundInsects are an important reservoir of viral biodiversity, but the vast majority of viruses associated with insects have not been discovered. Recent studies have employed high-throughput RNA sequencing, which has led to rapid advances in our understanding of insect viral diversity. However, insect genomes frequently contain transcribed endogenous viral elements (EVEs) with significant homology to exogenous viruses, complicating the use of RNAseq for viral discovery. MethodsIn this study, we used a multi-pronged sequencing approach to study the virome of an important agricultural pest and prolific vector of plant pathogens, the potato aphidMacrosiphum euphorbiae. We first used rRNA-depleted RNAseq to characterize the microbes found in individual insects. We then used PCR screening to measure the frequency of two heritable viruses in a local aphid population. Lastly, we generated a quality draft genome assembly forM. euphorbiaeusing Illumina-corrected Nanopore sequencing to identify transcriptionally active EVEs in the host genome. ResultsWe found reads from two insect-specific viruses (aFlavivirusand anAmbidensovirus) in our RNAseq data, as well as a parasitoid virus (Bracovirus), a plant pathogenic virus (Tombusvirus), and two phages (Acinetobacter and APSE). However, our genome assembly showed that part of the ‘virome’ of this insect can be attributed to EVEs in the host genome. ConclusionOur work shows that EVEs have led to the misidentification of aphid viruses from RNAseq data, and we argue that this is a widespread challenge for the study of viral diversity in insects. 
    more » « less