Ghost in the Machine: Evidence for Non-Random Errors During Direct RNA Nanopore Sequencing Due to Post-Translocated RNA Folding

Needham, Jason M (ORCID:0000000237578127); Johnson, Philip Z; Simon, Anne E

doi:10.64898/2025.12.02.691860

ABSTRACT Direct RNA nanopore sequencing allows for the identification of full-length RNAs with a ∼10% error rate consisting of mismatches and small deletions. These errors are thought to be randomly distributed and structure-independent since RNA/cDNA duplexes are generated to prevent RNA structure formation prior to sequencing. When analyzing citrus yellow vein associated virus (CY1) reads during infection of Nicotiana benthamiana, viral (+/-)foldback RNAs (i.e., viral plus [+]-strands joined to [-]-strands) showed significantly higher error rates (mismatches and deletions) in the 5ʹ (+)RNA portion with errors that were relatively evenly distributed, while errors in the attached (-)RNA portion were less frequent and unevenly distributed. Non-foldback CY1 (+)RNAs from infected plants also showed an uneven distribution of errors, which correlated with errors in in vitro transcribed CY1 (+)RNA reads in both position and frequency. Hotspot errors in non-foldback CY1 (+)RNA and (-)RNA reads only weakly correlated, and hotspots were frequently located 5ʹ of known structural elements. Since nanopore sequencing is also used to identify RNA modifications, which depend on base-specific sequencing errors, algorithms for RNA modification detection were also examined for bias. We found that multiple programs predicted RNA modifications in in vitro transcribed CY1 RNA at the same positions and with similar confidence levels as with in planta CY1 RNA. These data suggest that direct RNA sequencing contains inherent error biases that may be associated with post-translocation RNA folding and low sequence complexity, and therefore extrapolations based on sequencing error require special consideration.
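
The comparison of error position and frequency between two read sets (for example, in planta versus in vitro transcribed RNA) could be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the function names, the per-position error and coverage counts as inputs, the use of Pearson correlation, and the 2x-mean hotspot threshold are all assumptions made for the example.

```python
from math import sqrt


def error_rates(errors, coverage):
    """Per-position error rate = errors / coverage (0.0 where coverage is zero)."""
    return [e / c if c else 0.0 for e, c in zip(errors, coverage)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def hotspots(rates, fold=2.0):
    """Indices whose error rate exceeds `fold` times the mean rate.

    The threshold is a hypothetical choice for illustration, not a value
    taken from the paper.
    """
    mean_rate = sum(rates) / len(rates)
    return [i for i, r in enumerate(rates) if r > fold * mean_rate]


if __name__ == "__main__":
    # Toy counts (made up): mismatches + deletions per reference position.
    in_planta = error_rates([1, 5, 1, 1], [10, 10, 10, 10])
    in_vitro = error_rates([2, 9, 1, 2], [20, 20, 20, 20])
    print("correlation:", pearson(in_planta, in_vitro))
    print("hotspot positions:", hotspots(in_planta))
```

A high correlation between the two rate profiles, together with shared hotspot positions, would be consistent with errors that arise from the sequencing process itself rather than from modifications present only in planta.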
