Background: Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Results: Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that distinguish these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions: Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
Detecting m6A RNA modification from nanopore sequencing using a semisupervised learning framework
Direct nanopore-based RNA sequencing can be used to detect posttranscriptional base modifications, such as N6-methyladenosine (m6A) methylation, based on the electric current signals produced by the distinct chemical structures of modified bases. A key challenge is the scarcity of adequate training data with known methylation modifications. We present Xron, a hybrid encoder–decoder framework that delivers a direct methylation-distinguishing basecaller by training on synthetic RNA data and immunoprecipitation (IP)-based experimental data in two steps. First, we generate data with more diverse modification combinations through in silico cross-linking. Second, we use this data set to train an end-to-end neural network basecaller followed by fine-tuning on IP-based experimental data with label smoothing. The trained neural network basecaller outperforms existing methylation detection methods on both read-level and site-level prediction scores. Xron is a standalone, end-to-end m6A-distinguishing basecaller capable of detecting methylated bases directly from raw sequencing signals, enabling de novo methylome assembly.
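The label smoothing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the smoothing strength or the output alphabet, so the value `epsilon=0.1` and the five-symbol alphabet (A, C, G, U, and a separate m6A symbol) below are assumptions chosen only for illustration, and the function names are hypothetical rather than part of Xron.

```python
import math

# Hypothetical output alphabet: four canonical bases plus a methylated A symbol.
ALPHABET = ["A", "C", "G", "U", "m6A"]


def smooth_labels(class_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for one labelled position.

    A one-hot target puts all probability on the labelled class. Label
    smoothing moves a fraction epsilon of that mass to a uniform
    distribution over all classes, which softens noisy labels.
    """
    if not 0 <= class_index < num_classes:
        raise ValueError("class_index out of range")
    uniform_share = epsilon / num_classes
    target = [uniform_share] * num_classes
    target[class_index] += 1.0 - epsilon
    return target


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


if __name__ == "__main__":
    # A position labelled m6A (an assumed, possibly noisy, IP-derived label).
    target = smooth_labels(ALPHABET.index("m6A"), len(ALPHABET), epsilon=0.1)
    predicted = [0.05, 0.05, 0.05, 0.05, 0.80]
    print(target)
    print(cross_entropy(target, predicted))
```

Smoothing of this kind is commonly used when labels are imperfect, since the loss no longer rewards the network for becoming fully certain about any single label.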
- Award ID(s): 2232121
- PAR ID: 10572286
- Publisher / Repository: Cold Spring Harbor Press
- Date Published:
- Journal Name: Genome Research
- Volume: 34
- Issue: 11
- ISSN: 1088-9051
- Page Range / eLocation ID: 1987 to 1999
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Transformer is an algorithm that adopts a self-attention architecture in neural networks and has been widely used in natural language processing. In the current study, we apply the Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. We evaluated this idea using real data sets (Escherichia coli data and the human genome NA12878 sequenced by Simpson et al.) and demonstrated the ability of Transformers to detect methylation on ionic signal data. Background: Oxford Nanopore long-read sequencing technology addresses current limitations for DNA methylation detection that are inherent in short-read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo and DeepMod, have been developed to detect DNA methylation on Nanopore data. However, additional improvements can be made in computational efficiency, prediction accuracy, and contextual interpretation on complex genomic regions (such as repetitive regions and low GC density regions). Method: In the current study, we apply the Transformer architecture to detect DNA methylation on ionic signals from Oxford Nanopore sequencing data. Transformer is an algorithm that adopts a self-attention architecture in neural networks and has been widely used in natural language processing. Results: Compared to traditional deep-learning methods such as convolutional neural networks (CNN) and recurrent neural networks (RNN), Transformer may have specific advantages in DNA methylation detection, because the self-attention mechanism can help detect relationships between bases that are far from each other and pay more attention to important bases that carry characteristic methylation-specific signals within a specific sequence context. Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.
-
Chemical modifications of proteins, DNA, and RNA moieties play critical roles in regulating gene expression. Emerging evidence suggests the RNA modifications (epitranscriptomics) have substantive roles in basic biological processes. One of the most common modifications in mRNA and noncoding RNAs is N6-methyladenosine (m6A). In a subset of mRNAs, m6A sites are preferentially enriched near stop codons, in 3′ UTRs, and within exons, suggesting an important role in the regulation of mRNA processing and function including alternative splicing and gene expression. Very little is known about the effect of environmental chemical exposure on m6A modifications. As many of the commonly occurring environmental contaminants alter gene expression profiles and have detrimental effects on physiological processes, it is important to understand the effects of exposure on this important layer of gene regulation. Hence, the objective of this study was to characterize the acute effects of developmental exposure to PCB126, an environmentally relevant dioxin-like PCB, on m6A methylation patterns. We exposed zebrafish embryos to PCB126 for 6 h starting from 72 h post fertilization and profiled m6A RNA using methylated RNA immunoprecipitation followed by sequencing (MeRIP-seq). Our analysis revealed 117 and 217 m6A peaks in the DMSO and PCB126 samples (false discovery rate 5%), respectively. The majority of the peaks were preferentially located around the 3′ UTR and stop codons. Statistical analysis revealed 15 m6A marked transcripts to be differentially methylated by PCB126 exposure. These include transcripts that are known to be activated by AHR agonists (e.g., ahrra, tiparp, nfe2l2b) as well as others that are important for normal development (vgf, cebpd, sned1). These results suggest that environmental chemicals such as dioxin-like PCBs could affect developmental gene expression patterns by altering m6A levels. Further studies are necessary to understand the functional consequences of exposure-associated alterations in m6A levels.
-
The genomes of organisms from all three domains of life harbor endogenous base modifications in the form of DNA methylation. In bacterial genomes, methylation occurs on adenosine and cytidine residues to include N6-methyladenine (m6A), 5-methylcytosine (m5C), and N4-methylcytosine (m4C). Bacterial DNA methylation has been well characterized in the context of restriction-modification (RM) systems, where methylation regulates DNA incision by the cognate restriction endonuclease. Relative to RM systems, less is known about how m6A contributes to the epigenetic regulation of cellular functions in Gram-positive bacteria. Here, we characterize site-specific m6A modifications in the non-palindromic sequence GACGmAG within the genomes of Bacillus subtilis strains. We demonstrate that the yeeA gene is a methyltransferase responsible for the presence of m6A modifications. We show that methylation from YeeA does not function to limit DNA uptake during natural transformation. Instead, we identify a subset of promoters that contain the methylation consensus sequence and show that loss of methylation within promoter regions causes a decrease in reporter expression. Further, we identify a transcriptional repressor that preferentially binds an unmethylated promoter used in the reporter assays. With these results we suggest that m6A modifications in B. subtilis function to promote gene expression.
-
Nanopore signal analysis enables detection of nucleotide modifications from native DNA and RNA sequencing, providing both accurate genetic or transcriptomic and epigenetic information without additional library preparation. At present, only a limited set of modifications can be directly basecalled (for example, 5-methylcytosine), while most others require exploratory methods that often begin with alignment of nanopore signal to a nucleotide reference. We present Uncalled4, a toolkit for nanopore signal alignment, analysis and visualization. Uncalled4 features an efficient banded signal alignment algorithm, BAM signal alignment file format, statistics for comparing signal alignment methods and a reproducible de novo training method for k-mer-based pore models, revealing potential errors in Oxford Nanopore Technologies’ state-of-the-art DNA model. We apply Uncalled4 to RNA 6-methyladenine (m6A) detection in seven human cell lines, identifying 26% more modifications than Nanopolish using m6Anet, including in several genes where m6A has known implications in cancer. Uncalled4 is available open source at github.com/skovaka/uncalled4.

