Abstract Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
SMAC: identifying DNA N6-methyladenine (6mA) at the single-molecule level using SMRT CCS data
Abstract DNA modifications, such as N6-methyladenine (6mA), play important roles in various processes in eukaryotes. Single-molecule, real-time (SMRT) sequencing enables the direct detection of DNA modifications without requiring special sample preparation. However, most SMRT-based studies of 6mA rely on ensemble-level consensus by combining multiple reads covering the same genomic position, which misses the single-molecule heterogeneity. While recent methods have aimed at single-molecule level detection of 6mA, limitations in sequencing platforms, resolution, accuracy, and usability restrict their application in comprehensive epigenetic studies. Here, we present SMAC (single-molecule 6mA analysis of CCS reads), a novel framework for accurately detecting 6mA at the single-molecule level using SMRT circular consensus sequencing (CCS) data from the Sequel II system. It is an automated method that streamlines the entire workflow by packaging both existing software and built-in scripts, with user-defined parameters to allow easy adaptation for various studies. By utilizing the statistical distribution characteristics of enzyme kinetic indicators on single DNA molecules rather than a fixed cutoff, SMAC significantly improves 6mA detection accuracy at the single-nucleotide and single-molecule levels. It simplifies analysis by providing comprehensive information, including quality control, statistical analysis, and site visualization, directly from raw sequencing data. SMAC is a powerful new tool that enables de novo detection of 6mA and empowers investigation of its functions in modulating physiological processes.
- Award ID(s): 2435178
- PAR ID: 10653713
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Briefings in Bioinformatics
- Volume: 26
- Issue: 2
- ISSN: 1467-5463
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Although DNA N6-adenine methylation (6mA) is best known in prokaryotes, its presence in eukaryotes has recently generated great interest. Biochemical and genetic evidence supports that AMT1, an MT-A70 family methyltransferase (MTase), is crucial for 6mA deposition in unicellular eukaryotes. Nonetheless, the 6mA transmission mechanism remains to be elucidated. Taking advantage of single-molecule real-time circular consensus sequencing (SMRT CCS), here we provide definitive evidence for semiconservative transmission of 6mA in Tetrahymena thermophila. In wild-type (WT) cells, 6mA occurs at the self-complementary ApT dinucleotide, mostly in full methylation (full-6mApT); after DNA replication, hemi-methylation (hemi-6mApT) is transiently present on the parental strand, opposite to the daughter strand readily labeled by 5-bromo-2′-deoxyuridine (BrdU). In ΔAMT1 cells, 6mA predominantly occurs as hemi-6mApT. Hemi-to-full conversion in WT cells is fast, robust, and processive, whereas de novo methylation in ΔAMT1 cells is slow and sporadic. In Tetrahymena, regularly spaced 6mA clusters coincide with the linker DNA of nucleosomes arrayed in the gene body. Importantly, in vitro methylation of human chromatin by the reconstituted AMT1 complex recapitulates preferential targeting of hemi-6mApT sites in linker DNA, supporting AMT1's intrinsic and autonomous role in maintenance methylation. We conclude that 6mA is transmitted by a semiconservative mechanism: full-6mApT is split by DNA replication into hemi-6mApT, which is restored to full-6mApT by AMT1-dependent maintenance methylation. Our study dissects AMT1-dependent maintenance methylation and AMT1-independent de novo methylation, reveals a 6mA transmission pathway with a striking similarity to 5-methylcytosine (5mC) transmission at the CpG dinucleotide, and establishes 6mA as a bona fide eukaryotic epigenetic mark.
-
Abstract Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics.
-
Enabled by long-read sequencing technologies, particularly Single Molecule, Real-Time sequencing, N6-methyladenine (6mA) footprinting is a transformative methodology for revealing the heterogeneous and dynamic distribution of nucleosomes and other DNA-binding proteins. Here, we present ipdTrimming, a novel 6mA-calling pipeline that outperforms existing tools in both computational efficiency and accuracy. Utilizing this optimized experimental and computational framework, we are able to map nucleosome positioning and transcription factor occupancy in nuclear DNA and establish high-resolution, long-range binding events in mitochondrial DNA. Our study highlights the potential of 6mA footprinting to capture coordinated nucleoprotein binding and to unravel epigenetic heterogeneity.
-
Abstract Motivation Oxford Nanopore sequencing enables direct detection of the methylation states of bases in DNA from reads without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction using Nanopore reads. Results In this study, we develop DeepSignal, a deep learning method to detect DNA methylation states from Nanopore sequencing reads. Testing on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 shows that DeepSignal achieves higher performance at both read level and genome level in detecting 6mA and 5mC methylation states compared to previous hidden Markov model (HMM) based methods. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs and both singleton and mixed DNA CpG. Moreover, DeepSignal requires much lower coverage than HMM and statistics based methods. DeepSignal can achieve above 90% accuracy for detecting 5mC and 6mA using only 2× coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20× coverage of reads, which is much better than HMM based methods. In particular, DeepSignal can predict methylation states of 5% more DNA CpGs that previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting methylation states of DNA bases. Availability and implementation DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal. Supplementary information Supplementary data are available at Bioinformatics online.

