Abstract Long single-molecular sequencing technologies, such as PacBio circular consensus sequencing (CCS) and nanopore sequencing, are advantageous in detecting DNA 5-methylcytosine in CpGs (5mCpGs), especially in repetitive genomic regions. However, existing methods for detecting 5mCpGs using PacBio CCS are less accurate and robust. Here, we present ccsmeth, a deep-learning method to detect DNA 5mCpGs using CCS reads. We sequence polymerase-chain-reaction treated and M.SssI-methyltransferase treated DNA of one human sample using PacBio CCS for training ccsmeth. Using long (≥10 Kb) CCS reads, ccsmeth achieves 0.90 accuracy and 0.97 Area Under the Curve on 5mCpG detection at single-molecule resolution. At the genome-wide site level, ccsmeth achieves >0.90 correlations with bisulfite sequencing and nanopore sequencing using only 10× reads. Furthermore, we develop a Nextflow pipeline, ccsmethphase, to detect haplotype-aware methylation using CCS reads, and then sequence a Chinese family trio to validate it. ccsmeth and ccsmethphase can be robust and accurate tools for detecting DNA 5-methylcytosines.
This content will become publicly available on March 1, 2026
SMAC: identifying DNA N6-methyladenine (6mA) at the single-molecule level using SMRT CCS data
Abstract DNA modifications, such as N6-methyladenine (6mA), play important roles in various processes in eukaryotes. Single-molecule, real-time (SMRT) sequencing enables the direct detection of DNA modifications without requiring special sample preparation. However, most SMRT-based studies of 6mA rely on ensemble-level consensus by combining multiple reads covering the same genomic position, which misses single-molecule heterogeneity. While recent methods have aimed at single-molecule level detection of 6mA, limitations in sequencing platforms, resolution, accuracy, and usability restrict their application in comprehensive epigenetic studies. Here, we present SMAC (single-molecule 6mA analysis of CCS reads), a novel framework for accurately detecting 6mA at the single-molecule level using SMRT circular consensus sequencing (CCS) data from the Sequel II system. It is an automated method that streamlines the entire workflow by packaging both existing software tools and built-in scripts, with user-defined parameters to allow easy adaptation for various studies. By utilizing the statistical distribution characteristics of enzyme kinetic indicators on single DNA molecules rather than a fixed cutoff, SMAC significantly improves 6mA detection accuracy at the single-nucleotide and single-molecule levels. It simplifies analysis by providing comprehensive information, including quality control, statistical analysis, and site visualization, directly from raw sequencing data. SMAC is a powerful new tool that enables de novo detection of 6mA and empowers investigation of its functions in modulating physiological processes.
- Award ID(s):
- 2435178
- PAR ID:
- 10653713
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Briefings in Bioinformatics
- Volume:
- 26
- Issue:
- 2
- ISSN:
- 1467-5463
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Although DNA N6-adenine methylation (6mA) is best known in prokaryotes, its presence in eukaryotes has recently generated great interest. Biochemical and genetic evidence supports that AMT1, an MT-A70 family methyltransferase (MTase), is crucial for 6mA deposition in unicellular eukaryotes. Nonetheless, the 6mA transmission mechanism remains to be elucidated. Taking advantage of single-molecule real-time circular consensus sequencing (SMRT CCS), here we provide definitive evidence for semiconservative transmission of 6mA in Tetrahymena thermophila. In wild-type (WT) cells, 6mA occurs at the self-complementary ApT dinucleotide, mostly in full methylation (full-6mApT); after DNA replication, hemi-methylation (hemi-6mApT) is transiently present on the parental strand, opposite to the daughter strand readily labeled by 5-bromo-2′-deoxyuridine (BrdU). In ΔAMT1 cells, 6mA predominantly occurs as hemi-6mApT. Hemi-to-full conversion in WT cells is fast, robust, and processive, whereas de novo methylation in ΔAMT1 cells is slow and sporadic. In Tetrahymena, regularly spaced 6mA clusters coincide with the linker DNA of nucleosomes arrayed in the gene body. Importantly, in vitro methylation of human chromatin by the reconstituted AMT1 complex recapitulates preferential targeting of hemi-6mApT sites in linker DNA, supporting AMT1's intrinsic and autonomous role in maintenance methylation. We conclude that 6mA is transmitted by a semiconservative mechanism: full-6mApT is split by DNA replication into hemi-6mApT, which is restored to full-6mApT by AMT1-dependent maintenance methylation. Our study dissects AMT1-dependent maintenance methylation and AMT1-independent de novo methylation, reveals a 6mA transmission pathway with a striking similarity to 5-methylcytosine (5mC) transmission at the CpG dinucleotide, and establishes 6mA as a bona fide eukaryotic epigenetic mark.
-
Abstract Although long-read single-cell RNA isoform sequencing (scISO-Seq) can reveal alternative RNA splicing in individual cells, it suffers from a low read throughput. Here, we introduce HIT-scISOseq, a method that removes most artifact cDNAs and concatenates multiple cDNAs for PacBio circular consensus sequencing (CCS) to achieve high-throughput and high-accuracy single-cell RNA isoform sequencing. HIT-scISOseq can yield >10 million high-accuracy long-reads in a single PacBio Sequel II SMRT Cell 8M. We also report the development of scISA-Tools that demultiplex HIT-scISOseq concatenated reads into single-cell cDNA reads with >99.99% accuracy and specificity. We apply HIT-scISOseq to characterize the transcriptomes of 3375 corneal limbus cells and reveal cell-type-specific isoform expression in them. HIT-scISOseq is a high-throughput, high-accuracy, technically accessible method and it can accelerate the burgeoning field of long-read single-cell transcriptomics.
-
Enabled by long-read sequencing technologies, particularly Single Molecule, Real-Time sequencing, N6-methyladenine (6mA) footprinting is a transformative methodology for revealing the heterogeneous and dynamic distribution of nucleosomes and other DNA-binding proteins. Here, we present ipdTrimming, a novel 6mA-calling pipeline that outperforms existing tools in both computational efficiency and accuracy. Utilizing this optimized experimental and computational framework, we are able to map nucleosome positioning and transcription factor occupancy in nuclear DNA and establish high-resolution, long-range binding events in mitochondrial DNA. Our study highlights the potential of 6mA footprinting to capture coordinated nucleoprotein binding and to unravel epigenetic heterogeneity.
-
Abstract Background Biological mutagens with inserted sequences (such as transposons) play a crucial role in linking observed phenotypes to genotypes in reverse genetic studies. For this reason, accurate and efficient software tools for identifying insertion sites based on the analysis of sequencing reads are desired. Results We developed a bioinformatics tool to identify genome-wide Insertions in Mutagenesis (named “InMut-Finder”), based on target sequences and flanking sequences from long reads, such as those from Oxford Nanopore sequencing. InMut-Finder succeeded in identifying > 100 insertion sites in Medicago truncatula and soybean mutants based on sequencing reads of whole-genome DNA or enriched insertion-site DNA fragments. Insertion sites discovered by InMut-Finder were validated by PCR experiments. Conclusion InMut-Finder is a comprehensive and powerful tool for automated insertion detection from Nanopore long reads. The simplicity, efficiency, and flexibility of InMut-Finder make it a valuable tool for functional genomics and forward and reverse genetics. InMut-Finder was implemented with Perl, R, and Shell scripts, which are independent of the OS. The source code and instructions can be accessed at https://github.com/jsg200830/InMut-Finder .
