This content will become publicly available on August 11, 2026

Title: Methylation Data Analysis and Interpretation
DNA methylation, a covalent modification, fundamentally shapes mammalian gene regulation and cellular identity. This review examines methylation's biochemical underpinnings, genomic distribution patterns, and analytical approaches. We highlight three distinctive aspects that separate methylation from other epigenetic marks: its remarkable stability as a silencing mechanism, its capacity to maintain distinct states independently of DNA sequence, and its effectiveness as a quantitative trait linking genotype to disease risk. We also explore methylation clocks and their biological significance. The review addresses technical considerations across the major assay types, both array-based technologies and sequencing approaches, with emphasis on data normalization, quality control, cell-proportion inference, and the specialized statistical models required for next-generation sequencing analysis.
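The abstract contrasts array-based and sequencing-based methylation data. As a minimal sketch (not code from the review), the functions below compute two standard per-probe array summaries and a beta-binomial log-likelihood for read counts. The array summaries are the beta value (fraction methylated) and the M-value (log2 ratio). The beta-binomial is a common choice for overdispersed methylated-read counts from bisulfite sequencing. The offset of 100, the alpha of 1, and the Beta(a, b) parameters in the usage lines are assumptions chosen for illustration.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Array beta value: methylated / (methylated + unmethylated + offset).

    The offset (assumed 100 here, a commonly used default) keeps the ratio
    stable when both probe intensities are low; the result lies in [0, 1).
    """
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """Array M-value: log2 ratio of methylated to unmethylated intensity.

    alpha (assumed 1 here) avoids taking the log of zero.
    """
    return math.log2((meth + alpha) / (unmeth + alpha))


def _log_beta_fn(x: float, y: float) -> float:
    """Natural log of the Beta function B(x, y), computed via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def betabinom_logpmf(k: int, n: int, a: float, b: float) -> float:
    """Log-probability of k methylated reads out of n covering reads.

    The site's methylation probability is itself drawn from Beta(a, b),
    which allows more spread (overdispersion) than a plain binomial with
    the same mean a / (a + b).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + _log_beta_fn(k + a, n - k + b) - _log_beta_fn(a, b)


if __name__ == "__main__":
    # Hypothetical probe intensities for one CpG on an array.
    print(round(beta_value(3000, 1000), 3))  # 3000 / 4100
    print(round(m_value(3000, 1000), 3))
    # Hypothetical site: 12 of 20 reads methylated, Beta(6, 4) prior (mean 0.6).
    print(betabinom_logpmf(12, 20, 6.0, 4.0))
```

Beta values are easier to interpret as proportions. M-values are often preferred for statistical testing because they are less compressed near 0 and 1. In practice, a and b would be estimated from replicate samples rather than fixed by hand.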
Award ID(s): 2238125
PAR ID: 10627944
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Annual Data Science Review
Date Published:
Journal Name: Annual Review of Biomedical Data Science
Volume: 8
Issue: 1
ISSN: 2574-3414
Page Range / eLocation ID: 605 to 632
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. DNA methylation plays an important role in many biological processes. The mechanisms underlying the establishment and maintenance of DNA methylation are well understood thanks to decades of research using DNA methylation mutants, primarily in the Arabidopsis (Arabidopsis thaliana) accession Col-0. Recent genome-wide association studies (GWASs) using the methylomes of natural accessions have uncovered a complex and distinct genetic basis of variation in DNA methylation at the population level. Sequencing following bisulfite treatment has served as an excellent method for quantifying DNA methylation. Unlike studies focusing on specific accessions with reference genomes, population-scale methylome research often requires an additional round of sequencing beyond obtaining genome assemblies or genetic variants from whole-genome sequencing data, which can be cost-prohibitive. Here, we provide an overview of recently developed bisulfite-free methods for quantifying methylation and cost-effective approaches for the simultaneous detection of genetic and epigenetic information. We also discuss the plasticity of DNA methylation in a specific Arabidopsis accession, the contribution of DNA methylation to plant adaptation, and the genetic determinants of variation in DNA methylation in natural populations. These recently developed technologies and insights will greatly benefit future studies of population epigenomes.
  2. Interrogation of chromatin modifications, such as DNA methylation, has the potential to improve forecasting and conservation of marine ecosystems. The standard method for assaying DNA methylation (whole-genome bisulphite sequencing), however, is currently too costly to apply at the scales required for ecological research. Here, we evaluate different methods for measuring DNA methylation for ecological epigenetics. We compare whole-genome bisulphite sequencing (WGBS) with methylated CpG binding domain sequencing (MBD‐seq) and a modified version of MethylRAD that we term methylation‐dependent restriction site‐associated DNA sequencing (mdRAD). We evaluate these three assays in measuring variation in methylation across the genome, between genotypes, and between polyp types in the reef‐building coral Acropora millepora. We find that all three assays measure absolute methylation levels similarly for gene bodies (gbM), as well as for exons and 1-kb windows, with a minimum Pearson correlation of 0.66. Differential gbM estimates were less correlated but still concordant across assays. We conclude that MBD‐seq and mdRAD are reliable and cost‐effective alternatives to WGBS. The considerably lower sequencing effort required for mdRAD to produce comparable methylation estimates makes it particularly useful for ecological epigenetics.
  3. Characterizing DNA methylation patterns is important for addressing key questions in evolutionary biology, geroscience, and medical genomics. While costs are decreasing, whole-genome DNA methylation profiling remains prohibitively expensive for most population-scale studies, creating a need for cost-effective, reduced representation approaches (i.e., assays that rely on microarrays, enzyme digests, or sequence capture to target a subset of the genome). Most common whole-genome and reduced representation techniques rely on bisulfite conversion, which can damage DNA, resulting in DNA loss and sequencing biases. Enzymatic methyl sequencing (EM-seq) was recently proposed to overcome these issues, but thorough benchmarking of EM-seq combined with cost-effective, reduced representation strategies has not yet been performed. To fill this gap, we optimized the Targeted Methylation Sequencing (TMS) protocol, which profiles ∼4 million CpG sites, for miniaturization, flexibility, and multispecies use at a cost of ∼$80. First, we tested modifications to increase throughput and reduce cost, including increasing multiplexing, decreasing DNA input, and using enzymatic rather than mechanical fragmentation to prepare DNA. Second, we compared our optimized TMS protocol to commonly used techniques, specifically the Infinium MethylationEPIC BeadChip (n = 55 paired samples) and whole-genome bisulfite sequencing (n = 6 paired samples). In both cases, we found strong agreement between technologies (R² = 0.97 and 0.99, respectively). Third, we tested the optimized TMS protocol in three non-human primate species (rhesus macaques, geladas, and capuchins). We captured a high percentage (mean = 77.1%) of targeted CpG sites and produced methylation level estimates that agreed with those generated from reduced representation bisulfite sequencing (R² = 0.98).
    Finally, we applied our protocol to profile age-associated DNA methylation variation in two subsistence-level populations (the Tsimane of lowland Bolivia and the Orang Asli of Peninsular Malaysia) and found age-methylation patterns strikingly similar to those reported in high-income cohorts, despite known differences in age-health relationships between lifestyle contexts. Altogether, our optimized TMS protocol will enable cost-effective, population-scale studies of genome-wide DNA methylation levels across human and non-human primate species.
  4. Sproul, Duncan (Ed.)
    Characterizing DNA methylation patterns is important for addressing key questions in evolutionary biology, development, geroscience, and medical genomics. While costs are decreasing, whole-genome DNA methylation profiling remains prohibitively expensive for most population-scale studies, creating a need for cost-effective, reduced representation approaches (i.e., assays that rely on microarrays, enzyme digests, or sequence capture to target a subset of the genome). Most common whole-genome and reduced representation techniques rely on bisulfite conversion, which can damage DNA, resulting in DNA loss and sequencing biases. Enzymatic methyl sequencing (EM-seq) was recently proposed to overcome these issues, but thorough benchmarking of EM-seq combined with cost-effective, reduced representation strategies is currently lacking. To address this gap, we optimized the Targeted Methylation Sequencing (TMS) protocol, which profiles ~4 million CpG sites, for miniaturization, flexibility, and multispecies use. First, we tested modifications to increase throughput and reduce cost, including increasing multiplexing, decreasing DNA input, and using enzymatic rather than mechanical fragmentation to prepare DNA. Second, we compared our optimized TMS protocol to commonly used techniques, specifically the Infinium MethylationEPIC BeadChip (n = 55 paired samples) and whole-genome bisulfite sequencing (n = 6 paired samples). In both cases, we found strong agreement between technologies (R² = 0.97 and 0.99, respectively). Third, we tested the optimized TMS protocol in three non-human primate species (rhesus macaques, geladas, and capuchins). We captured a high percentage (mean = 77.1%) of targeted CpG sites and produced methylation level estimates that agreed with those generated from reduced representation bisulfite sequencing (R² = 0.98).
    Finally, we confirmed that estimates of (1) epigenetic age and (2) tissue-specific DNA methylation patterns derived from TMS data closely recapitulate those derived from other technologies. Altogether, our optimized TMS protocol will enable cost-effective, population-scale studies of genome-wide DNA methylation levels across human and non-human primate species.
  5. Transformer is an algorithm that adopts a self‐attention architecture in neural networks and has been widely used in natural language processing. In the current study, we apply the Transformer architecture to detect DNA methylation from ionic signals in Oxford Nanopore sequencing data. We evaluated this idea using real data sets (Escherichia coli data and the human genome NA12878 sequenced by Simpson et al.) and demonstrated the ability of Transformers to detect methylation on ionic signal data.
    Background: Oxford Nanopore long‐read sequencing technology addresses current limitations in DNA methylation detection that are inherent in short‐read bisulfite sequencing or methylation microarrays. A number of analytical tools, such as Nanopolish, Guppy/Tombo, and DeepMod, have been developed to detect DNA methylation in Nanopore data. However, further improvements are possible in computational efficiency, prediction accuracy, and contextual interpretation of complex genomic regions (such as repetitive regions and regions of low GC density).
    Method: In the current study, we apply the Transformer architecture to detect DNA methylation from ionic signals in Oxford Nanopore sequencing data. Transformer is an algorithm that adopts a self‐attention architecture in neural networks and has been widely used in natural language processing.
    Results: Compared to traditional deep‐learning methods such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), Transformers may have specific advantages in DNA methylation detection, because the self‐attention mechanism can help detect relationships between bases that are far apart and can focus on the bases that carry characteristic methylation‐specific signals within a given sequence context.
    Conclusion: We demonstrated the ability of Transformers to detect methylation on ionic signal data.
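Items 2 to 4 above judge cheaper assays in two ways. The first is how closely their methylation estimates track a reference platform: a minimum Pearson correlation of 0.66 across gene bodies, exons, and 1-kb windows, and R² of 0.97 to 0.99 against EPIC and WGBS. The second, for TMS, is the mean share of targeted CpG sites captured (77.1%). The plain-Python sketch below computes those summaries on toy data. The depth cutoff `min_depth=10`, the site coordinates, and the values are hypothetical. The studies' own pipelines may define "captured" or fit R² differently; here R² is taken from a simple least-squares line, where it equals r squared.

```python
import math
from typing import Hashable, Iterable, Mapping, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired per-site methylation estimates."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares line y ~ x (equals pearson_r ** 2)."""
    return pearson_r(x, y) ** 2


def fraction_captured(
    depth: Mapping[Hashable, int],
    targets: Iterable[Hashable],
    min_depth: int = 10,
) -> float:
    """Share of targeted CpG sites covered by at least min_depth reads.

    min_depth=10 is a hypothetical cutoff chosen for illustration only.
    """
    target_list = list(targets)
    if not target_list:
        raise ValueError("no target sites given")
    hit = sum(1 for site in target_list if depth.get(site, 0) >= min_depth)
    return hit / len(target_list)


if __name__ == "__main__":
    # Hypothetical paired beta values for the same four CpG sites.
    array_est = [0.10, 0.45, 0.80, 0.92]
    seq_est = [0.12, 0.40, 0.85, 0.90]
    print(f"r = {pearson_r(array_est, seq_est):.3f}, "
          f"R^2 = {r_squared(array_est, seq_est):.3f}")
    # Hypothetical read depths keyed by (chromosome, position).
    depth = {("chr1", 10468): 25, ("chr1", 10471): 4, ("chr2", 5000): 12}
    targets = [("chr1", 10468), ("chr1", 10471), ("chr2", 5000), ("chr3", 42)]
    print(f"captured = {fraction_captured(depth, targets):.1%}")
```

Sites absent from the depth table count as uncaptured. In practice, agreement would usually be computed only over sites that pass the coverage filter in both assays.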