Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound the analysis of chromatin accessibility profiling data. Existing computational tools are limited in their ability to account for such intrinsic biases and are not designed for analyzing single-cell data. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. SELMA can utilize internal mitochondrial DNA data to improve bias estimation. We show that transcription factor binding inference from DNase footprints can be improved by incorporating biases estimated by SELMA. Furthermore, we show strong effects of intrinsic biases in single-cell ATAC-seq data, and develop the first single-cell ATAC-seq intrinsic bias correction model to improve cell clustering. SELMA can enhance the performance of existing bioinformatics tools and improve the analysis of both bulk and single-cell chromatin accessibility sequencing data.
- PAR ID: 10222843
- Date Published:
- Journal Name: Science Advances
- Volume: 6
- Issue: 51
- ISSN: 2375-2548
- Page Range / eLocation ID: eaba9031
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract DNA base damage arises frequently in living cells and needs to be removed by base excision repair (BER) to prevent mutagenesis and genome instability. Both the formation and repair of base damage occur in chromatin and are conceivably affected by DNA-binding proteins such as transcription factors (TFs). However, to what extent TF binding affects base damage distribution and BER in cells is unclear. Here, we used a genome-wide damage mapping method, N-methylpurine-sequencing (NMP-seq), and characterized alkylation damage distribution and BER at TF binding sites in yeast cells treated with the alkylating agent methyl methanesulfonate (MMS). Our data show that alkylation damage formation was mainly suppressed at the binding sites of yeast TFs ARS binding factor 1 (Abf1) and rDNA enhancer binding protein 1 (Reb1), but individual hotspots with elevated damage levels were also found. Additionally, Abf1 and Reb1 binding strongly inhibits BER in vivo and in vitro, causing slow repair both within the core motif and its adjacent DNA. Repair of ultraviolet (UV) damage by nucleotide excision repair (NER) was also inhibited by TF binding. Interestingly, TF binding inhibits a larger DNA region for NER relative to BER. The observed effects are caused by the TF–DNA interaction, because damage formation and BER can be restored by depletion of Abf1 or Reb1 protein from the nucleus. Thus, our data reveal that TF binding significantly modulates alkylation base damage formation and inhibits repair by the BER pathway. The interplay between base damage formation and BER may play an important role in affecting mutation frequency in gene regulatory regions.
-
Summary Cell differentiation is driven by changes in the activity of transcription factors (TFs) and subsequent alterations in transcription. To study this process, differences in TF binding between cell types can be deduced by probing chromatin accessibility. We used cell type‐specific nuclear purification followed by the assay for transposase‐accessible chromatin (ATAC‐seq) to delineate differences in chromatin accessibility and TF regulatory networks between stem cells of the shoot apical meristem (SAM) and differentiated leaf mesophyll cells in Arabidopsis thaliana. Chromatin accessibility profiles of SAM stem cells and leaf mesophyll cells were very similar at a qualitative level, yet thousands of regions having quantitatively different chromatin accessibility were also identified. Analysis of the genomic regions preferentially accessible in each cell type identified hundreds of overrepresented TF‐binding motifs, highlighting sets of TFs that are probably important for each cell type. Within these sets, we found evidence for extensive co‐regulation of target genes by multiple TFs that are preferentially expressed in each cell type. Interestingly, the TFs within each of these cell type‐enriched sets also showed evidence of extensively co‐regulating each other. We further found that preferentially accessible chromatin regions in mesophyll cells tended to also be substantially accessible in the stem cells, whereas the converse was not true. This observation suggests that the generally higher accessibility of regulatory elements in stem cells might contribute to their developmental plasticity. This work demonstrates the utility of cell type‐specific chromatin accessibility profiling for the rapid development of testable models of regulatory control differences between cell types.
-
Abstract Development of the malaria parasite, Plasmodium falciparum, is regulated by a limited number of sequence-specific transcription factors (TFs). However, the mechanisms by which these TFs recognize genome-wide binding sites are largely unknown. To address TF specificity, we investigated the binding of two TF subsets that bind either CACACA or GTGCAC DNA sequence motifs and further characterized two additional ApiAP2 TFs, PfAP2-G and PfAP2-EXP, which bind unique DNA motifs (GTAC and TGCATGCA). We also interrogated the impact of DNA sequence and chromatin context on P. falciparum TF binding by integrating high-throughput in vitro and in vivo binding assays, DNA shape predictions, epigenetic post-translational modifications, and chromatin accessibility. We found that DNA sequence context minimally impacts binding site selection for paralogous CACACA-binding TFs, while chromatin accessibility, epigenetic patterns, co-factor recruitment, and dimerization correlate with differential binding. In contrast, GTGCAC-binding TFs prefer different DNA sequence context in addition to chromatin dynamics. Finally, we determined that TFs that preferentially bind divergent DNA motifs may bind overlapping genomic regions due to low-affinity binding to other sequence motifs. Our results demonstrate that TF binding site selection relies on a combination of DNA sequence and chromatin features, thereby contributing to the complexity of P. falciparum gene regulatory mechanisms.
-
Abstract During gene regulation, DNA accessibility is thought to limit the availability of transcription factor (TF) binding sites, while TFs can increase DNA accessibility to recruit additional factors that upregulate gene expression. Given this interplay, the causative regulatory events in the modulation of gene expression remain unknown for the vast majority of genes. We utilized deeply sequenced ATAC-Seq data and site-specific knock-in reporter genes to investigate the relationship between the binding-site resolution dynamics of DNA accessibility and the expression dynamics of the enhancers of Cebpa during macrophage-neutrophil differentiation. While the enhancers upregulate reporter expression during the earliest stages of differentiation, there is little corresponding increase in their total accessibility. Conversely, total accessibility peaks during the last stages of differentiation without any increase in enhancer activity. The accessibility of positions neighboring C/EBP-family TF binding sites, which indicates TF occupancy, does increase significantly during early differentiation, showing that the early upregulation of enhancer activity is driven by TF binding. These results imply that a generalized increase in DNA accessibility is not sufficient, and binding by enhancer-specific TFs is necessary, for the upregulation of gene expression. Additionally, high-coverage ATAC-Seq combined with time-series expression data can infer the sequence of regulatory events at binding-site resolution.