Abstract Many eukaryotic transcription factors (TFs) form homodimer or heterodimer complexes to regulate gene expression. Dimerization of BASIC LEUCINE ZIPPER (bZIP) TFs is critical for their functions, but the molecular mechanism underlying the DNA binding and functional specificity of homo- versus heterodimers remains elusive. To address this gap, we present the double DNA Affinity Purification-sequencing (dDAP-seq) technique, which maps heterodimer binding sites on endogenous genomic DNA. Using dDAP-seq, we profile twenty pairs of C/S1 bZIP heterodimers and S1 homodimers in Arabidopsis and show that heterodimerization significantly expands the DNA binding preferences of these TFs. Analysis of dDAP-seq binding sites reveals the function of bZIP9 in abscisic acid response and the role of bZIP53 heterodimer-specific binding in seed maturation. The C/S1 heterodimers show distinct preferences for the ACGT elements recognized by plant bZIPs and for motifs resembling the yeast GCN4 cis-elements. This study demonstrates the potential of dDAP-seq for deciphering the DNA binding specificities of interacting TFs that are key to combinatorial gene regulation.
Specificity landscapes unmask submaximal binding site preferences of transcription factors
We have developed Differential Specificity and Energy Landscape (DiSEL) analysis to comprehensively compare DNA–protein interactomes (DPIs) obtained by high-throughput experimental platforms and cutting-edge computational methods. While high-affinity DNA binding sites are identified by most methods, DiSEL uncovered nuanced sequence preferences displayed by homologous transcription factors. Pairwise analysis of 726 DPIs uncovered homolog-specific differences at moderate- to low-affinity binding sites (submaximal sites). DiSEL analysis of variants of 41 transcription factors revealed that many disease-causing mutations result in allele-specific changes in binding site preferences. We focused on a set of highly homologous factors that have different biological roles but "read" DNA using identical amino acid side chains. Rather than direct readout, our results indicate that side chains that do not contact DNA contribute allosterically to sculpting distinct sequence preferences among closely related members of transcription factor families.
- Award ID(s): 1736026
- PAR ID: 10077604
- Publisher / Repository: Proceedings of the National Academy of Sciences
- Journal Name: Proceedings of the National Academy of Sciences
- Volume: 115
- Issue: 45
- ISSN: 0027-8424
- Page Range / eLocation ID: p. E10586-E10595
- Sponsoring Org: National Science Foundation
More Like this
Abstract Transcription factors (TFs) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences, in vivo genomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained on in vitro TF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific, in vivo binding profiles. Conversely, deep learning models trained on in vivo TF binding assays effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models of in vitro and in vivo TF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinities de novo from deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts the energetic impacts of sequence variation within and surrounding motifs on TF binding, as measured by diverse in vitro assays, with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant of in vivo binding, suggest that deep learning models of in vivo binding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughput in silico experiments to explore the influence of sequence context and variation on both intrinsic affinity and in vivo occupancy.
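Affinity Distillation itself is a deep-learning interpretation method; the sketch below illustrates only the simpler biophysical idea mentioned above, that occupancy can be predicted from TF concentration and affinity. It assumes a single independent site, a two-state equilibrium, free TF in excess (no depletion), and the convention ΔG = RT ln Kd. The function names are hypothetical and this is not the authors' model.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987e-3


def kd_from_dg(dg_kcal: float, temp_k: float = 298.15) -> float:
    """Dissociation constant (molar) from a binding free energy in kcal/mol.

    Assumes dG = RT ln(Kd): a more negative dG means tighter binding (smaller Kd).
    """
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def occupancy(conc_m: float, kd_m: float) -> float:
    """Fractional occupancy of one site in a two-state equilibrium model.

    conc_m is the free TF concentration (molar), assumed not depleted by binding.
    """
    return conc_m / (conc_m + kd_m)
```

For example, a site with Kd = 10 nM is half occupied at 10 nM free TF, and weakening a site by 1 kcal/mol raises its Kd roughly fivefold at 25 °C, which is why modest sequence changes near a motif can shift occupancy noticeably.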
A suboptimal OCT4-SOX2 binding site facilitates the naïve-state specific function of a Klf4 enhancer
Vall-llosera_Camps, Miquel (Ed.)
Enhancers have critical functions in the precise, spatiotemporal control of transcription during development. It is thought that enhancer grammar, or the characteristics and arrangements of transcription factor binding sites, underlies the specific functions of developmental enhancers. In this study, we sought to identify grammatical constraints that direct enhancer activity in the naïve state of pluripotency, focusing on the enhancers for the naïve-state specific gene, Klf4. Using a combination of biochemical tests, reporter assays, and endogenous mutations in mouse embryonic stem cells, we have studied the binding sites for the transcription factors OCT4 and SOX2. We have found that the three Klf4 enhancers contain suboptimal OCT4-SOX2 composite binding sites. Substitution with a high-affinity OCT4-SOX2 binding site in Klf4 enhancer E2 rescued enhancer function and Klf4 expression upon loss of the ESRRB and STAT3 binding sites. We also observed that the low affinity of the OCT4-SOX2 binding site is crucial to drive the naïve-state specific activities of Klf4 enhancer E2. Altogether, our work suggests that the affinity of OCT4-SOX2 binding sites could facilitate enhancer functions in specific states of pluripotency.
Transcription factors are proteins that recognize specific DNA sequences and affect local transcriptional processes. They are the primary means by which all organisms control specific gene expression. Understanding which DNA sequences a particular transcription factor recognizes provides important clues to the set of genes it regulates and, through this, its potential biological functions. Insights may be gained through homology searches and genetic means. However, these approaches can be misleading, especially when comparing distantly related organisms or in cases of complicated transcriptional regulation. In this work, we used a biochemistry-based approach to determine the spectrum of DNA sequences specifically bound by the Thermus thermophilus HB8 TetR-family transcription factor TTHB023. The consensus sequence 5′–(a/c)Y(g/t)A(A/C)YGryCR(g/t)T(c/a)R(g/t)–3′ was found to bind TTHB023 with nanomolar affinity. Analyzing the T. thermophilus HB8 genome, we mapped several TTHB023 consensus binding sites to the promoters of genes involved in fatty acid biosynthesis. Notably, some of these were not identified previously through genetic approaches, presumably because of their potential co-regulation by the Thermus thermophilus HB8 TetR-family transcriptional repressor TTHA0167. Our investigation provides additional evidence supporting the usefulness of a biochemistry-based approach for characterizing putative transcription factors, especially in the case of cooperative regulation.
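To make the degenerate consensus concrete, the sketch below encodes it as a regular expression and scans a sequence for matches, in the spirit of the genome analysis described above. Parenthesized pairs mean either base, Y = C/T and R = A/G (IUPAC codes); we assume lowercase letters mark less conserved positions but treat them identically here. By our reading, the consensus is its own reverse complement position by position, so scanning one strand finds every match. This is plain string matching, not an affinity model, and all names are hypothetical.

```python
import re

# TTHB023 consensus 5'-(a/c)Y(g/t)A(A/C)YGryCR(g/t)T(c/a)R(g/t)-3',
# one token of allowed bases per position (case ignored: an assumption).
CONSENSUS = ["AC", "CT", "GT", "A", "AC", "CT", "G", "AG", "CT",
             "C", "AG", "GT", "T", "AC", "AG", "GT"]

# Zero-width lookahead so overlapping matches are all reported.
_PATTERN = re.compile("(?=(" + "".join(f"[{s}]" for s in CONSENSUS) + "))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def consensus_is_palindromic(tokens: list[str]) -> bool:
    """True if each position's allowed set equals the complement of its mirror."""
    comp = [set(t.translate(_COMPLEMENT)) for t in tokens]
    return all(set(a) == b for a, b in zip(tokens, reversed(comp)))


def find_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for every consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in _PATTERN.finditer(seq)]
```

For example, `find_sites("TTTTACGAACGACCAGTCAGTTTT")` reports one match starting at position 4. A match only flags a candidate site; whether TTHB023 actually binds it, and how tightly, still requires the biochemical measurements the abstract describes.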
The intrinsic DNA sequence preferences and cell type–specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type–specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species–specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
