skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Short tandem repeats bind transcription factors to tune eukaryotic gene expression
Short tandem repeats (STRs) are enriched in eukaryoticcis-regulatory elements and alter gene expression, yet how they regulate transcription remains unknown. We found that STRs modulate transcription factor (TF)–DNA affinities and apparent on-rates by about 70-fold by directly binding TF DNA-binding domains, with energetic impacts exceeding many consensus motif mutations. STRs maximize the number of weakly preferred microstates near target sites, thereby increasing TF density, with impacts well predicted by statistical mechanics. Confirming that STRs also affect TF binding in cells, neural networks trained only on in vivo occupancies predicted effects identical to those observed in vitro. Approximately 90% of TFs preferentially bound STRs that need not resemble known motifs, providing a cis-regulatory mechanism to target TFs to genomic sites.  more » « less
Award ID(s):
2142336
PAR ID:
10478516
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
AAAS
Date Published:
Journal Name:
Science
Volume:
381
Issue:
6664
ISSN:
0036-8075
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Many eukaryotic transcription factors (TF) form homodimer or heterodimer complexes to regulate gene expression. Dimerization of BASIC LEUCINE ZIPPER (bZIP) TFs are critical for their functions, but the molecular mechanism underlying the DNA binding and functional specificity of homo-versusheterodimers remains elusive. To address this gap, we present the double DNA Affinity Purification-sequencing (dDAP-seq) technique that maps heterodimer binding sites on endogenous genomic DNA. Using dDAP-seq we profile twenty pairs of C/S1 bZIP heterodimers and S1 homodimers inArabidopsisand show that heterodimerization significantly expands the DNA binding preferences of these TFs. Analysis of dDAP-seq binding sites reveals the function of bZIP9 in abscisic acid response and the role of bZIP53 heterodimer-specific binding in seed maturation. The C/S1 heterodimers show distinct preferences for the ACGT elements recognized by plant bZIPs and motifs resembling the yeast GCN4cis-elements. This study demonstrates the potential of dDAP-seq in deciphering the DNA binding specificities of interacting TFs that are key for combinatorial gene regulation. 
    more » « less
  2. Variations in noncoding regulatory sequences play a central role in evolution. Interpreting such variations, however, remains difficult even in the context of defined attributes such as transcription factor (TF) binding sites. Here, we systematically link variations in cis -regulatory sequences to TF binding by profiling the allele-specific binding of 27 TFs expressed in a yeast hybrid, in which two related genomes are present within the same nucleus. TFs localize preferentially to sites containing their known consensus motifs but occupy only a small fraction of the motif-containing sites available within the genomes. Differential binding of TFs to the orthologous alleles was well explained by variations that alter motif sequence, whereas differences in chromatin accessibility between alleles were of little apparent effect. Motif variations that abolished binding when present in only one allele were still bound when present in both alleles, suggesting evolutionary compensation, with a potential role for sequence conservation at the motif's vicinity. At the level of the full promoter, we identify cases of binding-site turnover, in which binding sites are reciprocally gained and lost, yet most interspecific differences remained uncompensated. Our results show the flexibility of TFs to bind imprecise motifs and the fast evolution of TF binding sites between related species. 
    more » « less
  3. Abstract Transcription factors (TF) are proteins that bind DNA in a sequence-specific manner to regulate gene transcription. Despite their unique intrinsic sequence preferences,in vivogenomic occupancy profiles of TFs differ across cellular contexts. Hence, deciphering the sequence determinants of TF binding, both intrinsic and context-specific, is essential to understand gene regulation and the impact of regulatory, non-coding genetic variation. Biophysical models trained onin vitroTF binding assays can estimate intrinsic affinity landscapes and predict occupancy based on TF concentration and affinity. However, these models cannot adequately explain context-specific,in vivobinding profiles. Conversely, deep learning models, trained onin vivoTF binding assays, effectively predict and explain genomic occupancy profiles as a function of complex regulatory sequence syntax, albeit without a clear biophysical interpretation. To reconcile these complementary models ofin vitroandin vivoTF binding, we developed Affinity Distillation (AD), a method that extracts thermodynamic affinitiesde-novofrom deep learning models of TF chromatin immunoprecipitation (ChIP) experiments by marginalizing away the influence of genomic sequence context. Applied to neural networks modeling diverse classes of yeast and mammalian TFs, AD predicts energetic impacts of sequence variation within and surrounding motifs on TF binding as measured by diversein vitroassays with superior dynamic range and accuracy compared to motif-based methods. Furthermore, AD can accurately discern affinities of TF paralogs. Our results highlight thermodynamic affinity as a key determinant ofin vivobinding, suggest that deep learning models ofin vivobinding implicitly learn high-resolution affinity landscapes, and show that these affinities can be successfully distilled using AD. This new biophysical interpretation of deep learning models enables high-throughputin silicoexperiments to explore the influence of sequence context and variation on both intrinsic affinity andin vivooccupancy. 
    more » « less
  4. Abstract BackgroundIdentifying the DNA-binding specificities of transcription factors (TF) is central to understanding gene networks that regulate growth and development. Such knowledge is lacking in oomycetes, a microbial eukaryotic lineage within the stramenopile group. Oomycetes include many important plant and animal pathogens such as the potato and tomato blight agentPhytophthora infestans, which is a tractable model for studying life-stage differentiation within the group. ResultsMining of the P. infestans genome identified 197 genes encoding proteins belonging to 22 TF families. Their chromosomal distribution was consistent with family expansions through unequal crossing-over, which were likely ancient since each family had similar sizes in most oomycetes. Most TFs exhibited dynamic changes in RNA levels through the P. infestanslife cycle. The DNA-binding preferences of 123 proteins were assayed using protein-binding oligonucleotide microarrays, which succeeded with 73 proteins from 14 families. Binding sites predicted for representatives of the families were validated by electrophoretic mobility shift or chromatin immunoprecipitation assays. Consistent with the substantial evolutionary distance of oomycetes from traditional model organisms, only a subset of the DNA-binding preferences resembled those of human or plant orthologs. Phylogenetic analyses of the TF families withinP. infestansoften discriminated clades with canonical and novel DNA targets. Paralogs with similar binding preferences frequently had distinct patterns of expression suggestive of functional divergence. TFs were predicted to either drive life stage-specific expression or serve as general activators based on the representation of their binding sites within total or developmentally-regulated promoters. This projection was confirmed for one TF using synthetic and mutated promoters fused to reporter genesin vivo. ConclusionsWe established a large dataset of binding specificities forP. infestansTFs, representing the first in the stramenopile group. This resource provides a basis for understanding transcriptional regulation by linking TFs with their targets, which should help delineate the molecular components of processes such as sporulation and host infection. Our work also yielded insight into TF evolution during the eukaryotic radiation, revealing both functional conservation as well as diversification across kingdoms. 
    more » « less
  5. Abstract During gene regulation, DNA accessibility is thought to limit the availability of transcription factor (TF) binding sites, while TFs can increase DNA accessibility to recruit additional factors that upregulate gene expression. Given this interplay, the causative regulatory events in the modulation of gene expression remain unknown for the vast majority of genes. We utilized deeply sequenced ATAC-Seq data and site-specific knock-in reporter genes to investigate the relationship between the binding-site resolution dynamics of DNA accessibility and the expression dynamics of the enhancers of Cebpa during macrophage-neutrophil differentiation. While the enhancers upregulate reporter expression during the earliest stages of differentiation, there is little corresponding increase in their total accessibility. Conversely, total accessibility peaks during the last stages of differentiation without any increase in enhancer activity. The accessibility of positions neighboring C/EBP-family TF binding sites, which indicates TF occupancy, does increase significantly during early differentiation, showing that the early upregulation of enhancer activity is driven by TF binding. These results imply that a generalized increase in DNA accessibility is not sufficient, and binding by enhancer-specific TFs is necessary, for the upregulation of gene expression. Additionally, high-coverage ATAC-Seq combined with time-series expression data can infer the sequence of regulatory events at binding-site resolution. 
    more » « less