Title: Clusters of acidic and hydrophobic residues can predict acidic transcriptional activation domains from protein sequence
Abstract Transcription factors activate gene expression in development, homeostasis, and stress with DNA binding domains and activation domains. Although there exist excellent computational models for predicting DNA binding domains from protein sequence, models for predicting activation domains from protein sequence have lagged, particularly in metazoans. We recently developed a simple and accurate predictor of acidic activation domains on human transcription factors. Here, we show how the accuracy of this human predictor arises from the clustering of aromatic, leucine, and acidic residues, which together are necessary for acidic activation domain function. When we combine our predictor with the predictions of convolutional neural network (CNN) models trained in yeast, the intersection is more accurate than individual models, emphasizing that each approach carries orthogonal information. We synthesize these findings into a new set of activation domain predictions on human transcription factors.
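As a rough illustration of the idea in the abstract, the sketch below scans a protein sequence with a sliding window, flags windows that are both net-acidic and rich in aromatic or leucine residues, and then intersects those regions with regions from a second predictor. It is not the authors' predictor: the function names, the window size, and both thresholds are hypothetical placeholders, and the second set of regions stands in for the output of any other model (for example, a CNN).

```python
# Illustrative sketch only; names, window size, and thresholds are assumptions,
# not values taken from the paper.
ACIDIC = set("DE")
BASIC = set("KR")            # histidine is ignored in this simplified charge count
HYDROPHOBIC = set("WFYL")    # aromatic residues plus leucine


def candidate_windows(seq, size=10, max_charge=-4, min_hydrophobic=3):
    """Return (start, end) windows that are net-acidic and rich in W/F/Y/L."""
    hits = []
    for start in range(len(seq) - size + 1):
        window = seq[start:start + size]
        charge = sum(r in BASIC for r in window) - sum(r in ACIDIC for r in window)
        hydrophobic = sum(r in HYDROPHOBIC for r in window)
        if charge <= max_charge and hydrophobic >= min_hydrophobic:
            hits.append((start, start + size))
    return hits


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals into regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_intervals(a, b):
    """Intersect two sorted, merged interval lists (e.g. two predictors)."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Toy sequence: a neutral linker, an acidic/hydrophobic cluster, a basic tail.
toy_seq = "G" * 10 + "DWDELDFDEL" + "K" * 10
rule_regions = merge_intervals(candidate_windows(toy_seq))
other_regions = [(15, 40)]   # hypothetical output of a second predictor
print(rule_regions, intersect_intervals(rule_regions, other_regions))
```

Keeping only the intersection trades recall for precision, which matches the abstract's observation that the combined predictions are more accurate than either model alone.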
Award ID(s):
2112057
PAR ID:
10510795
Author(s) / Creator(s):
;
Editor(s):
Kaplan, C
Publisher / Repository:
GENETICS
Date Published:
Journal Name:
GENETICS
Volume:
225
Issue:
2
ISSN:
1943-2631
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Eukaryotic transcription factors activate gene expression with their DNA-binding domains and activation domains. DNA-binding domains bind the genome by recognizing structurally related DNA sequences; they are structured, conserved, and predictable from protein sequences. Activation domains recruit chromatin modifiers, coactivator complexes, or basal transcriptional machinery via structurally diverse protein-protein interactions. Activation domains and DNA-binding domains have been called independent, modular units, but there are many departures from modularity, including interactions between these regions and overlap in function. Compared to DNA-binding domains, activation domains are poorly understood because they are poorly conserved, intrinsically disordered, and difficult to predict from protein sequences. This review, organized around commonly asked questions, describes recent progress that the field has made in understanding the sequence features that control activation domains and predicting them from sequence.
  2. Abstract Sequence-specific activation by transcription factors is essential for gene regulation [1,2]. Key to this are activation domains, which often fall within disordered regions of transcription factors [3,4] and recruit co-activators to initiate transcription [5]. These interactions are difficult to characterize via most experimental techniques because they are typically weak and transient [6,7]. Consequently, we know very little about whether these interactions are promiscuous or specific, the mechanisms of binding, and how these interactions tune the strength of gene activation. To address these questions, we developed a microfluidic platform for expression and purification of hundreds of activation domains in parallel followed by direct measurement of co-activator binding affinities (STAMMPPING, for Simultaneous Trapping of Affinity Measurements via a Microfluidic Protein-Protein INteraction Generator). By applying STAMMPPING to quantify direct interactions between eight co-activators and 204 human activation domains (>1,500 Kds), we provide the first quantitative map of these interactions and reveal 334 novel binding pairs. We find that the metazoan-specific co-activator P300 directly binds >100 activation domains, potentially explaining its widespread recruitment across the genome to influence transcriptional activation. Despite sharing similar molecular properties (e.g., enrichment of negative and hydrophobic residues), activation domains utilize distinct biophysical properties to recruit certain co-activator domains. Co-activator domain affinity and occupancy are well-predicted by analytical models that account for multivalency, and in vitro affinities quantitatively predict activation in cells with an ultrasensitive response.
Not only do our results demonstrate the ability to measure affinities between even weak protein-protein interactions in high throughput, but they also provide a necessary resource of over 1,500 activation domain/co-activator affinities which lays the foundation for understanding the molecular basis of transcriptional activation. 
  3. Abstract Transcription factors carry long intrinsically disordered regions often containing multiple activation domains. Despite numerous recent high‐throughput identifications and characterizations of activation domains, the interplay between sequence motifs, activation domains, and regulator binding in intrinsically disordered transcription factor regions remains unresolved. Here, we map sequence motifs and activation domains in an Arabidopsis thaliana NAC transcription factor clade, revealing that although sequence motifs and activation domains often coincide, no systematic overlap exists. Biophysical analyses using NMR spectroscopy show that the long intrinsically disordered region of senescence‐associated transcription factor ANAC046 is devoid of residual structure. We identify two activation domain/sequence motif regions, one at each end, that both promiscuously bind a panel of six positive and negative regulator domains from biologically relevant regulators. Binding affinities measured using isothermal titration calorimetry reveal a hierarchy for regulator binding of the two ANAC046 activation domain/sequence motif regions, defining these as regulatory hotspots. Despite extensive dynamic intramolecular contacts along the disordered chain revealed using paramagnetic relaxation enhancement experiments and simulations, the regions remain uncoupled in binding. Together, the results imply rheostatic regulation by ANAC046 through concentration‐dependent regulator competition, a mechanism likely mirrored in other transcription factors with distantly located activation domains.
  4. Abstract Transcription factors regulate gene expression by binding to regulatory DNA and recruiting regulatory protein complexes. The DNA-binding and protein-binding functions of transcription factors are traditionally described as independent functions performed by modular protein domains. Here, I argue that genome binding can be a 2-part process with both DNA-binding and protein-binding steps, enabling transcription factors to perform a 2-step search of the nucleus to find their appropriate binding sites in a eukaryotic genome. I support this hypothesis with new and old results in the literature, discuss how this hypothesis parsimoniously resolves outstanding problems, and present testable predictions. 
  5. Cells adapt and respond to changes by regulating the activity of their genes. To turn genes on or off, they use a family of proteins called transcription factors. Transcription factors influence specific but overlapping groups of genes, so that each gene is controlled by several transcription factors that act together like a dimmer switch to regulate gene activity. The presence of transcription factors attracts proteins such as the Mediator complex, which activates genes by gathering the protein machines that read the genes. The more transcription factors are found near a specific gene, the more strongly they attract Mediator and the more active the gene is. A specific region on the transcription factor called the activation domain is necessary for this process. The biochemical sequences of these domains vary greatly between species, yet activation domains from, for example, yeast and human proteins are often interchangeable. To understand why this is the case, Sanborn et al. analyzed the genome of baker’s yeast and identified 150 activation domains, each very different in sequence. Three-quarters of them bound to a subunit of the Mediator complex called Med15. Sanborn et al. then developed a machine learning algorithm to predict activation domains in both yeast and humans. This algorithm also showed that negatively charged and greasy regions on the activation domains were essential to be activated by the Mediator complex. Further analyses revealed that activation domains used different poses to bind multiple sites on Med15, a behavior known as ‘fuzzy’ binding. This creates a high overall affinity even though the binding strength at each individual site is low, enabling the protein complexes to remain dynamic. These weak interactions together permit fine control over the activity of several genes, allowing cells to respond quickly and precisely to many changes.
The computer algorithm used here provides a new way to identify activation domains across species and could improve our understanding of how living things grow, adapt and evolve. It could also give new insights into mechanisms of disease, particularly cancer, where transcription factors are often faulty. 