skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Surfactant mechanism of gene activation by transcriptional activation domains of sequence-specific factors
Gene expression in all eukaryotes depends critically on the function of transcriptional activation domains of gene activator proteins. The conventional model for activation domain (AD) function is the direct physical recruitment of specific coactivators and transcriptional machinery components. However, ADs are short and astronomically variable sequences, with up to 10^24 possible interchangeable sequence variants for a single gene activator; each variant is intrinsically disordered in structure and interacts with its targets with low specificity and affinity. How these peptides recruit their targets is becoming increasingly difficult to explain, exposing a massive knowledge gap in molecular biology. Here, we show that the single required characteristic of ADs—consistent with their extreme variability, intrinsic structural disorder, and near-stochastic interaction mode—is an amphiphilic aromatic–acidic surfactant-like property. We propose that the AD surfactant, by triggering the local gene-promoter chromatin phase transition, catalyzes the formation of “transcription factory” condensates. We demonstrate that the presence of tryptophan and aspartic acid residues in the AD sequence is sufficient for in vivo functionality, even when present only as a single pair of residues within a 20-amino-acid sequence containing nothing more than additional 18 glycine residues. We demonstrate that the amphipathic α-helix structure, suggested previously as beneficial for AD function, is actually detrimental, and breaking this helix by inserting prolines significantly increases activation domain functionality. The proposed surfactant action mechanism based on near-stochastic interactions implied by the minimalistic activation domains changes not only the paradigm for the explanation of gene activation but also the fundamental biochemistry paradigm based on the specificity of sequence-to-structure-to-functional-interaction. The mechanism of activity regulation by near-stochastic allosteric interactions could easily be applied to other biological processes.  more » « less
Award ID(s):
1925646
PAR ID:
10340347
Author(s) / Creator(s):
Editor(s):
Sara Osman Carolina Perdigoto
Date Published:
Journal Name:
Nature structural molecular biology
ISSN:
1545-9985
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Activation domains (ADs) of eukaryotic gene activators remain enigmatic for decades as short, extremely variable sequences which often are intrinsically disordered in structure and interact with an uncertain number of targets. The general absence of specificity increasingly complicates the utilization of the widely accepted mechanism of AD function by recruitment of coactivators. The long-standing enigma at the heart of molecular biology demands a fundamental rethinking of established concepts. Here, we review the experimental evidence supporting a novel mechanistic model of gene activation, based on ADs functioning via surfactant-like near-stochastic interactions with gene promoter nucleosomes. This new model is consistent with recent information-rich experimental data obtained using high-throughput synthetic biology and bioinformatics analysis methods, including machine learning. We clarify why the conventional biochemical principle of specificity for sequence, structures, and interactions fails to explain activation domain function. This perspective provides connections to the liquid-liquid phase separation model, signifies near-stochastic interactions as fundamental for the biochemical function, and can be generalized to other cellular functions. 
    more » « less
  2. Gene expression in Arabidopsis is regulated by more than 1,900 transcription factors (TFs), which have been identified genome-wide by the presence of well-conserved DNA-binding domains. Activator TFs contain activation domains (ADs) that recruit coactivator complexes; however, for nearly all Arabidopsis TFs, we lack knowledge about the presence, location and transcriptional strength of their ADs1. To address this gap, here we use a yeast library approach to experimentally identify Arabidopsis ADs on a proteome-wide scale, and find that more than half of the Arabidopsis TFs contain an AD. We annotate 1,553 ADs, the vast majority of which are, to our knowledge, previously unknown. Using the dataset generated, we develop a neural network to accurately predict ADs and to identify sequence features that are necessary to recruit coactivator complexes. We uncover six distinct combinations of sequence features that result in activation activity, providing a framework to interrogate the subfunctionalization of ADs. Furthermore, we identify ADs in the ancient AUXIN RESPONSE FACTOR family of TFs, revealing that AD positioning is conserved in distinct clades. Our findings provide a deep resource for understanding transcriptional activation, a framework for examining function in intrinsically disordered regions and a predictive model of ADs. 
    more » « less
  3. Abstract Sequence-specific activation by transcription factors is essential for gene regulation1,2. Key to this are activation domains, which often fall within disordered regions of transcription factors3,4and recruit co-activators to initiate transcription5. These interactions are difficult to characterize via most experimental techniques because they are typically weak and transient6,7. Consequently, we know very little about whether these interactions are promiscuous or specific, the mechanisms of binding, and how these interactions tune the strength of gene activation. To address these questions, we developed a microfluidic platform for expression and purification of hundreds of activation domains in parallel followed by direct measurement of co-activator binding affinities (STAMMPPING, for Simultaneous Trapping of Affinity Measurements via a Microfluidic Protein-Protein INteraction Generator). By applying STAMMPPING to quantify direct interactions between eight co-activators and 204 human activation domains (>1,500Kds), we provide the first quantitative map of these interactions and reveal 334 novel binding pairs. We find that the metazoan-specific co-activator P300 directly binds >100 activation domains, potentially explaining its widespread recruitment across the genome to influence transcriptional activation. Despite sharing similar molecular properties (e.g.enrichment of negative and hydrophobic residues), activation domains utilize distinct biophysical properties to recruit certain co-activator domains. Co-activator domain affinity and occupancy are well-predicted by analytical models that account for multivalency, andin vitroaffinities quantitatively predict activation in cells with an ultrasensitive response. Not only do our results demonstrate the ability to measure affinities between even weak protein-protein interactions in high throughput, but they also provide a necessary resource of over 1,500 activation domain/co-activator affinities which lays the foundation for understanding the molecular basis of transcriptional activation. 
    more » « less
  4. Abstract Analysis of factors that lead to the functionality of transcriptional activation domains remains a crucial and yet challenging task owing to the significant diversity in their sequences and their intrinsically disordered nature. Almost all existing methods that have aimed to predict activation domains have involved traditional machine learning approaches, such as logistic regression, that are unable to capture complex patterns in data or plain convolutional neural networks and have been limited in exploration of structural features. However, there is a tremendous potential in the inspection of the structural properties of activation domains, and an opportunity to investigate complex relationships between features of residues in the sequence. To address these, we have utilized the power of graph neural networks which can represent structural data in the form of nodes and edges, allowing nodes to exchange information among themselves. We have experimented with two kinds of graph formulations, one involving residues as nodes and the other assigning atoms to be the nodes. A logistic regression model was also developed to analyze feature importance. For all the models, several feature combinations were experimented with. The residue-level GNN model with amino acid type, residue position, acidic/basic/aromatic property and secondary structure feature combination gave the best performing model with accuracy, F1 score and AUROC of 97.9%, 71% and 97.1% respectively which outperformed other existing methods in the literature when applied on the dataset we used. Among the other structure-based features that were analyzed, the amphipathic property of helices also proved to be an important feature for classification. Logistic regression results showed that the most dominant feature that makes a sequence functional is the frequency of different types of amino acids in the sequence. Our results consistent have shown that functional sequences have more acidic and aromatic residues whereas basic residues are seen more in non-functional sequences. 
    more » « less
  5. Kaplan, C (Ed.)
    Abstract Transcription factors activate gene expression in development, homeostasis, and stress with DNA binding domains and activation domains. Although there exist excellent computational models for predicting DNA binding domains from protein sequence, models for predicting activation domains from protein sequence have lagged, particularly in metazoans. We recently developed a simple and accurate predictor of acidic activation domains on human transcription factors. Here, we show how the accuracy of this human predictor arises from the clustering of aromatic, leucine, and acidic residues, which together are necessary for acidic activation domain function. When we combine our predictor with the predictions of convolutional neural network (CNN) models trained in yeast, the intersection is more accurate than individual models, emphasizing that each approach carries orthogonal information. We synthesize these findings into a new set of activation domain predictions on human transcription factors. 
    more » « less