- 
            Abstract: Analysis of the factors that make transcriptional activation domains functional remains a crucial yet challenging task, owing to the significant diversity of their sequences and their intrinsically disordered nature. Almost all existing methods for predicting activation domains rely either on traditional machine learning approaches, such as logistic regression, which are unable to capture complex patterns in data, or on plain convolutional neural networks, and they have been limited in their exploration of structural features. However, there is tremendous potential in inspecting the structural properties of activation domains, and an opportunity to investigate complex relationships between the features of residues in the sequence. To address these gaps, we have utilized graph neural networks (GNNs), which represent structural data as nodes and edges and allow nodes to exchange information among themselves. We experimented with two graph formulations, one with residues as nodes and the other with atoms as nodes. A logistic regression model was also developed to analyze feature importance. Several feature combinations were tested for all the models. The residue-level GNN model using the combination of amino acid type, residue position, acidic/basic/aromatic property, and secondary structure features performed best, with accuracy, F1 score, and AUROC of 97.9%, 71%, and 97.1%, respectively, outperforming other existing methods in the literature when applied to the dataset we used. Among the other structure-based features analyzed, the amphipathic property of helices also proved to be an important feature for classification. Logistic regression results showed that the most dominant feature that makes a sequence functional is the frequency of different types of amino acids in the sequence.
            Our results have consistently shown that functional sequences have more acidic and aromatic residues, whereas basic residues are more common in non-functional sequences.
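As a rough illustration of the sequence-derived features named in the abstract above (amino acid type, residue position, acidic/basic/aromatic property, and residue-class frequencies), here is a minimal Python sketch. It is not the authors' code. The function names, the residue groupings (acidic = D, E; basic = K, R, H; aromatic = F, W, Y), the normalization of position to the range 0 to 1, and the example sequence are all assumptions made for illustration. Secondary structure is left out because it would require a separate prediction tool.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order (assumed encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Residue groupings: an assumption for this sketch, not taken from the source.
ACIDIC = set("DE")
BASIC = set("KRH")
AROMATIC = set("FWY")


def residue_features(seq):
    """Return one feature dict per residue (hypothetical node features)."""
    n = len(seq)
    features = []
    for i, aa in enumerate(seq):
        features.append({
            "type": AMINO_ACIDS.index(aa),               # integer code of the amino acid
            "position": i / (n - 1) if n > 1 else 0.0,   # relative position in [0, 1]
            "acidic": aa in ACIDIC,
            "basic": aa in BASIC,
            "aromatic": aa in AROMATIC,
        })
    return features


def composition(seq):
    """Return the fraction of acidic, basic, and aromatic residues in seq."""
    counts = Counter(seq)
    n = len(seq)
    return {
        "acidic": sum(counts[a] for a in ACIDIC) / n,
        "basic": sum(counts[a] for a in BASIC) / n,
        "aromatic": sum(counts[a] for a in AROMATIC) / n,
    }


if __name__ == "__main__":
    example = "DDWFEELLKR"  # made-up sequence, not from the dataset
    print(composition(example))
    print(residue_features(example)[0])
```

Per-residue dictionaries like these could serve as node features in a residue-level graph, while the composition fractions correspond to the kind of frequency feature a logistic regression model might use. How the graph edges are defined is not stated in the abstract, so the sketch does not construct any.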
- 
            Activation domains (ADs) of eukaryotic gene activators have remained enigmatic for decades: they are short, extremely variable sequences that are often intrinsically disordered in structure and interact with an uncertain number of targets. This general absence of specificity makes it increasingly difficult to apply the widely accepted mechanism of AD function, the recruitment of coactivators. This long-standing enigma at the heart of molecular biology demands a fundamental rethinking of established concepts. Here, we review the experimental evidence supporting a novel mechanistic model of gene activation, based on ADs functioning via surfactant-like, near-stochastic interactions with gene promoter nucleosomes. This new model is consistent with recent information-rich experimental data obtained using high-throughput synthetic biology and bioinformatics analysis methods, including machine learning. We clarify why the conventional biochemical principle of specificity for sequence, structure, and interactions fails to explain activation domain function. This perspective provides connections to the liquid-liquid phase separation model, identifies near-stochastic interactions as fundamental to biochemical function, and can be generalized to other cellular functions.
            Free, publicly-accessible full text available November 1, 2025
- 
            Sara Osman Carolina Perdigoto (Ed.)
            Gene expression in all eukaryotes depends critically on the function of transcriptional activation domains of gene activator proteins. The conventional model for activation domain (AD) function is the direct physical recruitment of specific coactivators and transcriptional machinery components. However, ADs are short and astronomically variable sequences, with up to 10^24 possible interchangeable sequence variants for a single gene activator; each variant is intrinsically disordered in structure and interacts with its targets with low specificity and affinity. How these peptides recruit their targets is becoming increasingly difficult to explain, exposing a massive knowledge gap in molecular biology. Here, we show that the single required characteristic of ADs, consistent with their extreme variability, intrinsic structural disorder, and near-stochastic interaction mode, is an amphiphilic aromatic–acidic surfactant-like property. We propose that the AD surfactant, by triggering the local gene-promoter chromatin phase transition, catalyzes the formation of “transcription factory” condensates. We demonstrate that the presence of tryptophan and aspartic acid residues in the AD sequence is sufficient for in vivo functionality, even when they are present only as a single pair of residues within a 20-amino-acid sequence that otherwise contains nothing more than 18 glycine residues. We demonstrate that the amphipathic α-helix structure, suggested previously as beneficial for AD function, is actually detrimental, and that breaking this helix by inserting prolines significantly increases activation domain functionality. The proposed surfactant action mechanism, based on the near-stochastic interactions implied by the minimalistic activation domains, changes not only the paradigm for explaining gene activation but also the fundamental biochemistry paradigm based on the specificity of sequence-to-structure-to-functional-interaction.
            The mechanism of activity regulation by near-stochastic allosteric interactions could easily be applied to other biological processes.
 An official website of the United States government