- Award ID(s):
- 1661391
- PAR ID:
- 10172565
- Date Published:
- Journal Name:
- Pacific symposium on biocomputing
- Volume:
- 25
- ISSN:
- 2335-6928
- Page Range / eLocation ID:
- 159-170
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Nucleoli are multicomponent condensates defined by coexisting sub-phases. We identified distinct intrinsically disordered regions (IDRs), including acidic (D/E) tracts and K-blocks interspersed by E-rich regions, as defining features of nucleolar proteins. We show that the localization preferences of nucleolar proteins are determined by their IDRs and the types of RNA or DNA binding domains they encompass. In vitro reconstitutions and studies in cells showed how condensation, which combines binding and complex coacervation of nucleolar components, contributes to nucleolar organization. D/E tracts of nucleolar proteins contribute to lowering the pH of co-condensates formed with nucleolar RNAs in vitro. In cells, this sets up a pH gradient between nucleoli and the nucleoplasm. By contrast, juxta-nucleolar bodies, which have different macromolecular compositions, featuring protein IDRs with very different charge profiles, have pH values that are equivalent to or higher than the nucleoplasm. Our findings show that distinct compositional specificities generate distinct physicochemical properties for condensates.
-
Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which have been demonstrated for short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs, and the long and binding IDRs combined. We also investigate the compositional bias of putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions, highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes whose compositional bias differs most from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes.
-
Abstract One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein–protein and protein–nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
-
Abstract Motivation Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are that (i) their occurrence is proteome-wide and (ii) the ratio of disordered residues is about 6%, which makes it challenging to predict IDRs accurately. Most IDR prediction methods use sequence profiles to improve accuracy, which prevents their application to proteome-wide prediction because generating sequence profiles is time-consuming. On the other hand, methods that do not use sequence profiles fare much worse than those that do.
Method This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence–structure relationships in a hierarchical manner, but also correlations among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum-likelihood approach, we develop a novel approach to train it by maximizing the area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data.
Results Our experimental results show that our IDR prediction method AUCpreD outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profiles, comparing favorably to or even outperforming many methods that use sequence profiles. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others.
Availability and Implementation http://raptorx2.uchicago.edu/StructurePropertyPred/predict/
Contact wangsheng@uchicago.edu, jinboxu@gmail.com
Supplementary information Supplementary data are available at Bioinformatics online.
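As an illustration of why AUC suits a class ratio like the roughly 6% disordered residues mentioned above, the following minimal sketch compares accuracy and a rank-based AUC on toy data. It is not the AUCpreD training procedure; the function names `roc_auc` and `accuracy`, the 0.5 threshold, and the toy labels and scores are all assumptions made for this example.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of residues classified correctly at a fixed threshold
    (the threshold value is an assumption for this sketch)."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Toy data: 6 disordered (1) and 94 ordered (0) residues.
labels = [1] * 6 + [0] * 94

# A trivial predictor that gives every residue the same low score.
constant_scores = [0.1] * 100

print(accuracy(labels, constant_scores))  # high, despite finding no disorder
print(roc_auc(labels, constant_scores))   # chance level
```

On this toy set the constant predictor is right for every ordered residue, so its accuracy looks high, while its AUC stays at chance level because it cannot rank any disordered residue above an ordered one.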
-
Intrinsically disordered proteins (IDPs) engage in various fundamental biological activities, and their behavior is of particular importance for a better understanding of the verbose but well-organized signal transduction in cells. IDPs exhibit uniquely paradoxical features with low affinity but simultaneously high specificity in recognizing their binding targets. The transcription factor p53 plays a crucial role in cancer suppression, carrying out some of its biological functions using its disordered regions, such as N-terminal transactivation domain 2 (TAD2). Exploration of the binding and unbinding processes between proteins is challenging, and the inherently disordered properties of these regions further complicate the issue. Computer simulations are a powerful tool that complements experiments and fills gaps in exploring the binding/unbinding processes between proteins. Here, we investigated the binding mechanism between p300 Taz2 and p53 TAD2 through extensive molecular dynamics (MD) simulations using the physics-based UNited RESidue (UNRES) force field with additional Gō-like potentials. Distance restraints extracted from the NMR-resolved structures were imposed on intermolecular residue pairs to accelerate binding simulations, in which Taz2 was immobilized in a native-like conformation and disordered TAD2 was fully free. Starting from six structures with TAD2 placed at different positions around Taz2, we observed a metastable intermediate state in which the middle helical segment of TAD2 is anchored in the binding pocket, highlighting the significance of the TAD2 helix in directing protein recognition. Physics-based binding simulations show that successful binding is achieved after a series of stages, including (1) protein collisions to initiate the formation of encounter complexes, (2) partial attachment of TAD2, and finally (3) full attachment of TAD2 to the correct binding pocket of Taz2. Furthermore, machine-learning-based PathDetect-SOM was used to identify two binding pathways, the encounter complexes, and the intermediate states.