Title: Many-to-one binding by intrinsically disordered protein regions
Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank in which two or more DBRs with different amino acid sequences bind to the same (100% sequence-identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. In the overlapping binding profiles, the distinct DBRs either interact through almost identical binding sites (herein called “similar”) or through binding sites that contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.
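The three groups described above (similar, intersecting, independent) can be illustrated with a minimal sketch that compares the sets of partner residues contacted by two DBRs. This is not the authors' procedure: the Jaccard index, the function names, and the `similar_cutoff` value of 0.8 are all assumptions chosen only for illustration, and the abstract does not state the criteria actually used.

```python
def jaccard(site_a, site_b):
    """Jaccard overlap between two collections of partner residue numbers."""
    a, b = set(site_a), set(site_b)
    union = a | b
    if not union:
        return 0.0  # two empty sites share nothing
    return len(a & b) / len(union)


def classify_binding_profile(site_a, site_b, similar_cutoff=0.8):
    """Label a pair of binding sites on the same globular partner.

    similar_cutoff is a hypothetical threshold, not a value from the paper.
    """
    overlap = jaccard(site_a, site_b)
    if overlap == 0.0:
        return "independent"   # no shared partner residues
    if overlap >= similar_cutoff:
        return "similar"       # almost identical binding sites
    return "intersecting"      # both common and divergent residues


if __name__ == "__main__":
    # Hypothetical residue numbers on the shared partner protein.
    print(classify_binding_profile({12, 15, 16, 19}, {12, 15, 16, 19}))
    print(classify_binding_profile({12, 15, 16, 19}, {16, 19, 23, 27}))
    print(classify_binding_profile({12, 15, 16, 19}, {88, 91, 95}))
```

Under these assumptions, "independent" means the two DBRs share no partner residues at all. A real analysis would also need a definition of contact (for example a distance cutoff), which is not specified here.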
Award ID(s):
1661391
NSF-PAR ID:
10172565
Date Published:
Journal Name:
Pacific symposium on biocomputing
Volume:
25
ISSN:
2335-6928
Page Range / eLocation ID:
159-170
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Protein intrinsically disordered regions (IDRs) play an important role in many biological processes. Two key properties of IDRs are that (i) they occur proteome-wide and (ii) only about 6% of residues are disordered, which makes it challenging to predict IDRs accurately. Most IDR prediction methods use sequence profiles to improve accuracy, which prevents their application to proteome-wide prediction because generating sequence profiles is time-consuming. On the other hand, methods that do not use sequence profiles fare much worse than those that do.

    Method

    This article formulates IDR prediction as a sequence labeling problem and employs a new machine learning method called Deep Convolutional Neural Fields (DeepCNF) to solve it. DeepCNF is an integration of deep convolutional neural networks (DCNN) and conditional random fields (CRF); it can model not only complex sequence–structure relationships in a hierarchical manner, but also correlations among adjacent residues. To deal with the highly imbalanced order/disorder ratio, instead of training DeepCNF by the widely used maximum-likelihood approach, we develop a novel approach that trains it by maximizing the area under the ROC curve (AUC), which is an unbiased measure for class-imbalanced data.

    Results

    Our experimental results show that our IDR prediction method, AUCpreD, outperforms existing popular disorder predictors. More importantly, AUCpreD works very well even without sequence profiles, comparing favorably to or even outperforming many methods that use sequence profiles. Therefore, our method works for proteome-wide disorder prediction while yielding similar or better accuracy than the others.

    Availability and Implementation

    http://raptorx2.uchicago.edu/StructurePropertyPred/predict/

    Contact

    wangsheng@uchicago.edu, jinboxu@gmail.com

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Intrinsically disordered regions (IDRs) carry out many cellular functions and vary in length and placement in protein sequences. This diversity leads to variations in the underlying compositional biases, which were demonstrated for the short vs. long IDRs. We analyze compositional biases across four classes of disorder: fully disordered proteins; short IDRs; long IDRs; and binding IDRs. We identify three distinct biases: for the fully disordered proteins, the short IDRs and the long and binding IDRs combined. We also investigate compositional bias for putative disorder produced by leading disorder predictors and find that it is similar to the bias of the native disorder. Interestingly, the accuracy of disorder predictions across different methods is correlated with the correctness of the compositional bias of their predictions highlighting the importance of the compositional bias. The predictive quality is relatively low for the disorder classes with compositional bias that is the most different from the “generic” disorder bias, while being much higher for the classes with the most similar bias. We discover that different predictors perform best across different classes of disorder. This suggests that no single predictor is universally best and motivates the development of new architectures that combine models that target specific disorder classes. 
  3. Abstract

    One of the key features of intrinsically disordered regions (IDRs) is their facilitation of protein–protein and protein–nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo a disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.

     
  4. Intracellular compartmentalization plays a pivotal role in cellular function, with membrane-bound organelles and membrane-less biomolecular 'condensates' playing key roles. These condensates, formed through liquid-liquid phase separation (LLPS), enable selective compartmentalization without the barrier of a lipid bilayer, thereby facilitating rapid formation/dissolution in response to stimuli. Intrinsically disordered proteins (IDPs) and/or proteins with intrinsically disordered regions (IDRs), which are often rich in charged and polar amino acid sequences, scaffold many condensates, often in conjunction with RNA. Comprehending the impact of IDP/IDR sequences on phase separation poses a challenge due to the extensive chemical diversity resulting from the myriad amino acids and post-translational modifications. To tackle this hurdle, one approach has been to investigate LLPS in simplified polypeptide systems, which offer a narrower scope within the chemical space for exploration. This strategy is supported by studies that have demonstrated how IDP function can largely be understood based on general chemical features, such as clusters or patterns of charged amino acids, rather than residue-level effects, and the ways in which these kinds of motifs give rise to an ensemble of conformations. Our lab has utilized complex coacervates assembled from oppositely-charged polypeptides as a simplified material analogue to the complexity of liquid-liquid phase separated biological condensates. Complex coacervation is an associative LLPS that occurs due to the electrostatic complexation of oppositely-charged macro-ions. This process is believed to be driven by the entropic gains resulting from the release of bound counterions and the reorganization of water upon complex formation. 
Apart from their direct applicability to IDPs, polypeptides also serve as excellent model polymers for investigating molecular interactions due to the wide range of available side-chain functionalities and the capacity to finely regulate their sequence, thus enabling precise control over interactions with guest molecules. Here, we discuss fundamental studies examining how charge patterning, hydrophobicity, chirality, and architecture affect the phase separation of polypeptide-based complex coacervates. These efforts have leveraged a combination of experimental and computational approaches that provide insight into the molecular level interactions. We also examine how these parameters affect the ability of complex coacervates to incorporate globular proteins and viruses. These efforts couple directly with our fundamental studies into coacervate formation, as such ‘guest’ molecules should not be considered as experiencing simple encapsulation and are instead active participants in the electrostatic assembly of coacervate materials. Interestingly, we observed trends in the incorporation of proteins and viruses into coacervates formed using different chain length polypeptides that are not well explained by simple electrostatic arguments and may be the result of more complex interactions between globular and polymeric species. Additionally, we describe experimental evidence supporting the potential for complex coacervates to improve the thermal stability of embedded biomolecules such as viral vaccines. Ultimately, peptide-based coacervates have the potential to help unravel the physics behind biological condensates while paving the way for innovative methods in compartmentalization, purification, and biomolecule stabilization. These advancements could have implications spanning from medicine to biocatalysis. 
  5. Abstract

    In eukaryotes, many DNA/RNA-binding proteins possess intrinsically disordered regions (IDRs) with large negative charge, some of which involve a consecutive sequence of aspartate (D) or glutamate (E) residues. We refer to them as D/E repeats. The functional role of D/E repeats is not well understood, though some of them are known to cause autoinhibition through intramolecular electrostatic interaction with functional domains. In this work, we investigated the impacts of D/E repeats on the target DNA search kinetics for the high-mobility group box 1 (HMGB1) protein and the artificial protein constructs of the Antp homeodomain fused with D/E repeats of varied lengths. Our experimental data showed that D/E repeats of particular lengths can accelerate the target association in the overwhelming presence of non-functional high-affinity ligands (‘decoys’). Our coarse-grained molecular dynamics (CGMD) simulations showed that the autoinhibited proteins can bind to DNA and transition into the uninhibited complex with DNA through an electrostatically driven induced-fit process. In conjunction with the CGMD simulations, our kinetic model can explain how D/E repeats can accelerate the target association process in the presence of decoys. This study illuminates an unprecedented role of the negatively charged IDRs in the target search process.

     
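Items 1 and 3 in the list above both rely on the area under the ROC curve (AUC), and item 1 describes it as suited to class-imbalanced data. The sketch below illustrates one reason this is plausible: AUC equals the fraction of (disordered, ordered) residue pairs that the predictor ranks correctly, so it depends on the ranking of scores rather than on how many residues fall in each class. It computes the metric only; it is not the DeepCNF training procedure or any listed tool's code, and the function name and example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    labels: iterable of 0 (ordered) or 1 (disordered), one per residue.
    scores: predicted disorder scores, same length as labels.
    Ties between a positive and a negative count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical per-residue labels and scores.
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of residues. It is written this way for clarity, and a rank-based computation would be preferable on proteome-scale data.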