Title: ParSe 2.0: A web tool to identify drivers of protein phase separation at the proteome level

We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated with hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase‐separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Building on that finding, ParSe 2.0 uses an optimal set of property scales to predict domain‐level organization and compute a sequence‐based prediction of phase separation potential. The algorithm is fast enough to scan the whole human proteome in minutes on a single computer and is as accurate as, or more accurate than, other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting quickly identify phase‐separating proteins within large sequence sets, or by visiting evaluate individual protein sequences.

Award ID(s):
1818090 1943488
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Protein Science
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Intrinsically disordered regions lack stable structure in their native conformation but are nevertheless functional and highly abundant, particularly in Eukaryotes. Disordered moonlighting regions (DMRs) are intrinsically disordered regions that carry out multiple functions. DMRs are different from moonlighting proteins that could be structured and that are annotated at the whole‐protein level. DMRs cannot be identified by current predictors of functions of disorder that focus on specific functions rather than multifunctional regions. We conceptualized, designed and empirically assessed a first‐of‐its‐kind sequence‐based predictor of DMRs, DMRpred. This computational tool outputs the propensity for being in a DMR for each residue in an input protein sequence. We developed novel amino acid indices that quantify propensities for functions relevant to DMRs and used evolutionary conservation, putative solvent accessibility and intrinsic disorder derived from the input sequence to build a rich profile that is suitable to accurately predict DMRs. We processed this profile to derive innovative features that we input into a Random Forest model to generate the predictions. Empirical assessment shows that DMRpred generates accurate predictions with area under receiver operating characteristic curve = 0.86 and accuracy = 82%. These results are significantly better than the closest alternative approaches that rely on sequence alignment, evolutionary conservation and putative disorder and disorder functions. Analysis of the abundance of putative DMRs in the human proteome reveals that as many as 25% of proteins may have long (>30 residues) DMRs. A webserver implementation of DMRpred is available at

  2. Abstract

    The Membranome database provides comprehensive structural information on single‐pass (i.e., bitopic) membrane proteins from six evolutionarily distant organisms, including protein–protein interactions, complexes, mutations, experimental structures, and models of transmembrane α‐helical dimers. We present a new version of this database, Membranome 3.0, which was significantly updated by revising the set of 5,758 bitopic proteins and incorporating models generated by AlphaFold 2 in the database. The AlphaFold models were parsed into structural domains located at the different membrane sides, modified to exclude low‐confidence unstructured terminal regions and signal sequences, validated through comparison with available experimental structures, and positioned with respect to membrane boundaries. Membranome 3.0 was re‐developed to facilitate visualization and comparative analysis of multiple 3D structures of proteins that belong to a specified family, complex, biological pathway, or membrane type. New tools for advanced search and analysis of proteins, their interactions, complexes, and mutations were included. The database is freely accessible at

  3. We have investigated the structural evolution in solutions of the intrinsically disordered protein, α-synuclein, as a function of protein concentration and added salt concentration. Accounting for electrostatic and excluded volume interactions based on the protein sequence, our Langevin dynamics simulations reveal that α-synuclein molecules assemble into aggregates and percolated structures with a spontaneous selection of a dominant structure characteristic of microphase separation. This microphase assembly is mainly driven by electrostatic interactions between the residues in the N-terminal and C-terminal regions of the protein molecules, and the presence of salt loosens the compactness of the microstructures. We have quantified the features of the spontaneously formed microstructures using interchain radial distribution functions, and experimentally measurable inter-residue contact maps and static structure factors. Our results are in contrast to the commonly hypothesized mechanism of liquid–liquid phase separation (LLPS) for the formation of droplets in solutions of intrinsically disordered proteins, opening a new paradigm to understand the birth and structure of membraneless organelles. In general, construction of phase diagrams of intrinsically disordered proteins and other biomacromolecular systems needs to incorporate features of microphase separation into other mechanisms of macrophase separation and percolation.
  4. Abstract

    The intense interest in the intrinsically disordered proteins in the life science community, together with the remarkable advancements in predictive technologies, has given rise to the development of a large number of computational predictors of intrinsic disorder from protein sequence. While the growing number of predictors is a positive trend, we have observed a considerable difference in predictive quality among predictors for individual proteins. Furthermore, variable predictor performance is often inconsistent between predictors for different proteins, and the predictor that shows the best predictive performance depends on the unique properties of each protein sequence. We propose a computational approach, DISOselect, to estimate the predictive performance of 12 selected predictors for individual proteins based on their unique sequence‐derived properties. This estimation informs the users about the expected predictive quality for a selected disorder predictor and can be used to recommend methods that are likely to provide the best quality predictions. Our solution does not depend on the results of any disorder predictor; the estimations are made based solely on the protein sequence. Our solution significantly improves predictive performance, as judged with a test set of 1,000 proteins, when compared to other alternatives. We have empirically shown that by using the recommended methods the overall predictive performance for a given set of proteins can be improved by a statistically significant margin. DISOselect is freely available for non‐commercial users through the webserver at

  5. Phase separation of intrinsically disordered proteins (IDPs) commonly underlies the formation of membraneless organelles, which compartmentalize molecules intracellularly in the absence of a lipid membrane. Identifying the protein sequence features responsible for IDP phase separation is critical for understanding physiological roles and pathological consequences of biomolecular condensation, as well as for harnessing phase separation for applications in bioinspired materials design. To expand our knowledge of sequence determinants of IDP phase separation, we characterized variants of the intrinsically disordered RGG domain from LAF-1, a model protein involved in phase separation and a key component of P granules. Based on a predictive coarse-grained IDP model, we identified a region of the RGG domain that has high contact probability and is highly conserved between species; deletion of this region significantly disrupts phase separation in vitro and in vivo. We determined the effects of charge patterning on phase behavior through sequence shuffling. We designed sequences with significantly increased phase separation propensity by shuffling the wild-type sequence, which contains well-mixed charged residues, to increase charge segregation. This result indicates the natural sequence is under negative selection to moderate this mode of interaction. We measured the contributions of tyrosine and arginine residues to phase separation experimentally through mutagenesis studies and computationally through direct interrogation of different modes of interaction using all-atom simulations. Finally, we show that despite these sequence perturbations, the RGG-derived condensates remain liquid-like. Together, these studies advance our fundamental understanding of key biophysical principles and sequence features important to phase separation.
