Title: Protein intrinsically disordered regions have a non-random, modular architecture
Abstract
Motivation: Protein sequences can be broadly categorized into two classes: those which adopt stable secondary structure and fold into a domain (i.e. globular proteins), and those that do not. The sequences belonging to this latter class are conformationally heterogeneous and are described as being intrinsically disordered. Decades of investigation into the structure and function of globular proteins have resulted in a suite of computational tools that enable their sub-classification by domain type, an approach that has revolutionized how we understand and predict protein functionality. Conversely, it is unknown if sequences of disordered protein regions are subject to broadly generalizable organizational principles that would enable their sub-classification.
Results: Here, we report the development of a statistical approach that quantifies linear variance in amino acid composition across a sequence. With multiple examples, we provide evidence that intrinsically disordered regions are organized into statistically non-random modules of unique compositional bias. Modularity is observed for both low and high-complexity sequences and, in some cases, we find that modules are organized in repetitive patterns. These data demonstrate that disordered sequences are non-randomly organized into modular architectures and motivate future experiments to comprehensively classify module types and to determine the degree to which modules constitute functionally separable units analogous to the domains of globular proteins.
Availability and implementation: The source code, documentation, and data to reproduce all figures are freely available at https://github.com/MWPlabUTSW/Chi-Score-Analysis.git. The analysis is also available as a Google Colab Notebook (https://colab.research.google.com/github/MWPlabUTSW/Chi-Score-Analysis/blob/main/ChiScore_Analysis.ipynb).
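To make the idea of quantifying compositional variance along a sequence concrete, the sketch below compares the amino acid composition of two adjacent windows with a Pearson chi-square statistic on a 2 x 20 contingency table, divided by the total residue count so that the score falls between 0 (identical composition) and 1 (no shared residue types). This is an illustrative assumption only, not the authors' implementation: the function names `composition`, `chi_square_score`, and `sliding_scores`, the window size, and the normalization are all hypothetical choices, and the published method in the linked repository may differ.

```python
from collections import Counter

# The 20 standard amino acids; any other character is ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return residue counts for seq in the fixed order of AMINO_ACIDS."""
    counts = Counter(seq.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def chi_square_score(seq_a, seq_b):
    """Pearson chi-square statistic for the 2 x 20 table of residue counts,
    divided by the total number of residues (hypothetical normalization)."""
    a, b = composition(seq_a), composition(seq_b)
    n_a, n_b = sum(a), sum(b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both sequences must contain standard residues")
    total = n_a + n_b
    chi2 = 0.0
    for obs_a, obs_b in zip(a, b):
        col = obs_a + obs_b
        if col == 0:
            continue  # residue type absent from both windows
        exp_a = col * n_a / total  # expected count if compositions matched
        exp_b = col * n_b / total
        chi2 += (obs_a - exp_a) ** 2 / exp_a + (obs_b - exp_b) ** 2 / exp_b
    return chi2 / total


def sliding_scores(seq, window=10):
    """Score every position by comparing the window before it with the
    window after it; peaks suggest candidate module boundaries."""
    return [
        chi_square_score(seq[i - window:i], seq[i:i + window])
        for i in range(window, len(seq) - window + 1)
    ]
```

For a toy sequence such as ten serines followed by ten lysines, `sliding_scores(seq, window=5)` peaks at the serine/lysine junction, which is the kind of boundary a module-detection procedure would look for. Assessing whether such a peak is statistically non-random would require a null model (for example, shuffled sequences), which this sketch does not include.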
Award ID(s):
2308642
PAR ID:
10498669
Author(s) / Creator(s):
;
Editor(s):
Elofsson, Arne
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
39
Issue:
12
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We have developed an algorithm, ParSe, which accurately identifies from the primary sequence those protein regions likely to exhibit physiological phase separation behavior. Originally, ParSe was designed to test the hypothesis that, for flexible proteins, phase separation potential is correlated to hydrodynamic size. While our results were consistent with that idea, we also found that many different descriptors could successfully differentiate between three classes of protein regions: folded, intrinsically disordered, and phase‐separating intrinsically disordered. Consequently, numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. Built from that finding, ParSe 2.0 uses an optimal set of property scales to predict domain‐level organization and compute a sequence‐based prediction of phase separation potential. The algorithm is fast enough to scan the whole of the human proteome in minutes on a single computer and is equally or more accurate than other published predictors in identifying proteins and regions within proteins that drive phase separation. Here, we describe a web application for ParSe 2.0 that may be accessed through a browser by visiting https://stevewhitten.github.io/Parse_v2_FASTA to quickly identify phase‐separating proteins within large sequence sets, or by visiting https://stevewhitten.github.io/Parse_v2_web to evaluate individual protein sequences.
  2. Disordered binding regions (DBRs), which are embedded within intrinsically disordered proteins or regions (IDPs or IDRs), enable IDPs or IDRs to mediate multiple protein-protein interactions. DBR-protein complexes were collected from the Protein Data Bank for which two or more DBRs having different amino acid sequences bind to the same (100% sequence identical) globular protein partner, a type of interaction herein called many-to-one binding. Two distinct binding profiles were identified: independent and overlapping. For the overlapping binding profiles, the distinct DBRs interact by means of almost identical binding sites (herein called “similar”), or the binding sites contain both common and divergent interaction residues (herein called “intersecting”). Further analysis of the sequence and structural differences among these three groups indicates how IDP flexibility allows different segments to adjust to similar, intersecting, and independent binding pockets.
  3. Abstract PUF proteins are characterized by globular RNA-binding domains. They also interact with partner proteins that modulate their RNA-binding activities. Caenorhabditis elegans PUF protein fem-3 binding factor-2 (FBF-2) partners with intrinsically disordered Lateral Signaling Target-1 (LST-1) to regulate target mRNAs in germline stem cells. Here, we report that an intrinsically disordered region (IDR) at the C-terminus of FBF-2 autoinhibits its RNA-binding affinity by increasing the off rate for RNA binding. Moreover, the FBF-2 C-terminal region interacts with its globular RNA-binding domain at the same site where LST-1 binds. This intramolecular interaction restrains an electronegative cluster of amino acid residues near the 5′ end of the bound RNA to inhibit RNA binding. LST-1 binding in place of the FBF-2 C-terminus therefore releases autoinhibition and increases RNA-binding affinity. This regulatory mechanism, driven by IDRs, provides a biochemical and biophysical explanation for the interdependence of FBF-2 and LST-1 in germline stem cell self-renewal.
  4. Dunbrack, Roland L (Ed.)
    Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets. 
  5. Abstract Intrinsically disordered proteins (IDPs) are a subset of proteins that lack stable secondary structure. Given their polymeric nature, previous mean-field approximations have been used to describe the statistical structure of IDPs. However, the amino-acid sequence heterogeneity and complex intermolecular interaction network have significantly impeded the ability to get proper approximations. One such case is the intrinsically disordered tail domain of neurofilament low (NFLt), which comprises a 50 residue-long uncharged domain followed by a 96 residue-long negatively charged domain. Here, we measure two NFLt variants to identify the impact of the NFLt's two main subdomains on its complex interactions and statistical structure. Using synchrotron small-angle x-ray scattering, we find that the uncharged domain of the NFLt induces attractive interactions that cause it to self-assemble into star-like polymer brushes. On the other hand, when the uncharged domain is truncated, the remaining charged N-terminal domains remain isolated in solution with typical polyelectrolyte characteristics. We further discuss how competing long- and short-ranged interactions within the polymer brushes dominate their ensemble structure and, in turn, their implications on previously observed phenomena in NFL native and diseased states. Graphic abstract: Visual schematic of the SAXS measurement results of the Neurofilament-low tail domain IDP (NFLt). NFLts assemble into star-like brushes through their hydrophobic N-terminal domains (marked in blue). In increasing salinity, brush height (h) is initially increased following a decrease while gaining additional tails to their assembly. Isolating the charged sub-domain of the NFLt (marked in red) results in isolated polyelectrolytes.