Computational prediction of DNA-binding residues (DBRs) and RNA-binding residues (RBRs) in protein sequences is an active area of research, with about 90 predictors, 20 of which were published over the last two years. The new predictors rely on sophisticated deep neural networks and protein language models, produce accurate predictions, and are conveniently available as code and/or web servers. However, we identified a shortage of tools that predict these interactions in intrinsically disordered regions and of tools capable of predicting residues that interact with specific RNA and DNA types. Moreover, cross-predictions between RBRs and DBRs should be quantified and minimized to ensure that future tools accurately differentiate between these two distinct types of nucleic acids.
This content will become publicly available on October 1, 2026
Comparative assessment of binding residue predictions in intrinsically disordered regions
Abstract Dozens of impactful methods that predict intrinsically disordered regions (IDRs) in protein sequences that interact with proteins and/or nucleic acids have been developed. Their training and assessment rely on IDR‐level binding annotations, while the equivalent structure‐trained methods predict more granular annotations of binding amino acids (AA). We compiled a new benchmark dataset that annotates binding AA in IDRs and applied it to complete a first‐of‐its‐kind assessment of predictions of the disordered binding residues. We evaluated a representative collection of 14 methods, used several hundred low‐similarity test proteins, and focused on the challenging task of differentiating these binding residues from other disordered AA and on ligand type‐specific predictions (protein–protein vs. protein–nucleic acid interactions). We found that current methods struggle to accurately predict binding IDRs among disordered residues; however, better‐than‐random tools predict disordered binding residues significantly better than binding IDRs. We identified at least one relatively accurate tool for predicting disordered protein‐binding and disordered nucleic acid‐binding AA. Analysis of cross‐predictions between interactions with proteins and nucleic acids revealed that most methods are ligand‐type‐agnostic. Only two predictors of the nucleic acid‐binding IDRs and two predictors of the protein‐binding IDRs can be considered ligand‐type‐specific. We also discuss several potential future directions that would move this field forward by producing more accurate methods that target the prediction of binding residues, reduce cross‐predictions, and cover a broader range of ligand types.
- PAR ID: 10646940
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Protein Science
- Volume: 34
- Issue: 10
- ISSN: 0961-8368
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on annotations from intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
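The cross-prediction notion described in this abstract (residues that interact with non-DNA ligands but are predicted as DBRs) can be illustrated with a minimal sketch. The function name, the binary per-residue encoding, and the choice to report the result as a simple fraction are assumptions made for illustration; they are not taken from the paper or from the hybridDBRpred code.

```python
import math
from typing import Sequence


def cross_prediction_rate(predicted_dbr: Sequence[int],
                          binds_other_ligand: Sequence[int]) -> float:
    """Fraction of residues binding a non-DNA ligand that are predicted as DBRs.

    Both inputs are per-residue binary vectors (1 = yes, 0 = no).
    This encoding is a hypothetical convention used only in this sketch.
    """
    if len(predicted_dbr) != len(binds_other_ligand):
        raise ValueError("inputs must have the same length")
    # Keep only the predictions made for residues that bind another ligand type.
    on_other = [p for p, o in zip(predicted_dbr, binds_other_ligand) if o == 1]
    if not on_other:
        return math.nan  # undefined when no residue binds another ligand
    return sum(on_other) / len(on_other)


# Toy example: three residues bind a non-DNA ligand, one of them is predicted as a DBR.
rate = cross_prediction_rate([1, 0, 1, 0, 0, 1], [1, 1, 0, 0, 1, 0])
print(f"cross-prediction rate: {rate:.3f}")
```

A lower value indicates that the predictor is more specific to DNA under this simplified definition.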
-
Intrinsically disordered regions (IDRs) are important components of protein functionality, with their charge distribution serving as a key factor in determining their roles. Notably, many proteins possess IDRs that are highly negatively charged, characterized by sequences rich in aspartate (D) or glutamate (E) residues. Bioinformatic analyses indicate that negatively charged low-complexity IDRs are significantly more common than their positively charged counterparts rich in arginine (R) or lysine (K). For instance, sequences of 10 or more consecutive negatively charged residues (D or E) are present in 268 human proteins. In contrast, corresponding sequences of 10 or more consecutive positively charged residues (K or R) are present in only 12 human proteins. Interestingly, about 50% of proteins containing D/E tracts function as DNA-binding or RNA-binding proteins. Negatively charged IDRs can electrostatically mimic nucleic acids and dynamically compete with them for the DNA-binding domains (DBDs) or RNA-binding domains (RBDs) that are positively charged. This leads to a phenomenon known as autoinhibition, in which the negatively charged IDRs inhibit binding to nucleic acids by occupying the binding interfaces within the proteins through intramolecular interactions. Rather than merely reducing binding activity, negatively charged IDRs offer significant advantages for the functions of DNA/RNA-binding proteins. The dynamic competition between negatively charged IDRs and nucleic acids can accelerate the target search processes for these proteins. When a protein encounters DNA or RNA, the electrostatic repulsion force between the nucleic acids and the negatively charged IDRs can trigger conformational changes that allow the nucleic acids to access DBDs or RBDs. Additionally, when proteins are trapped at high-affinity non-target sites on DNA or RNA ("decoys"), the electrostatic repulsion from the negatively charged IDRs can rescue the proteins from these traps. 
Negatively charged IDRs act as gatekeepers, rejecting nonspecific ligands while allowing the target to access the molecular interfaces of DBDs or RBDs, which increases binding specificity. These IDRs can also promote proper protein folding, facilitate chromatin remodeling by displacing other proteins bound to DNA, and influence phase separation, affecting local pH. The functions of negatively charged IDRs can be regulated through protein-protein interactions, post-translational modifications, and proteolytic processing. These characteristics can be harnessed as tools for protein engineering. Some frame-shift mutations that convert negatively charged IDRs into positively charged ones are linked to human diseases. Therefore, it is crucial to understand the physicochemical properties and functional roles of negatively charged IDRs that compete with nucleic acids.
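The tract criterion mentioned in this abstract (10 or more consecutive D or E residues, or K or R for the positively charged counterpart) can be expressed as a short sequence scan. The sketch below is only an illustration: the function name and the toy sequence are hypothetical, and it does not reproduce the bioinformatic pipeline behind the reported protein counts.

```python
import re
from typing import List, Tuple


def find_charged_tracts(sequence: str, residues: str = "DE",
                        min_len: int = 10) -> List[Tuple[int, int, str]]:
    """Return (start, end, tract) for runs of at least min_len residues
    drawn from `residues`; start is 0-based and end is exclusive."""
    pattern = re.compile("[%s]{%d,}" % (re.escape(residues), min_len))
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]


# Toy sequence (not a real protein): one 12-residue D/E tract and one short run.
toy = "MKT" + "DEDEDEDEDEDD" + "AK" + "DDDD"
print(find_charged_tracts(toy))                  # negatively charged tracts
print(find_charged_tracts(toy, residues="KR"))   # positively charged tracts
```

Passing `residues="KR"` applies the same threshold to lysine/arginine tracts, which is the comparison the abstract draws.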
-
Abstract One of the key features of intrinsically disordered regions (IDRs) is facilitation of protein–protein and protein–nucleic acid interactions. These disordered binding regions include molecular recognition features (MoRFs), short linear motifs (SLiMs) and longer binding domains. The vast majority of current predictors of disordered binding regions target MoRFs, with a handful of methods that predict SLiMs and disordered protein-binding domains. A new and broader class of disordered binding regions, linear interacting peptides (LIPs), was introduced recently and applied in the MobiDB resource. LIPs are segments in protein sequences that undergo disorder-to-order transition upon binding to a protein or a nucleic acid, and they cover MoRFs, SLiMs and disordered protein-binding domains. Although current predictors of MoRFs and disordered protein-binding regions could be used to identify some LIPs, there are no dedicated sequence-based predictors of LIPs. To this end, we introduce CLIP, a new predictor of LIPs that utilizes a robust logistic regression model to combine three complementary types of inputs: co-evolutionary information derived from multiple sequence alignments, physicochemical profiles and disorder predictions. Ablation analysis suggests that the co-evolutionary information is particularly useful for this prediction and that combining the three inputs provides substantial improvements when compared to using these inputs individually. Comparative empirical assessments using low-similarity test datasets reveal that CLIP secures an area under the receiver operating characteristic curve (AUC) of 0.8 and substantially improves over the results produced by the closest current tools that predict MoRFs and disordered protein-binding regions. The webserver of CLIP is freely available at http://biomine.cs.vcu.edu/servers/CLIP/ and the standalone code can be downloaded from http://yanglab.qd.sdu.edu.cn/download/CLIP/.
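For readers unfamiliar with the AUC metric quoted in this abstract, the following sketch computes it from per-residue labels and predicted propensities using the rank-based (Mann-Whitney) formulation. The function name and the toy data are assumptions for illustration; this is not the evaluation code used for CLIP.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    positive residue scores higher than a randomly chosen negative one,
    with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 marks a binding residue, scores are hypothetical propensities.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binding and non-binding residues. The pairwise loop is quadratic in the number of residues, which is acceptable for a sketch but slow for large benchmarks.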
-
Phase separation processes facilitate the formation of membrane-less organelles and involve interactions within structured domains and intrinsically disordered regions (IDRs) in protein sequences. The literature suggests that the involvement of proteins in phase separation can be predicted from their sequences, leading to the development of over 30 computational predictors. We focused on intrinsic disorder due to its fundamental role in related diseases, and because recent analysis has shown that phase separation can be accurately predicted for structured proteins. We evaluated eight representative amino acid-level predictors of phase separation, capable of identifying phase-separating IDRs, using a well-annotated, low-similarity test dataset under two complementary evaluation scenarios. Several methods generate accurate predictions in the easier scenario that includes both structured and disordered sequences. However, we demonstrate that modern disorder predictors perform equally well in this scenario by effectively differentiating phase-separating IDRs from structured regions. In the second, more challenging scenario, which considers only predictions in disordered regions, disorder predictors underperform, and most phase separation predictors produce only modestly accurate results. Moreover, some predictors are broadly biased to classify disordered residues as phase-separating, which results in low predictive performance in this scenario. Finally, we recommend PSPHunter as the most accurate tool for identifying phase-separating IDRs in both scenarios.
