skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Phospho-Tune: Enhanced Structural Modeling of Phosphorylated Protein Interactions
Abstract In computational biology, accurate prediction of phosphopeptide-protein complex structures is essential for understanding cellular functions and advancing drug discovery and personalized medicine. While AlphaFold has significantly improved protein structure prediction, it faces accuracy challenges in predicting structures of complexes involving phosphopeptides possibly due to structural variations introduced by phosphorylation in the peptide component. Our study addresses this limitation by refining AlphaFold to improve its accuracy in modeling these complex structures. We employed weighted metrics for a comprehensive evaluation across various protein families. The enhanced model notably outperforms the original AlphaFold, showing a substantial increase in the weighted average local distance difference test (lDDT) scores for peptides: from 52.74 to 76.51 in the Top 1 model and from 56.32 to 77.91 in the Top 5 model. These advancements not only deepen our understanding of the role of phosphorylation in cellular signaling but also have extensive implications for biological research and the development of innovative therapies.  more » « less
Award ID(s):
2054251
PAR ID:
10566123
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
bioRxiv
Date Published:
Format(s):
Medium: X
Institution:
bioRxiv
Sponsoring Org:
National Science Foundation
More Like this
  1. ABSTRACT Accurate prediction of protein–peptide complex structures plays a critical role in structure‐based drug design, including antibody design. Most peptide‐docking benchmark studies were conducted using crystal structures of protein–peptide complexes; as such, the performance of the current peptide docking tools in the practical setting is unknown. Here, the practical setting implies there are no crystal or other experimental structures for the complex, nor for the receptor and peptide. In this work, we have developed a practical docking protocol that incorporated two famous machine learning models, AlphaFold 2 for structural prediction and ANI‐2x for ab initio potential prediction, to achieve a high success rate in modeling protein–peptide complex structures. The docking protocol consists of three major stages. In the first stage, the 3D structure of the receptor is predicted by AlphaFold 2 using the monomer mode, and that of the peptide is predicted by AlphaFold 2 using the multimer mode. We found that it is essential to include the receptor information to generate a high‐quality 3D structure of the peptide. In the second stage, rigid protein–peptide docking is performed using ZDOCK software. In the last stage, the top 10 docking poses are relaxed and refined by ANI‐2x in conjunction with our in‐house geometry optimization algorithm—conjugate gradient with backtracking line search (CG‐BS). CG‐BS was developed by us to more efficiently perform geometry optimization, which takes the potential and force directly from ANI‐2x machine learning models. The docking protocol achieved a very encouraging performance for a set of 62 very challenging protein–peptide systems which had an overall success rate of 34% if only the top 1 docking poses were considered. This success rate increased to 45% if the top 3 docking poses were considered. It is emphasized that this encouraging protein–peptide docking performance was achieved without using any crystal or experimental structures. 
    more » « less
  2. Abstract The recent breakthroughs in structure prediction, where methods such as AlphaFold demonstrated near‐atomic accuracy, herald a paradigm shift in structural biology. The 200 million high‐accuracy models released in the AlphaFold Database are expected to guide protein science in the coming decades. Partitioning these AlphaFold models into domains and assigning them to an evolutionary hierarchy provide an efficient way to gain functional insights into proteins. However, classifying such a large number of predicted structures challenges the infrastructure of current structure classifications, including our Evolutionary Classification of protein Domains (ECOD). Better computational tools are urgently needed to parse and classify domains from AlphaFold models automatically. Here we present a Domain Parser for AlphaFold Models (DPAM) that can automatically recognize globular domains from these models based on inter‐residue distances in 3D structures, predicted aligned errors, and ECOD domains found by sequence (HHsuite) and structural (Dali) similarity searches. Based on a benchmark of 18,759 AlphaFold models, we demonstrate that DPAM can recognize 98.8% of domains and assign correct boundaries for 87.5%, significantly outperforming structure‐based domain parsers and homology‐based domain assignment using ECOD domains found by HHsuite or Dali. Application of DPAM to the massive AlphaFold models will enable efficient classification of domains, providing evolutionary contexts and facilitating functional studies. 
    more » « less
  3. Abstract Protein side-chain packing (PSCP), the problem of predicting side-chain conformations given a fixed backbone structure, has important implications in the modeling of structures and interactions. However, despite the groundbreaking progress in protein structure prediction pioneered by AlphaFold, the existing PSCP methods still rely on experimental inputs, and do not leverage AlphaFold-predicted backbone coordinates to enable PSCP at scale. Here, we perform a large-scale benchmarking of the predictive performance of various PSCP methods on public datasets from multiple rounds of the Critical Assessment of Structure Prediction challenges using a diverse set of evaluation metrics. Empirical results demonstrate that the PSCP methods perform well in packing the side-chains with experimental inputs, but they fail to generalize in repacking AlphaFold-generated structures. We additionally explore the effectiveness of leveraging the self-assessment confidence scores from AlphaFold by implementing a backbone confidence-aware integrative approach. While such a protocol often leads to performance improvement by attaining modest yet statistically significant accuracy gains over the AlphaFold baseline, it does not yield consistent and pronounced improvements. Our study highlights the recent advances and remaining challenges in PSCP in the post-AlphaFold era. 
    more » « less
  4. The inhibition of protein–protein interactions is a growing strategy in drug development. In addition to structured regions, many protein loop regions are involved in protein–protein interactions and thus have been identified as potential drug targets. To effectively target such regions, protein structure is critical. Loop structure prediction is a challenging subgroup in the field of protein structure prediction because of the reduced level of conservation in protein sequences compared to the secondary structure elements. AlphaFold 2 has been suggested to be one of the greatest achievements in the field of protein structure prediction. The AlphaFold 2 predicted protein structures near the X-ray resolution in the Critical Assessment of protein Structure Prediction (CASP 14) competition in 2020. The purpose of this work is to survey the performance of AlphaFold 2 in specifically predicting protein loop regions. We have constructed an independent dataset of 31,650 loop regions from 2613 proteins (deposited after the AlphaFold 2 was trained) with both experimentally determined structures and AlphaFold 2 predicted structures. With extensive evaluation using our dataset, the results indicate that AlphaFold 2 is a good predictor of the structure of loop regions, especially for short loop regions. Loops less than 10 residues in length have an average Root Mean Square Deviation (RMSD) of 0.33 Å and an average the Template Modeling score (TM-score) of 0.82. However, we see that as the number of residues in a given loop increases, the accuracy of AlphaFold 2’s prediction decreases. Loops more than 20 residues in length have an average RMSD of 2.04 Å and an average TM-score of 0.55. Such a correlation between accuracy and length of the loop is directly linked to the increase in flexibility. Moreover, AlphaFold 2 does slightly over-predict α-helices and β-strands in proteins. 
    more » « less
  5. null (Ed.)
    Abstract Protein phosphorylation, which is one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii , a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM) achieves the best performance in predicting phosphorylation sites in C. reinhardtii. Deemed Chlamy-EnPhosSite, the measured best AUC and MCC are 0.90 and 0.64 respectively for a combined dataset of serine (S) and threonine (T) in independent testing higher than those measures for other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS) in a blinded study and approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5 and 76.83% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) which did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has potential to serve as a useful tool to the community. Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga. 
    more » « less