Title: SSIPe: accurately estimating protein–protein binding affinity change upon mutations using evolutionary profiles in combination with an optimized physical energy function
Abstract

Motivation: Most proteins perform their biological functions through interactions with other proteins in cells. Amino acid mutations, especially those occurring at protein interfaces, can change the stability of protein–protein interactions (PPIs) and impact their functions, which may cause various human diseases. Quantitative estimation of the binding affinity changes (ΔΔGbind) caused by mutations can provide critical information for protein function annotation and genetic disease diagnoses.

Results: We present SSIPe, which combines protein interface profiles, collected from structural and sequence homology searches, with a physics-based energy function for accurate ΔΔGbind estimation. To offset the statistical limits of the PPI structure and sequence databases, amino acid-specific pseudocounts were introduced to enhance the profile accuracy. SSIPe was evaluated on large-scale experimental data containing 2204 mutations from 177 proteins, where training and test datasets were stringently separated with the sequence identity between proteins from the two datasets below 30%. The Pearson correlation coefficient between estimated and experimental ΔΔGbind was 0.61 with a root-mean-square-error of 1.93 kcal/mol, which was significantly better than the other methods. Detailed data analyses revealed that the major advantage of SSIPe over other traditional approaches lies in the novel combination of the physical energy function with the new knowledge-based interface profile. SSIPe also considerably outperformed a former profile-based method (BindProfX) due to the newly introduced sequence profiles and optimized pseudocount technique that allows for consideration of amino acid-specific prior mutation probabilities.

Availability and implementation: Web-server/standalone program, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/SSIPe and https://github.com/tommyhuangthu/SSIPe.

Supplementary information: Supplementary data are available at Bioinformatics online.
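
The abstract reports two agreement metrics between estimated and experimental ΔΔGbind: the Pearson correlation coefficient and the root-mean-square error. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names `pearson_r` and `rmse` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, observed):
    """Root-mean-square error, in the same units as the inputs."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Hypothetical ddG values in kcal/mol; NOT data from the paper.
    experimental = [0.5, 1.2, -0.3, 2.1, 0.0]
    estimated = [0.7, 0.9, 0.1, 1.5, 0.4]
    print(f"Pearson r = {pearson_r(estimated, experimental):.2f}")
    print(f"RMSE      = {rmse(estimated, experimental):.2f} kcal/mol")
```

The two metrics are complementary: the correlation measures how well the ranking and trend of mutations are captured, while the RMSE measures the absolute error in kcal/mol.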
Award ID(s):
1901191
PAR ID:
10167308
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Bioinformatics
Volume:
36
Issue:
8
ISSN:
1367-4803
Page Range / eLocation ID:
2429 to 2437
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by amino acid sequence, UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.
  2. Proteins play a central role in biology from immune recognition to brain activity. While major advances in machine learning have improved our ability to predict protein structure from sequence, determining a protein's function from its sequence or structure remains a major challenge. Here, we introduce the holographic convolutional neural network (H-CNN) for proteins, a physically motivated machine learning approach to model amino acid preferences in protein structures. H-CNN reflects physical interactions in a protein structure and recapitulates the functional information stored in evolutionary data. H-CNN accurately predicts the impact of mutations on protein stability and binding of protein complexes. Our interpretable computational model for protein structure–function maps could guide the design of novel proteins with desired function.
  3. The ability to accurately predict protein–protein interactions is critically important for understanding major cellular processes. However, current experimental and computational approaches for identifying them are technically very challenging and still have limited success. We propose a new computational method for predicting protein–protein interactions using only primary sequence information. It utilizes the concept of physicochemical similarity to determine which interactions will most likely occur. In our approach, the physicochemical features of proteins are extracted using bioinformatics tools for different organisms. They are then used in a machine-learning method to identify successful protein–protein interactions via correlation analysis. We found that, for all studied organisms, the property that correlates most strongly with protein–protein interactions is dipeptide amino acid composition (the frequency of specific amino acid pairs in a protein sequence). While current approaches often overlook the specificity of protein–protein interactions across different organisms, our method yields context-specific features that determine protein–protein interactions. The analysis is specifically applied to the bacterial two-component system that includes histidine kinase and transcriptional response regulators, as well as to the barnase–barstar complex, demonstrating the method’s versatility across different biological systems. Our approach can be applied to predict protein–protein interactions in any biological system, providing an important tool for investigating the mechanisms of complex biological processes.
  4. Gromiha, Michael (Ed.)
    Abstract Motivation: Accurate modeling of the protein–protein interaction interface is essential for high-quality protein complex structure prediction. Existing approaches for estimating the quality of a predicted protein complex structural model utilize only the physicochemical properties or energetic contributions of the interacting atoms, ignoring evolutionary information or inter-atomic multimeric geometries, including interaction distance and orientations. Results: Here, we present PIQLE, a deep graph learning method for protein–protein interface quality estimation. PIQLE leverages multimeric interaction geometries and evolutionary information along with sequence- and structure-derived features to estimate the quality of individual interactions between the interfacial residues using a multi-head graph attention network and then probabilistically combines the estimated quality for scoring the overall interface. Experimental results show that PIQLE consistently outperforms existing state-of-the-art methods including DProQA, TRScore, GNN-DOVE and DOVE on multiple independent test datasets across a wide range of evaluation metrics. Our ablation study and comparison with the self-assessment module of AlphaFold-Multimer repurposed for protein complex scoring reveal that the performance gains are connected to the effectiveness of the multi-head graph attention network in leveraging multimeric interaction geometries and evolutionary information along with other sequence- and structure-derived features adopted in PIQLE. Availability and implementation: An open-source software implementation of PIQLE is freely available at https://github.com/Bhattacharya-Lab/PIQLE. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
  5. The relation between amino acid (AA) sequence and biologically active conformation controls the process of polypeptide chains folding into three-dimensional (3D) protein structures. Recent gains in the resolution of cryo-electron microscopy, coupled with improvements in computational methodologies, have accelerated the analysis of structures and properties of proteins. However, the detailed interaction between AAs has not been fully elucidated. Herein, we present a de novo method to evaluate inter-amino acid interactions based on the concept of accurately evaluating the amino acid bond pairs (AABP). The results enabled the identification of a complex, long-range, interconnected 3D network of interacting AAs in proteins. The method is applied to the receptor binding domain (RBD) of the SARS-CoV-2 spike protein. We show that although nearest-neighbor AAs in the primary sequence have large AABP, other nonlocal AAs make a substantial contribution to AABP, with significant participation of both covalent and hydrogen bonding. Detailed analysis of AABP in RBD reveals the pivotal role they play in sequence conservation, with profound implications for residue mutations and for therapeutic drug design. This approach could be easily applied to many other proteins of biomedical interest in life sciences.
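
Item 3 above names dipeptide amino acid composition, described there as the frequency of specific amino acid pairs in a protein sequence, as its most informative feature. The following is a minimal sketch of one common way to compute it, not that paper's implementation. Counting overlapping adjacent pairs, normalizing by the number of valid pairs, and skipping pairs with non-standard residues are all assumptions made here; the function name `dipeptide_composition` is hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return a dict mapping each of the 400 dipeptides to its frequency.

    Assumption: overlapping adjacent pairs are counted, and pairs that
    contain a non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    counts = Counter(valid)
    total = len(valid)
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


if __name__ == "__main__":
    # Hypothetical short peptide used only for illustration.
    composition = dipeptide_composition("MKVLAAGIVK")
    nonzero = {pair: freq for pair, freq in composition.items() if freq > 0}
    print(nonzero)
```

The result is a fixed-length vector of 400 values regardless of sequence length, which is why this kind of feature is convenient as input to a machine-learning method.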