skip to main content


Title: Assessment of software methods for estimating protein-protein relative binding affinities
A growing number of computational tools have been developed to accurately and rapidly predict the impact of amino acid mutations on protein-protein relative binding affinities. Such tools have many applications, for example, designing new drugs and studying evolutionary mechanisms. In the search for accuracy, many of these methods employ expensive yet rigorous molecular dynamics simulations. By contrast, non-rigorous methods use less exhaustive statistical mechanics, allowing for more efficient calculations. However, it is unclear if such methods retain enough accuracy to replace rigorous methods in binding affinity calculations. This trade-off between accuracy and computational expense makes it difficult to determine the best method for a particular system or study. Here, eight non-rigorous computational methods were assessed using eight antibody-antigen and eight non-antibody-antigen complexes for their ability to accurately predict relative binding affinities (ΔΔG) for 654 single mutations. In addition to assessing accuracy, we analyzed the CPU cost and performance for each method using a variety of physico-chemical structural features. This allowed us to posit scenarios in which each method may be best utilized. Most methods performed worse when applied to antibody-antigen complexes compared to non-antibody-antigen complexes. Rosetta-based JayZ and EasyE methods classified mutations as destabilizing (ΔΔG < -0.5 kcal/mol) with high (83–98%) accuracy and a relatively low computational cost for non-antibody-antigen complexes. Some of the most accurate results for antibody-antigen systems came from combining molecular dynamics with FoldX with a correlation coefficient (r) of 0.46, but this was also the most computationally expensive method. Overall, our results suggest these methods can be used to quickly and accurately predict stabilizing versus destabilizing mutations but are less accurate at predicting actual binding affinities. This study highlights the need for continued development of reliable, accessible, and reproducible methods for predicting binding affinities in antibody-antigen proteins and provides a recipe for using current methods.  more » « less
Award ID(s):
1736253
NSF-PAR ID:
10293061
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
PloS one
ISSN:
1932-6203
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Protein–protein binding is fundamental to most biological processes. It is important to be able to use computation to accurately estimate the change in protein–protein binding free energy due to mutations in order to answer biological questions that would be experimentally challenging, laborious, or time-consuming. Although nonrigorous free-energy methods are faster, rigorous alchemical molecular dynamics-based methods are considerably more accurate and are becoming more feasible with the advancement of computer hardware and molecular simulation software. Even with sufficient computational resources, there are still major challenges to using alchemical free-energy methods for protein–protein complexes, such as generating hybrid structures and topologies, maintaining a neutral net charge of the system when there is a charge-changing mutation, and setting up the simulation. In the current study, we have used the pmx package to generate hybrid structures and topologies, and a double-system/single-box approach to maintain the net charge of the system. To test the approach, we predicted relative binding affinities for two protein–protein complexes using a nonequilibrium alchemical method based on the Crooks fluctuation theorem and compared the results with experimental values. The method correctly identified stabilizing from destabilizing mutations for a small protein–protein complex, and a larger, more challenging antibody complex. Strong correlations were obtained between predicted and experimental relative binding affinities for both protein–protein systems. 
    more » « less
  2. Fariselli, Piero (Ed.)
    Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used S sym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between S sym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms. 
    more » « less
  3. Abstract

    Computationally modeling how mutations affect protein–protein binding not only helps uncover the biophysics of protein interfaces, but also enables the redesign and optimization of protein interactions. Traditional high‐throughput methods for estimating binding free energy changes are currently limited to mutations directly at the interface due to difficulties in accurately modeling how long‐distance mutations propagate their effects through the protein structure. However, the modeling and design of such mutations is of substantial interest as it allows for greater control and flexibility in protein design applications. We have developed a method that combines high‐throughput Rosetta‐based side‐chain optimization with conformational sampling using classical molecular dynamics simulations, finding significant improvements in our ability to accurately predict long‐distance mutational perturbations to protein binding. Our approach uses an analytical framework grounded in alchemical free energy calculations while enabling exploration of a vastly larger sequence space. When comparing to experimental data, we find that our method can predict internal long‐distance mutational perturbations with a level of accuracy similar to that of traditional methods in predicting the effects of mutations at the protein–protein interface. This work represents a new and generalizable approach to optimize protein free energy landscapes for desired biological functions.

     
    more » « less
  4. Methods which accurately predict protein – ligand binding strengths are critical for drug discovery. In the last two decades, advances in chemical modelling have enabled steadily accelerating progress in the discovery and optimization of structure-based drug design. Most computational methods currently used in this context are based on molecular mechanics force fields that often have deficiencies in describing the quantum mechanical (QM) aspects of molecular binding. In this study, we show the competitiveness of our QM-based Molecules-in-Molecules (MIM) fragmentation method for characterizing binding energy trends for seven different datasets of protein – ligand complexes. By using molecular fragmentation, the MIM method allows for accelerated QM calculations. We demonstrate that for classes of structurally similar ligands bound to a common receptor, MIM provides excellent correlation to experiment, surpassing the more popular Molecular Mechanics Poisson-Boltzmann Surface Area (MM/PBSA) and Molecular Mechanics Generalized Born Surface Area (MM/GBSA) methods. The MIM method offers a relatively simple, well-defined protocol by which binding trends can be ascertained at the QM level and is suggested as a promising option for lead optimization in structure-based drug design. 
    more » « less
  5. Abstract Motivation

    Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of the proteins is still a time consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods to accurately predict protein function to fill the gap. Even though many methods have been developed to use protein sequences as input to predict function, much fewer methods leverage protein structures in protein function prediction because there was lack of accurate protein structures for most proteins until recently.

    Results

    We developed TransFun—a method using a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective methods to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions and sequence similarity-based predictions can further increase prediction accuracy.

    Availability and implementation

    The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun.

     
    more » « less