Title: A Physics-Guided Neural Network for Predicting Protein–Ligand Binding Free Energy: From Host–Guest Systems to the PDBbind Database
Calculation of protein–ligand binding affinity is a cornerstone of drug discovery. Classic implicit solvent models, which have been widely used for this task, lack accuracy compared to experimental references. Emerging data-driven models, on the other hand, are often accurate yet not fully interpretable and are prone to overfitting. In this research, we explore the application of Theory-Guided Data Science to the study of protein–ligand binding. A hybrid model is introduced by integrating a Graph Convolutional Network (data-driven model) with the GBNSR6 implicit solvent model (physics-based model). The proposed physics-data model is tested on a dataset of 368 complexes from the PDBbind refined set and 72 host–guest systems. Results demonstrate that the proposed Physics-Guided Neural Network improves on the “accuracy” of the purely data-driven model. In addition, the “interpretability” and “transferability” of our model are improved compared to the purely data-driven model. Further analyses include evaluating model robustness and understanding the relationships between the physical features.
Award ID(s):
2136095
NSF-PAR ID:
10393683
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Biomolecules
Volume:
12
Issue:
7
ISSN:
2218-273X
Page Range / eLocation ID:
919
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Computational simulation of biomolecules can provide important insights into protein design, protein-ligand binding interactions, and ab initio biomolecular folding, among other applications. Accurate treatment of the solvent environment is essential in such applications, but the use of explicit solvents can add considerable cost. Implicit treatment of solvent effects using a dielectric continuum model is an attractive alternative to explicit solvation since it is able to describe solvation effects without the inclusion of solvent degrees of freedom. Previously, we described the development and parameterization of implicit solvent models for small molecules. Here, we extend the parameterization of the generalized Kirkwood (GK) implicit solvent model for use with biomolecules described by the AMOEBA force field via the addition of corrections to the calculation of effective radii that account for interstitial spaces that arise within biomolecules. These include element-specific pairwise descreening scale factors, a short-range neck contribution to describe the solvent-excluded space between pairs of nearby atoms, and finally tanh-based rescaling of the overall descreening integral. We then apply the AMOEBA/GK implicit solvent to a set of ten proteins and achieve an average coordinate root mean square deviation for the experimental structures of 2.0 Å across 500 ns simulations. Overall, the continued development of implicit solvent models will help facilitate the simulation of biomolecules on mechanistically relevant timescales. 
  2. We propose a free energy calculation method for receptor–ligand systems with multiple binding poses that avoids exhaustive enumeration of the poses. For systems with multiple binding poses, the standard procedure is to enumerate the orientations of the binding poses, restrain the ligand to each orientation, and then calculate the binding free energy for each binding pose. In this study, we modify a part of the thermodynamic cycle in order to sample a broader conformational space of the ligand in the binding site. This modification leads to more accurate free energy calculation without performing separate free energy simulations for each binding pose. As a test, we applied our modification to simple model host–guest systems that have only two binding poses, using the single decoupling method (SDM) in implicit solvent. The results showed that the binding free energies obtained from our method, without prior knowledge of the two binding poses, were in good agreement with the benchmark results obtained by explicit enumeration of the binding poses. Our method is applicable to other alchemical binding free energy calculation methods such as the double decoupling method (DDM) in explicit solvent. We performed a calculation for a protein–ligand system, phenol binding to T4 lysozyme, in explicit solvent using our modified thermodynamic path. The results of the free energy simulation along our modified path were in good agreement with the results of conventional DDM, which requires a separate binding free energy calculation for each of the binding poses. © 2019 Wiley Periodicals, Inc.
  3. The protein–ligand binding affinity quantifies the binding strength between a protein and its ligand. Computer modeling and simulations can be used to estimate the binding affinity or binding free energy using data-driven methods, physics-driven methods, or a combination thereof. Here we discuss a purely physics-based sampling approach based on biased molecular dynamics simulations. Our proposed method generalizes and simplifies previously suggested stratification strategies that use umbrella sampling or other enhanced sampling simulations with additional collective-variable-based restraints. The approach presented here uses a flexible scheme that can be easily tailored to any system of interest. We estimate the binding affinity of human fibroblast growth factor 1 to heparin hexasaccharide, using the available crystal structure of the complex as the initial model, and compare four different variations of the proposed method against the experimentally determined binding affinity obtained from isothermal titration calorimetry experiments.
  4. CHARMM‐GUI, http://www.charmm-gui.org, is a web‐based graphical user interface that prepares complex biomolecular systems for molecular simulations. CHARMM‐GUI creates input files for a number of programs including CHARMM, NAMD, GROMACS, AMBER, GENESIS, LAMMPS, Desmond, OpenMM, and CHARMM/OpenMM. Since its original development in 2006, CHARMM‐GUI has been widely adopted for various purposes and now contains a number of different modules designed to set up a broad range of simulations: (1) PDB Reader & Manipulator, Glycan Reader, and Ligand Reader & Modeler for reading and modifying molecules; (2) Quick MD Simulator, Membrane Builder, Nanodisc Builder, HMMM Builder, Monolayer Builder, Micelle Builder, and Hex Phase Builder for building all‐atom simulation systems in various environments; (3) PACE CG Builder and Martini Maker for building coarse‐grained simulation systems; (4) DEER Facilitator and MDFF/xMDFF Utilizer for experimentally guided simulations; (5) Implicit Solvent Modeler, PBEQ‐Solver, and GCMC/BD Ion Simulator for implicit solvent related calculations; (6) Ligand Binder for ligand solvation and binding free energy simulations; and (7) Drude Prepper for preparation of simulations with the CHARMM Drude polarizable force field. Recently, new modules have been integrated into CHARMM‐GUI, such as Glycolipid Modeler for generation of various glycolipid structures, and LPS Modeler for generation of lipopolysaccharide structures from various Gram‐negative bacteria. These new features together with existing modules are expected to facilitate advanced molecular modeling and simulation, thereby leading to an improved understanding of the structure and dynamics of complex biomolecular systems. Here, we briefly review these capabilities and discuss potential future directions in the CHARMM‐GUI development project. © 2016 Wiley Periodicals, Inc.
  5. Monte Carlo (MC) methods are important computational tools for molecular structure optimization and prediction. When solvent effects are explicitly considered, MC methods become very expensive due to the large number of degrees of freedom associated with the water molecules and mobile ions. Alternatively, implicit-solvent MC can greatly reduce the computational cost by applying a mean-field approximation to solvent effects while maintaining the atomic detail of the target molecule. The two most popular implicit-solvent models are the Poisson-Boltzmann (PB) model and the Generalized Born (GB) model; the GB model is an approximation to the PB model but is much faster in simulation time. In this work, we develop a machine learning-based implicit-solvent Monte Carlo (MLIMC) method by combining the advantages of both implicit solvent models in accuracy and efficiency. Specifically, the MLIMC method uses a fast and accurate PB-based machine learning (PBML) scheme to compute the electrostatic solvation free energy at each step. We validate our MLIMC method by using a benzene-water system and a protein-water system. We show that the proposed MLIMC method has great advantages in speed and accuracy for molecular structure optimization and prediction.
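As background for the Generalized Born (GB) model mentioned in item 5, the following is a minimal sketch of the textbook pairwise GB expression for the polar solvation energy (the form usually attributed to Still and co-workers). It is not the GBNSR6, AMOEBA/GK, or MLIMC implementation from any of the records above. The function names, the dielectric constants, and the assumption that effective Born radii are supplied as inputs are all illustrative choices; in practice, computing those radii is the difficult part of the model.

```python
import math

# Coulomb conversion constant in kcal*Angstrom/(mol*e^2); a commonly used value.
COULOMB = 332.0636


def f_gb(r, radius_i, radius_j):
    """Smooth GB interpolation between the Born limit (r = 0) and Coulomb's law (large r)."""
    rr = radius_i * radius_j
    return math.sqrt(r * r + rr * math.exp(-r * r / (4.0 * rr)))


def gb_polar_energy(charges, coords, born_radii, eps_in=1.0, eps_out=78.5):
    """Polar solvation energy in kcal/mol from the pairwise GB sum.

    charges    : partial charges in units of e
    coords     : (x, y, z) tuples in Angstrom
    born_radii : effective Born radii in Angstrom (assumed to be given)
    eps_in, eps_out : assumed solute and solvent dielectric constants
    """
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    # The double sum includes the i == j self terms, which give the Born energy of each atom.
    for i in range(n):
        for j in range(n):
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total


# Usage: a single unit charge with a 2 Angstrom Born radius reduces to the Born ion formula.
single_ion = gb_polar_energy([1.0], [(0.0, 0.0, 0.0)], [2.0])
```

For a single atom the sum collapses to the Born formula for an ion, and at large separations `f_gb` approaches the plain interatomic distance, which is why the expression is often described as an interpolation between those two limits.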