Protein‐ligand binding is a fundamental biological process that is paramount to many other biological processes, such as signal transduction, metabolic pathways, enzyme construction, cell secretion, and gene expression. Accurate prediction of protein‐ligand binding affinities is vital to rational drug design and to the understanding of protein‐ligand binding and binding‐induced function. Existing binding affinity prediction methods are inundated with geometric detail and involve excessively high dimensions, which undermines their predictive power for massive binding data. Topology provides the ultimate level of abstraction and thus incurs too much reduction in geometric information. Persistent homology embeds geometric information into topological invariants and bridges the gap between complex geometry and abstract topology. However, it oversimplifies biological information. This work introduces element specific persistent homology (ESPH), or multicomponent persistent homology, to retain crucial biological information during topological simplification. The combination of ESPH and machine learning gives rise to a powerful paradigm for macromolecular analysis. Tests on two large data sets indicate that the proposed topology‐based machine‐learning paradigm outperforms other existing methods in protein‐ligand binding affinity predictions. ESPH reveals protein‐ligand binding mechanisms that cannot be attained from other conventional techniques. The present approach reveals that protein‐ligand hydrophobic interactions extend to 40 Å away from the binding site, which has significant ramifications for drug and protein design.
The protein–ligand binding affinity quantifies the binding strength between a protein and its ligand. Computer modeling and simulations can be used to estimate the binding affinity or binding free energy using data- or physics-driven methods or a combination thereof. Here we discuss a purely physics-based sampling approach based on biased molecular dynamics simulations. Our proposed method generalizes and simplifies previously suggested stratification strategies that use umbrella sampling or other enhanced sampling simulations with additional collective-variable-based restraints. The approach presented here uses a flexible scheme that can be easily tailored for any system of interest. Using the available crystal structure of the complex as the initial model, we estimate the binding affinity of human fibroblast growth factor 1 to heparin hexasaccharide with four different variations of the proposed method and compare the results against the experimentally determined binding affinity obtained from isothermal titration calorimetry experiments.
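The abstract uses "binding affinity" and "binding free energy" interchangeably. As a minimal sketch of how the two are related, the snippet below converts between a dissociation constant and a standard binding free energy using the textbook relation ΔG° = RT ln(Kd / c°). It is not the method of the paper: the function names, the 1 M standard concentration, and the default temperature of 298.15 K are assumptions made for illustration only.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption for this sketch.
R_KCAL = 1.987204e-3


def kd_to_dg(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant (M).

    Uses dG = RT ln(Kd / c0) with an assumed standard concentration c0 = 1 M.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar / 1.0)


def dg_to_kd(dg_kcal: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (M) from a standard binding free energy (kcal/mol)."""
    return math.exp(dg_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical example value: a 1 micromolar binder at room temperature.
    dg = kd_to_dg(1e-6)
    print(f"dG = {dg:.2f} kcal/mol")
    print(f"Kd = {dg_to_kd(dg):.2e} M")
```

A tighter binder has a smaller Kd and therefore a more negative ΔG°, which is why a calorimetry-derived Kd and a simulation-derived free energy can be compared on the same scale.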
- PAR ID:
- 10388448
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Nature Computational Science
- Volume:
- 3
- Issue:
- 1
- ISSN:
- 2662-8457
- Page Range / eLocation ID:
- p. 59-70
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Calculation of protein–ligand binding affinity is a cornerstone of drug discovery. Classic implicit solvent models, which have been widely used to accomplish this task, lack accuracy compared to experimental references. Emerging data-driven models, on the other hand, are often accurate yet not fully interpretable and are also prone to overfitting. In this research, we explore the application of Theory-Guided Data Science in studying protein–ligand binding. A hybrid model is introduced by integrating a Graph Convolutional Network (data-driven model) with the GBNSR6 implicit solvent (physics-based model). The proposed physics-data model is tested on a dataset of 368 complexes from the PDBbind refined set and 72 host–guest systems. Results demonstrate that the proposed Physics-Guided Neural Network can successfully improve the “accuracy” of the pure data-driven model. In addition, the “interpretability” and “transferability” of our model are improved compared to the purely data-driven model. Further analyses include evaluating model robustness and understanding relationships between the physical features.
-
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein–ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein–ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein–ligand binding sites, ligand-binding affinity and binding pose (conformation), including both classical ML algorithms and recent deep learning methods. Our exploration of the correlation between these three aspects of protein–ligand interaction suggests that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein–ligand interactions.
-
Identifying novel drug-target interactions is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, here we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Here we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training to improve binding predictions for novel proteins and ligands. We validate AI-Bind predictions via docking simulations and comparison with recent experimental evidence, and advance the interpretation of machine learning predictions of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. AI-Bind is a high-throughput approach to identifying drug-target combinations, with the potential to become a powerful tool in drug discovery.
-
Protein-peptide interactions play essential roles in many cellular processes, and their structural characterization is a major focus of current experimental and theoretical research. Two decades ago, it was proposed to employ steered molecular dynamics (SMD) to assess the strength of protein-peptide interactions. The idea behind using SMD simulations is that mechanical stability can serve as a promising and efficient alternative to the computationally demanding estimation of binding affinity. However, mechanical stability, defined as a peak in the force-extension profile, depends on the choice of the pulling direction. Here we propose an uncommon choice of the pulling direction, along the resultant dipole moment (RDM) vector, which has not been explored in SMD simulations so far. Using explicit-solvent all-atom MD simulations, we apply the SMD technique to probe the mechanical resistance of a ligand-receptor system pulled along two different vectors. A novel pulling direction, in which the ligand unbinds along the RDM vector, results in stronger forces compared to the commonly used ligand unbinding along the center-of-mass vector. Our observation that the RDM is one of the factors influencing the mechanical stability of a protein-peptide complex can be used to improve the ranking of binding affinities by using mechanical stability as an effective scoring function.