This content will become publicly available on June 1, 2025

Title: Geometry Optimization Algorithms in Conjunction with the Machine Learning Potential ANI-2x Facilitate the Structure-Based Virtual Screening and Binding Mode Prediction

Structure-based virtual screening uses molecular docking to explore and analyze ligand–macromolecule interactions, which is crucial for identifying and developing potential drug candidates. Although several widely used docking programs are available, accurate prediction of binding affinity and binding mode remains challenging. In this study, we introduce a novel protocol that combines our in-house geometry optimization algorithm, the conjugate gradient with backtracking line search (CG-BS), which can restrain and constrain rotatable torsional angles and other geometric parameters, with a highly accurate machine learning potential, ANI-2x, known for molecular energy predictions that closely resemble those of the ωB97X/6-31G(d) model. By integrating this protocol with binding pose prediction using Glide, we conducted additional structural optimization and potential energy prediction on 11 small molecule–macromolecule and 12 peptide–macromolecule systems. We observed that ANI-2x/CG-BS greatly improved the docking power: it optimized binding poses more effectively, particularly when the RMSD of the binding pose predicted by Glide exceeded about 5 Å, and it achieved a 26% higher success rate than Glide docking in identifying native-like binding poses at the top rank. As for the scoring and ranking powers, ANI-2x/CG-BS outperformed Glide docking in predicting and ranking hundreds or thousands of ligands. For example, Pearson’s and Spearman’s correlation coefficients increased markedly from 0.24 and 0.14 with Glide docking to 0.85 and 0.69, respectively, when ANI-2x/CG-BS was added to optimize and rank small molecules binding to the bacterial ribosomal aminoacyl-tRNA receptor. These results suggest that ANI-2x/CG-BS holds considerable potential for integration into virtual screening pipelines because of its enhanced docking performance.
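The general idea behind a conjugate gradient minimizer with a backtracking line search can be illustrated with a minimal sketch. This is not the authors' CG-BS implementation: it assumes a Polak-Ribière update with restarts and an Armijo sufficient-decrease test, it omits the torsional restraints and constraints described above, and it replaces the ANI-2x energy with a hypothetical two-variable toy function (`toy_energy`, `toy_gradient`). All function and parameter names are illustrative.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def backtracking_step(f: Callable[[Vector], float], x: Vector, fx: float,
                      g: Vector, d: Vector, alpha0: float = 1.0,
                      shrink: float = 0.5, c1: float = 1e-4,
                      max_halvings: int = 50) -> float:
    """Shrink the step until the Armijo sufficient-decrease test passes."""
    slope = dot(g, d)  # directional derivative; negative for a descent direction
    alpha = alpha0
    for _ in range(max_halvings):
        trial = [xi + alpha * di for xi, di in zip(x, d)]
        if f(trial) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= shrink
    return 0.0  # no acceptable step found


def minimize_cg_bs(f: Callable[[Vector], float],
                   grad: Callable[[Vector], Vector],
                   x0: Sequence[float], tol: float = 1e-8,
                   max_iter: int = 500) -> Vector:
    """Nonlinear conjugate gradient (Polak-Ribiere+) with backtracking."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        if dot(g, d) >= 0.0:
            # Not a descent direction: restart along steepest descent.
            d = [-gi for gi in g]
        alpha = backtracking_step(f, x, f(x), g, d)
        if alpha == 0.0:
            break
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        diff = [gn - go for gn, go in zip(g_new, g)]
        beta = max(0.0, dot(g_new, diff) / dot(g, g))  # PR+ update
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x


def toy_energy(p: Vector) -> float:
    # Hypothetical stand-in for a potential energy surface.
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


def toy_gradient(p: Vector) -> Vector:
    return [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]


if __name__ == "__main__":
    print(minimize_cg_bs(toy_energy, toy_gradient, [0.0, 0.0]))
```

In a docking workflow, the energy and gradient callables would come from the potential being minimized rather than from an analytic toy function; the backtracking search only requires energy evaluations along the search direction, which keeps each step cheap when gradients are the expensive part.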

 
Award ID(s):
1955260
PAR ID:
10548526
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
MDPI, Basel, Switzerland
Date Published:
Journal Name:
Biomolecules
Volume:
14
Issue:
6
ISSN:
2218-273X
Page Range / eLocation ID:
648
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Significant efforts have been devoted in the last decade to improving molecular docking techniques to predict both accurate binding poses and ranking affinities. Two shortcomings in the field are the limited number of standard methods for measuring docking success and the limited availability of widely accepted standard data sets for benchmarking different docking algorithms. To address these issues, we have created a Cross‐Docking Benchmark server. The server provides a versatile cross‐docking data set containing 4,399 protein‐ligand complexes across 95 protein targets, intended to serve as a benchmark set and gold standard for state‐of‐the‐art pose and ranking prediction on easy, medium, hard, or very hard docking targets. The benchmark, along with a customizable cross‐docking data set generation tool, is available at http://disco.csb.pitt.edu. We further demonstrate potential uses of the server for questions beyond basic benchmarking, such as the selection of the ideal docking reference structure.

     
  2. Abstract

    Structure-based virtual screening is a key tool in early drug discovery, with growing interest in the screening of multi-billion chemical compound libraries. However, the success of virtual screening crucially depends on the accuracy of the binding pose and binding affinity predicted by computational docking. Here we develop a highly accurate structure-based virtual screen method, RosettaVS, for predicting docking poses and binding affinities. Our approach outperforms other state-of-the-art methods on a wide range of benchmarks, partially due to our ability to model receptor flexibility. We incorporate this into a new open-source artificial intelligence accelerated virtual screening platform for drug discovery. Using this platform, we screen multi-billion compound libraries against two unrelated targets, a ubiquitin ligase target KLHDC2 and the human voltage-gated sodium channel NaV1.7. For both targets, we discover hit compounds, including seven hits (14% hit rate) to KLHDC2 and four hits (44% hit rate) to NaV1.7, all with single digit micromolar binding affinities. Screening in both cases is completed in less than seven days. Finally, a high resolution X-ray crystallographic structure validates the predicted docking pose for the KLHDC2 ligand complex, demonstrating the effectiveness of our method in lead discovery.

     
  3. Abstract

    Determination of the bound pose of a ligand is a critical first step in many in silico drug discovery tasks. Molecular docking is the main tool for the prediction of non-covalent binding of a protein and ligand system. Molecular docking pipelines often only utilize the information of one ligand binding to the protein despite the commonly held hypothesis that different ligands share binding interactions when bound to the same receptor. Here we describe Open-ComBind, an easy-to-use, open-source version of the ComBind molecular docking pipeline that leverages information from multiple ligands without known bound structures to enhance pose selection. We first create distributions of feature similarities between ligand pose pairs, comparing near-native poses with all sampled docked poses. These distributions capture the likelihood of observing similar features, such as hydrogen bonds or hydrophobic contacts, in different pose configurations. These similarity distributions are then combined with a per-ligand docking score to enhance overall pose selection by 5% and 4.5% for high-affinity and congeneric series helper ligands, respectively. Open-ComBind reduces the average RMSD of ligands in our benchmark dataset by 9.0%. We provide Open-ComBind as an easy-to-use command line and Python API to increase pose prediction performance at www.github.com/drewnutt/open_combind.

     
  4. Abstract

    Antibodies are key proteins produced by the immune system to target pathogen proteins, termed antigens, via specific binding to surface regions called epitopes. Given an antigen and the sequence of an antibody, knowledge of the epitope is critical for the discovery and development of antibody‐based therapeutics. In this work, we present a computational protocol that uses template‐based modeling and docking to predict epitope residues. This protocol is implemented in three major steps. First, a template‐based modeling approach is used to build the antibody structures. We tested several options, including generation of models using AlphaFold2. Second, each antibody model is docked to the antigen using the fast Fourier transform (FFT) based docking program PIPER. Attention is given to optimally selecting the docking energy parameters depending on the input data. In particular, the van der Waals energy terms are reduced for modeled antibodies relative to x‐ray structures. Finally, a ranking of antigen surface residues is produced. The ranking relies on the docking results, that is, how often the residue appears in the interfaces of the docking poses, and also on the energetic favorability of the docking pose in question. The method, called PIPER‐Map, has been tested on a widely used antibody–antigen docking benchmark. The results show that PIPER‐Map improves upon existing epitope prediction methods. An interesting observation is that the accuracy of epitope prediction starting from the antibody sequence alone does not differ significantly from that obtained starting from an unbound (i.e., separately crystallized) antibody structure.

     
  5. Metabotropic glutamate receptors (mGluRs) play an important role in regulating glutamate signaling pathways, which are involved in neuropathy and peripheral homeostasis. mGluR4, which belongs to the Group III mGluRs, is the most widely distributed in the periphery among all the mGluRs. Regulation of this receptor has been shown to be involved in diabetes, colorectal carcinoma and many other diseases. However, the application of structure-based drug design to identify small molecules that regulate mGluR4 is limited by the absence of a resolved mGluR4 protein structure. In this work, we first built a homology model of mGluR4 based on a crystal structure of mGluR8, and then conducted hierarchical virtual screening (HVS) to identify possible active ligands for mGluR4. The HVS protocol consists of three hierarchical filters: Glide docking, molecular dynamics (MD) simulation and binding free energy calculation. We successfully prioritized active ligands of mGluR4 from a set of screening compounds using HVS. The ligands predicted to be active on the basis of binding affinity cover almost all of the experimentally determined active ligands, with only one ligand missed. The correlation between the measured and predicted binding affinities is significantly improved for the MM-PB/GBSA-WSAS methods compared to the Glide docking method. More importantly, we have identified hotspots for ligand binding, and we found that SER157 and GLY158 tend to contribute to the selectivity of mGluR4 ligands, while ALA154 and ALA155 could account for ligand selectivity toward mGluR8. We also identified five other key residues that are critical for ligand potency. The differences between the binding profiles of mGluR4 and mGluR8 can guide the development of more potent and selective modulators. Moreover, we evaluated the performance of IPSF, a novel type of scoring function trained by a machine learning algorithm on residue–ligand interaction profiles, in guiding drug lead optimization. The cross-validation root-mean-square errors (RMSEs) are much smaller than those of the endpoint methods, and the correlation coefficients are comparable to those of the best endpoint methods for both mGluRs. Thus, machine learning-based IPSF can be applied to guide lead optimization even when the total number of actives and inactives is small, a typical scenario in drug discovery projects.
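The evaluation metrics named in these abstracts (RMSE, Pearson's and Spearman's correlation coefficients) can be computed as in the short sketch below. It is a generic standard-library illustration, not code from any of the listed studies; the function names are hypothetical, and Spearman's coefficient is taken as Pearson's coefficient on average ranks so that tied values are handled.

```python
import math
from typing import List, Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predicted and observed values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson's coefficient measures linear agreement between predicted and measured affinities, whereas Spearman's coefficient depends only on rank order, which is why the two are typically reported together when assessing scoring and ranking performance.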