Title: Open-ComBind: harnessing unlabeled data for improved binding pose prediction
Abstract

Determination of the bound pose of a ligand is a critical first step in many in silico drug discovery tasks. Molecular docking is the main tool for predicting the non-covalent binding of a protein and ligand system. Molecular docking pipelines often use information from only one ligand binding to the protein, despite the commonly held hypothesis that different ligands share binding interactions when bound to the same receptor. Here we describe Open-ComBind, an easy-to-use, open-source version of the ComBind molecular docking pipeline that leverages information from multiple ligands without known bound structures to enhance pose selection. We first create distributions of feature similarities between ligand pose pairs, comparing near-native poses with all sampled docked poses. These distributions capture the likelihood of observing similar features, such as hydrogen bonds or hydrophobic contacts, in different pose configurations. These similarity distributions are then combined with a per-ligand docking score to enhance overall pose selection by 5% and 4.5% for high-affinity and congeneric series helper ligands, respectively. Open-ComBind reduces the average RMSD of ligands in our benchmark dataset by 9.0%. We provide Open-ComBind as an easy-to-use command-line tool and Python API to increase pose prediction performance at www.github.com/drewnutt/open_combind.
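The idea of combining pairwise similarity distributions with per-ligand docking scores can be illustrated with a small toy sketch. This is not the Open-ComBind API or its actual statistics: the function names (`log_odds`, `joint_score`, `select_poses`), the Gaussian stand-ins for the near-native and all-pose similarity distributions, the Tanimoto similarity over interaction labels, the residue labels, and the coordinate-ascent search are all assumptions made for illustration.

```python
from itertools import combinations


def log_odds(sim, native_mean=0.8, reference_mean=0.4, width=0.2):
    """Log-likelihood ratio of a similarity value: near-native vs. all docked poses.

    ASSUMPTION: both distributions are Gaussians with made-up parameters.
    """
    def log_gauss(x, mean):
        # Shared width, so the normalizing constants cancel in the ratio.
        return -0.5 * ((x - mean) / width) ** 2
    return log_gauss(sim, native_mean) - log_gauss(sim, reference_mean)


def joint_score(choice, docking_scores, similarity, weight=1.0):
    """Score one pose per ligand: docking term plus pairwise similarity term."""
    # ASSUMPTION: docking scores are "lower is better", so they are negated.
    total = -sum(docking_scores[lig][pose] for lig, pose in choice.items())
    for a, b in combinations(sorted(choice), 2):
        total += weight * log_odds(similarity(a, choice[a], b, choice[b]))
    return total


def select_poses(docking_scores, similarity, weight=1.0, max_sweeps=20):
    """Greedy coordinate ascent: re-pick each ligand's pose until nothing changes."""
    # Start from the best-scoring docked pose of every ligand.
    choice = {lig: min(range(len(s)), key=s.__getitem__)
              for lig, s in docking_scores.items()}
    for _ in range(max_sweeps):
        changed = False
        for lig, scores in docking_scores.items():
            best = max(
                range(len(scores)),
                key=lambda p: joint_score({**choice, lig: p},
                                          docking_scores, similarity, weight),
            )
            if best != choice[lig]:
                choice[lig] = best
                changed = True
        if not changed:
            break
    return choice


# Toy data (hypothetical): query ligand "A" and helper ligand "B", two poses each.
docking_scores = {"A": [-7.0, -6.5], "B": [-8.0, -7.9]}
features = {
    ("A", 0): {"hbond:ASP86"},
    ("A", 1): {"hbond:GLU81", "contact:LEU83"},
    ("B", 0): {"hbond:GLU81", "contact:LEU83"},
    ("B", 1): {"contact:PHE80"},
}


def tanimoto(lig_a, pose_a, lig_b, pose_b):
    """Tanimoto (Jaccard) similarity between the interaction sets of two poses."""
    fa, fb = features[(lig_a, pose_a)], features[(lig_b, pose_b)]
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


chosen = select_poses(docking_scores, tanimoto)
print(chosen)
```

In this toy case the top-scored pose of ligand A shares no interactions with the helper ligand, so the joint objective prefers A's second pose. A greedy search of this kind can stop at a local optimum, and a real pipeline would estimate the similarity distributions from structural data rather than assume them.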

 
PAR ID:
10478411
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Journal of Computer-Aided Molecular Design
Volume:
38
Issue:
1
ISSN:
0920-654X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Significant efforts have been devoted in the last decade to improving molecular docking techniques to predict both accurate binding poses and ranking affinities. Two shortcomings in the field are the limited number of standard methods for measuring docking success and the limited availability of widely accepted standard data sets for use as benchmarks in comparing different docking algorithms. To address these issues, we have created a Cross‐Docking Benchmark server. The server is a versatile cross‐docking data set containing 4,399 protein‐ligand complexes across 95 protein targets, intended to serve as a benchmark set and gold standard for state‐of‐the‐art pose and ranking prediction in easy, medium, hard, or very hard docking targets. The benchmark, along with a customizable cross‐docking data set generation tool, is available at http://disco.csb.pitt.edu. We further demonstrate the potential uses of the server in questions outside of basic benchmarking, such as the selection of the ideal docking reference structure.

     
  2. Structure-based virtual screening utilizes molecular docking to explore and analyze ligand–macromolecule interactions, crucial for identifying and developing potential drug candidates. Although several widely used docking programs are available, the accurate prediction of binding affinity and binding mode still presents challenges. In this study, we introduced a novel protocol that combines our in-house geometry optimization algorithm, the conjugate gradient with backtracking line search (CG-BS), which is capable of restraining and constraining rotatable torsional angles and other geometric parameters, with a highly accurate machine learning potential, ANI-2x, renowned for its precise molecular energy predictions resembling those of the wB97X/6-31G(d) model. By integrating this protocol with binding pose prediction using Glide, we conducted additional structural optimization and potential energy prediction on 11 small molecule–macromolecule and 12 peptide–macromolecule systems. We observed that ANI-2x/CG-BS greatly improved the docking power, not only optimizing binding poses more effectively, particularly when the RMSD of the binding pose predicted by Glide exceeded around 5 Å, but also achieving a 26% higher success rate in identifying native-like binding poses at the top rank compared to Glide docking. As for the scoring and ranking powers, ANI-2x/CG-BS demonstrated enhanced performance over Glide docking in predicting and ranking hundreds or thousands of ligands. For example, Pearson’s and Spearman’s correlation coefficients increased remarkably from 0.24 and 0.14 with Glide docking to 0.85 and 0.69, respectively, with the addition of ANI-2x/CG-BS for optimizing and ranking small molecules binding to the bacterial ribosomal aminoacyl-tRNA receptor. These results suggest that ANI-2x/CG-BS holds considerable potential for integration into virtual screening pipelines due to its enhanced docking performance.

     
  3. Abstract

    In the ligand prediction category of CASP15, the challenge was to predict the positions and conformations of small molecules binding to proteins that were provided as amino acid sequences or as models generated by the AlphaFold2 program. For most targets, we used our template‐based ligand docking program ClusPro ligTBM, also implemented as a public server available at https://ligtbm.cluspro.org/. Since many targets had multiple chains and a number of ligands, several templates and some manual interventions were required. In a few cases, no templates were found, and we had to use direct docking with the Glide program. Nevertheless, ligTBM was shown to be a very useful tool, and by any ranking criteria, our group was ranked among the top five best‐performing teams. In fact, all the best groups used template‐based docking methods. Thus, it appears that the AlphaFold2‐generated models, despite the high accuracy of the predicted backbone, have local differences from the x‐ray structure that make the use of direct docking methods more challenging. The results of CASP15 confirm that this limitation can frequently be overcome by homology‐based docking.

     
  4. Elucidating protein–ligand interaction is crucial for studying the function of proteins and compounds in an organism and critical for drug discovery and design. The problem of protein–ligand interaction is traditionally tackled by molecular docking and simulation, which is based on physical forces and statistical potentials and cannot effectively leverage cryo-EM data and existing protein structural information in the protein–ligand modeling process. In this work, we developed a deep learning bioinformatics pipeline (DeepProLigand) to predict protein–ligand interactions from cryo-EM density maps of proteins and ligands. DeepProLigand first uses a deep learning method to predict the structure of proteins from cryo-EM maps, which is averaged with a reference (template) structure of the proteins to produce a combined structure to add ligands. The ligands are then identified and added into the structure to generate a protein–ligand complex structure, which is further refined. The method based on the deep learning prediction and template-based modeling was blindly tested in the 2021 EMDataResource Ligand Challenge and was ranked first in fitting ligands to cryo-EM density maps. These results demonstrate that the deep learning bioinformatics approach is a promising direction for modeling protein–ligand interactions on cryo-EM data using prior structural information. 
  5. Protein–peptide interactions play a crucial role in a variety of cellular processes. The protein–peptide complex structure is key to understanding the mechanisms underlying protein–peptide interactions and is critical for peptide therapeutic development. We present a user‐friendly protein–peptide docking server, MDockPeP. Starting from a peptide sequence and a protein receptor structure, the MDockPeP Server globally docks the all‐atom, flexible peptide to the protein receptor. The produced modes are then evaluated with a statistical potential‐based scoring function, ITScorePeP. This method was systematically validated using the peptiDB benchmarking database. At least one near‐native peptide binding mode was ranked among the top 10 (or top 500) in 59% (85%) of the bound cases, and in 40.6% (71.9%) of the challenging unbound cases. The server can be used for both protein–peptide complex structure prediction and initial‐stage sampling of the protein–peptide binding modes for other docking or simulation methods. MDockPeP Server is freely available at http://zougrouptoolkit.missouri.edu/mdockpep. © 2018 Wiley Periodicals, Inc.

     