
This content will become publicly available on December 1, 2022

Title: Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities
Abstract In this study, we developed a novel algorithm to improve the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with our new algorithm, and similar performance improvements were achieved. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S^4 (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to reference ligands. Encouragingly, we found that our hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems, and we found that the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively. In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and identification phases at negligible computational cost.
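As a rough illustration of the quantities named in the abstract, the sketch below computes the Tanimoto similarity S between two fingerprints and the corresponding CSE value S^4. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the formula that combines the CSE with the docking score, so that step is not shown. The function names `tanimoto` and `cse_weight`, the representation of fingerprints as sets of on-bit indices, and the example bit sets are all hypothetical.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Two empty fingerprints: define the similarity as 0 (an assumption).
        return 0.0
    return len(fp_a & fp_b) / union


def cse_weight(s: float, power: int = 4) -> float:
    """Compound similarity effect as a power of S; power=4 gives CSE = S^4."""
    return s ** power


# Hypothetical on-bit indices for a query and a reference compound.
query = {1, 4, 7, 9}
reference = {1, 4, 7, 12}

s = tanimoto(query, reference)  # 3 shared bits out of 5 distinct bits
w = cse_weight(s)               # S raised to the fourth power
print(f"S = {s:.3f}, CSE = {w:.4f}")
```

Raising S to the fourth power sharply down-weights dissimilar reference compounds: a similarity of 0.6 contributes a weight of only about 0.13, while a similarity of 0.9 still contributes about 0.66.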
Award ID(s):
1955260
Publication Date:
NSF-PAR ID:
10226297
Journal Name:
Journal of Cheminformatics
Volume:
13
Issue:
1
ISSN:
1758-2946
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule to a drug target and to prioritize top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and significantly improves the screening performance in SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods before interaction energy calculation and different ML algorithms. Using six drug targets, each having hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power, and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average values of mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean values of the squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, and more encouragingly, the average area under the receiver operating characteristic curve for the six targets was 0.87 with GBDT, significantly better than the 0.71 obtained with Glide. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
  2. While the COVID-19 pandemic continues to worsen, effective medicines that target the life cycle of SARS-CoV-2 are still under development. As more highly infective and dangerous variants of the coronavirus emerge, the protective power of vaccines will decrease or vanish. Thus, the development of drugs that are free of drug resistance is direly needed. The aim of this study is to identify allosteric binding modulators from a large compound library to inhibit the binding between the Spike protein of the SARS-CoV-2 virus and human angiotensin-converting enzyme 2 (hACE2). The binding of the Spike protein to hACE2 is the first step of the infection of host cells by the coronavirus. We first built a compound library containing 77 448 antiviral compounds. Molecular docking was then conducted to preliminarily screen for compounds that can potently bind to the Spike protein at two allosteric binding sites. Next, molecular dynamics simulations were performed to accurately calculate the binding affinity between the Spike protein and a compound identified from docking screening, and to investigate whether the compound can interfere with the binding between the Spike protein and hACE2. We successfully identified two possible drug binding sites on the Spike protein and discovered a series of antiviral compounds that can weaken the interaction between the Spike protein and the hACE2 receptor through conformational changes of the key Spike residues at the Spike–hACE2 binding interface, induced by the binding of the ligand at the allosteric binding site. We also applied our screening protocol to another compound library, which consists of 3407 compounds for which the inhibitory activities of Spike/hACE2 binding were measured. Encouragingly, in vitro data support that the identified compounds can inhibit the Spike–ACE2 binding. Thus, we developed a promising computational protocol to discover allosteric inhibitors of the binding of the Spike protein of SARS-CoV-2 to the hACE2 receptor, and several promising allosteric modulators were discovered.
  3. For the rapidly growing aging demographic worldwide, robotic training methods could be impactful in improving the balance that is critical for everyday life. Here, we investigated the hypothesis that non-bodyweight supportive (nBWS) overground robotic balance training would lead to improvements in balance performance and balance confidence in older adults. Sixteen healthy older participants (69.7 ± 6.7 years old) were trained while donning a harness from a distinctive NaviGAITor robotic system. A control group of 11 healthy participants (68.7 ± 5.0 years old) underwent the same training but without the robotic system. Training included 6 weeks of standing and walking tasks while modifying: (1) sensory information (i.e., with and without vision (eyes-open/closed), with more and fewer support surface cues (hard or foam surfaces)) and (2) base-of-support (wide, tandem and single-leg standing exercises). Prior to and post-training, balance ability and balance confidence were assessed via the balance error scoring system (BESS) and the Activities-specific Balance Confidence (ABC) scale, respectively. Encouragingly, results showed that balance ability improved (i.e., BESS errors significantly decreased), particularly in the nBWS group, across nearly all test conditions. This result serves as an indication that robotic training has an impact on improving balance for healthy aging individuals.
  4. ABSTRACT Soil bacteria adapt to diverse and rapidly changing environmental conditions by sensing and responding to environmental cues using a variety of sensory systems. Two-component systems are a widespread type of signal transduction system present in all three domains of life and typically are comprised of a sensor kinase and a response regulator. Many two-component systems function by regulating gene expression in response to environmental stimuli. The bacterial chemotaxis system is a modified two-component system with additional protein components and a response that, rather than regulating gene expression, involves behavioral adaptation and results in net movement toward or away from a chemical stimulus. Soil bacteria generally have 20 to 40 or more chemoreceptors encoded in their genomes. To simplify the identification of chemoeffectors (ligands) sensed by bacterial chemoreceptors, we constructed hybrid sensor proteins by fusing the sensor domains of Pseudomonas putida chemoreceptors to the signaling domains of the Escherichia coli NarX/NarQ nitrate sensors. Responses to potential attractants were monitored by β-galactosidase assays using an E. coli reporter strain in which the nitrate-responsive narG promoter was fused to lacZ. Hybrid receptors constructed from PcaY, McfR, and NahY, which are chemoreceptors for aromatic acids, tricarboxylic acid cycle intermediates, and naphthalene, respectively, were sensitive and specific for detecting known attractants, and the β-galactosidase activities measured in E. coli correlated well with results of chemotaxis assays in the native P. putida strain. In addition, a screen of the hybrid receptors successfully identified new ligands for chemoreceptor proteins and resulted in the identification of six receptors that detect propionate. IMPORTANCE Relatively few of the thousands of chemoreceptors encoded in bacterial genomes have been functionally characterized. More importantly, although methyl-accepting chemotaxis proteins, the major type of chemoreceptors present in bacteria, are easily identified bioinformatically, it is not currently possible to predict what chemicals will bind to a particular chemoreceptor. Chemotaxis is known to play roles in biodegradation as well as in host-pathogen and host-symbiont interactions, but many studies are currently limited by the inability to identify relevant chemoreceptor ligands. The use of hybrid receptors and this simple E. coli reporter system allowed rapid and sensitive screening for potential chemoeffectors. The fusion site chosen for this study resulted in a high percentage of functional hybrids, indicating that it could be used to broadly test chemoreceptor responses from phylogenetically diverse samples. Considering the wide range of chemical attractants detected by soil bacteria, hybrid receptors may also be useful as sensitive biosensors.
  5. With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery’s search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 of 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.