
This content will become publicly available on December 1, 2022

Title: Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities
Abstract In this study, we developed a novel algorithm that improves the screening performance of an arbitrary docking scoring function. It recalibrates the docking score of a query compound according to the compound's structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with our new algorithm, and similar improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S⁴ (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity. This mimics the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to the reference ligands. Encouragingly, we found that our hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated on external datasets for two systems, and the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively.
In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and lead identification phases at negligible computational cost.
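The abstract does not state the calibration formula itself, so the following is only an illustrative sketch under several assumptions. Fingerprints are stored as sets of on-bit indices, as an FP2 or MACCS bit vector could be. Training compounds have docking scores and experimental affinities in the same units (for example kcal/mol). The correction rule is hypothetical: the query's docking score is shifted by the CSE-weighted mean residual (experimental minus docking) of the training compounds, with CSE = S⁴. The function names `tanimoto`, `cse`, and `calibrated_score`, the shrinkage term, and the `max_sim` cap (standing in for the upper limit on similarity) are all assumptions, not the authors' implementation.

```python
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as on-bit index sets."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


def cse(s: float, power: int = 4) -> float:
    """Compound similarity effect; S**4 is the form the abstract reports as best."""
    return s ** power


def calibrated_score(
    query_fp: FrozenSet[int],
    query_dock: float,
    train_fps: Sequence[FrozenSet[int]],
    train_dock: Sequence[float],
    train_exp: Sequence[float],
    max_sim: float = 1.0,
) -> float:
    """HYPOTHETICAL rule: shift the docking score by the CSE-weighted training residual."""
    weighted_residual = 0.0
    total_weight = 0.0
    for fp, dock, exp in zip(train_fps, train_dock, train_exp):
        s = tanimoto(query_fp, fp)
        if s > max_sim:
            continue  # assumed similarity cap: ignore references that are too similar
        w = cse(s)
        weighted_residual += w * (exp - dock)
        total_weight += w
    # Assumed shrinkage: the "1.0 +" acts as a prior weight on the uncorrected score,
    # so a query with no close analogue keeps (almost) its raw docking score.
    return query_dock + weighted_residual / (1.0 + total_weight)


if __name__ == "__main__":
    train = [frozenset({1, 2, 3, 4}), frozenset({10, 11})]
    print(calibrated_score(frozenset({1, 2, 3, 4}), -7.5, train, [-8.0, -6.0], [-10.0, -7.0]))
    # prints -8.5
```

Because S⁴ decays quickly, a training compound with S = 0.5 contributes a weight of only 0.0625, so under this assumed rule the correction is dominated by close analogues.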
Journal Name: Journal of Cheminformatics
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still challenging to accurately predict the affinity with which an arbitrary molecule binds to a drug target and to prioritize the top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models that significantly improve the screening performance of SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods used before the interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, and more encouragingly, the average area under the receiver operating characteristic curve (AUC) across the six targets was 0.87 for GBDT, significantly better than the 0.71 achieved by Glide. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
  2. While the COVID-19 pandemic continues to worsen, effective medicines that target the life cycle of SARS-CoV-2 are still under development. As more highly infectious and dangerous variants of the coronavirus emerge, the protective power of vaccines will decrease or vanish. Thus, drugs that do not suffer from drug resistance are urgently needed. The aim of this study is to identify, from a large compound library, allosteric binding modulators that inhibit the binding between the Spike protein of the SARS-CoV-2 virus and human angiotensin-converting enzyme 2 (hACE2). The binding of the Spike protein to hACE2 is the first step in the infection of host cells by the coronavirus. We first built a compound library containing 77,448 antiviral compounds. Molecular docking was then conducted to preliminarily screen for compounds that can potently bind to the Spike protein at two allosteric binding sites. Next, molecular dynamics simulations were performed to accurately calculate the binding affinity between the Spike protein and each compound identified by docking, and to investigate whether the compound can interfere with the binding between the Spike protein and hACE2. We identified two possible drug binding sites on the Spike protein and discovered a series of antiviral compounds that can weaken the interaction between the Spike protein and the hACE2 receptor. Binding of the ligand at the allosteric site induces conformational changes in key Spike residues at the Spike–hACE2 binding interface. We also applied our screening protocol to another library of 3407 compounds for which the inhibitory activities against Spike/hACE2 binding had been measured. Encouragingly, in vitro data support that the identified compounds can inhibit Spike–ACE2 binding.
Thus, we developed a promising computational protocol to discover allosteric inhibitors of the binding of the SARS-CoV-2 Spike protein to the hACE2 receptor, and several promising allosteric modulators were discovered.
  3. Abstract

    RNA-dependent RNA polymerase (RdRp) is essential for RNA replication within the life cycle of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes the deadly respiratory disease COVID-19. Remdesivir is a prodrug that has had some success in inhibiting this enzyme; however, there is still a pressing need for effective alternatives. In this study, we present the discovery of four non-nucleoside small molecules that bind to SARS-CoV-2 RdRp more favorably than the active form of the popular drug remdesivir (RTP) and adenosine triphosphate (ATP). They were found by high-throughput virtual screening (HTVS) of the vast ZINC compound database coupled with extensive molecular dynamics (MD) simulations. Post-trajectory analysis showed that the simulations of the complexes containing ATP and RTP remained stable for the duration of their trajectories. Additionally, the phosphate tail of RTP was stabilized both by the positively charged amino acid pocket and by the magnesium ions near the entry channel of RdRp, which includes residues K551, R553, R555 and K621. Residues D623, D760, and N691 further stabilized the ribose portion of RTP, with U10 on the template RNA strand forming hydrogen bonds with the adenosine motif. We then used these RdRp models to screen the ZINC database of ~17 million molecules. Using docking and drug-property scoring, we narrowed our selection down to fourteen candidates. Each candidate was subjected to a 200 ns simulation followed by free energy calculations. We identified four hit compounds from the ZINC database that have binding poses similar to that of RTP while possessing lower overall binding free energies, with ZINC097971592 having a binding free energy two times lower than that of RTP.
  4. With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery's search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient attention to the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have an IC₅₀ better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has an IC₅₀ of 280 nM, corresponding to a Kᵢ of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and using vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
  5. Three new organotin(IV) carboxylate compounds were synthesized and structurally characterized by elemental analysis, FT-IR, and multinuclear NMR (¹H, ¹³C, ¹¹⁹Sn) spectroscopy. Single-crystal X-ray crystallography reveals that compound C2 crystallizes in the monoclinic crystal system with space group P2₁/c and has a distorted bipyramidal geometry defined by C₃SnO₂. The synthesized compounds were screened for drug-DNA interactions via UV-Vis spectroscopy and cyclic voltammetry, showing good activity with high binding constants. Theoretical investigations using natural bond orbital (NBO) analysis in Gaussian 09 also support the reactivity of the compounds. The synthesized compounds were initially evaluated on two cancer cell lines (HeLa and MCF-7), and cytotoxicity to normal cells was evaluated using a non-cancerous cell line (BHK-21). All the compounds were found to be active, with IC₅₀ values lower than that of the standard drug, cisplatin. The cytotoxic effect of the most potent compound, C2, was confirmed by LDH cytotoxicity assay and by fluorescence imaging after PI staining. Apoptotic features in C2-treated cancer cells were visualized after DAPI staining. Regulation of apoptosis was indicated by reactive oxygen species generation, binding of C2 to DNA, a change in mitochondrial membrane potential, and expression of activated caspase-9 and caspase-3 in cancer cells. The results indicate activation of the intrinsic pathway of apoptosis in C2-treated cancer cells.
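Both the main abstract and item 1 report predictive index (PI) values, and item 1 also reports ROC AUC, but neither abstract defines these metrics. The sketch below assumes the widely used PI definition of Pearlman and Charifson: each pair of compounds is weighted by its experimental affinity difference and scored +1 or −1 according to whether the predicted order agrees. It computes AUC as the Mann–Whitney probability that an active outranks a decoy. It also assumes that predicted and experimental values share the same orientation, and that for AUC a higher score means "more likely active". Docking scores, where more negative is better, would therefore need to be negated first. The function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def predictive_index(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearlman-Charifson-style PI: +1 for correctly ordered pairs, -1 for inverted ones,
    0 for predicted ties, each weighted by the experimental difference |E_j - E_i|."""
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(experimental)), 2):
        d_exp = experimental[j] - experimental[i]
        d_pred = predicted[j] - predicted[i]
        weight = abs(d_exp)
        if d_exp * d_pred > 0:
            agreement = 1.0
        elif d_exp * d_pred < 0:
            agreement = -1.0
        else:
            agreement = 0.0  # predicted tie (or experimental tie, whose weight is 0)
        numerator += weight * agreement
        denominator += weight
    if denominator == 0.0:
        raise ValueError("PI is undefined when all experimental values are equal")
    return numerator / denominator


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """AUC as the probability that a random active outscores a random decoy (ties count 1/2).
    Assumes a higher score means 'more likely active'."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    print(predictive_index([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]))  # prints 0.5
    print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # prints 0.75
```

PI ranges from −1 (every pair inverted) to 1 (every pair ordered correctly). Weighting by the experimental gap means that misordering two compounds with nearly identical affinities costs little.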