


Title: Incorporating structural similarity into a scoring function to enhance the prediction of binding affinities
Abstract: In this study, we developed a novel algorithm that improves the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with the new algorithm, and similar improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four fingerprints (FP2, FP3, FP4, and MACCS) and four compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S⁴ (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to the reference ligands. Even in this setting, the hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated on external datasets for two systems, and the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively.
In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and the lead identification phases at negligible computational cost.
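The abstract does not give the exact recalibration formula, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It computes the Tanimoto similarity on fingerprints stored as sets of on-bit indices. Generating real FP2 or MACCS bits (for example with Open Babel or RDKit) is out of scope. The sketch then applies the CSE = S⁴ weighting. It uses a hypothetical shrinkage rule, `hybrid_score`, in which each training compound shifts the query's docking score by its similarity-weighted docking error (experimental minus docked). An optional `max_similarity` cap (also hypothetical) excludes training compounds above a similarity threshold, loosely mimicking the upper-limit experiment. Finally, it implements the pairwise predictive index as commonly defined: PI = Σ w_ij C_ij / Σ w_ij over all pairs i < j. Here w_ij = |E_j - E_i| is the experimental affinity gap, and C_ij is +1, -1 or 0 when the predicted ordering agrees with, contradicts, or ties the experimental one. Docking scores and experimental affinities are assumed to be binding free energies in kcal/mol on the same sign convention (more negative means stronger binding).

```python
from itertools import combinations
from typing import Sequence, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity |A & B| / |A | B| of two fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints count as dissimilar
    return len(fp_a & fp_b) / union


def cse(similarity: float, power: int = 4) -> float:
    """Compound similarity effect S**power; power=4 gives the CSE = S^4 variant."""
    return similarity ** power


def hybrid_score(
    query_fp: Set[int],
    query_dock: float,
    train_fps: Sequence[Set[int]],
    train_dock: Sequence[float],
    train_exp: Sequence[float],
    power: int = 4,
    max_similarity: float = 1.0,
) -> float:
    """HYPOTHETICAL recalibration rule, not the published formula.

    Each training compound contributes its docking error (experimental - docked),
    weighted by CSE(S). The query's own docking score acts as a prior with weight 1,
    so the correction fades when no training compound is similar. Training
    compounds more similar than max_similarity are ignored (similarity cap).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for fp, dock, exp in zip(train_fps, train_dock, train_exp):
        s = tanimoto(query_fp, fp)
        if s > max_similarity:
            continue  # too similar: excluded under the similarity cap
        w = cse(s, power)
        weighted_sum += w * (exp - dock)
        weight_total += w
    return query_dock + weighted_sum / (1.0 + weight_total)


def predictive_index(experimental: Sequence[float], predicted: Sequence[float]) -> float:
    """Pairwise predictive index in [-1, 1]; both inputs must share one sign convention."""
    numerator = denominator = 0.0
    for i, j in combinations(range(len(experimental)), 2):
        d_exp = experimental[j] - experimental[i]
        d_pred = predicted[j] - predicted[i]
        weight = abs(d_exp)  # pairs with larger experimental gaps count more
        if d_exp * d_pred > 0:
            numerator += weight  # pair ranked in the correct order (C_ij = +1)
        elif d_exp * d_pred < 0:
            numerator -= weight  # pair ranked in the wrong order (C_ij = -1)
        # a predicted tie (or zero experimental gap) adds nothing to the numerator
        denominator += weight
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    query = {1, 2, 3, 4}
    fps = [{1, 2, 3, 4}, {10, 11}]
    print(hybrid_score(query, -7.5, fps, [-7.0, -6.0], [-9.0, -5.0]))  # -8.5
    print(hybrid_score(query, -7.5, fps, [-7.0, -6.0], [-9.0, -5.0], max_similarity=0.8))  # -7.5
    print(predictive_index([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
```

In the usage example, the identical training compound (S = 1, weight 1) pulls the score halfway toward its -2 kcal/mol docking error. With the 0.8 cap it is excluded, and the dissimilar compound carries zero weight, so the original docking score is returned unchanged. The `1.0 +` term in the denominator is a design choice of this sketch: it keeps the correction bounded when similar training data are scarce.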
Journal Name:
Journal of Cheminformatics
Sponsoring Org:
National Science Foundation
More Like this
  1. Monkeypox (now Mpox), a zoonotic disease caused by the monkeypox virus (MPXV), is an emerging threat to global health. In only six months, from May to October 2022, the number of MPXV cases exceeded 80,000, and many of the outbreaks occurred in locations that had never previously reported MPXV. Currently there are no FDA-approved MPXV-specific vaccines or treatments; therefore, finding drugs to combat MPXV is of utmost importance. The A42R profilin-like protein of MPXV is involved in cell development and motility, making it a critical drug target. Because A42R is highly conserved across orthopoxviruses, A42R inhibitors may also work against other members of the family. This study sought to identify potential A42R inhibitors for MPXV treatment using computational approaches. The energy-minimized 3D structure of the A42R profilin-like protein (PDB ID: 4QWO) was subjected to virtual screening with AutoDock Vina, using a library of 36,366 compounds from the Traditional Chinese Medicine (TCM), AfroDb, and PubChem databases as well as the known inhibitor tecovirimat. Seven compounds predicted to bind favorably were shortlisted: PubChem CID 11371962, ZINC000000899909, ZINC000001632866, ZINC000015151344, ZINC000013378519, ZINC000000086470, and ZINC000095486204. Molecular docking suggested that all seven compounds have higher binding affinities to A42R (–7.2 to –8.3 kcal/mol) than tecovirimat (–6.7 kcal/mol). This was corroborated by MM/PBSA calculations: tecovirimat had the least negative binding free energy, –68.694 kJ/mol (i.e., the lowest binding affinity), whereas the seven shortlisted compounds ranged from –73.252 to –97.140 kJ/mol. Furthermore, when subjected to 100 ns molecular dynamics simulations, the seven compounds in complex with A42R were more stable than the A42R-tecovirimat complex.
The protein-ligand interaction maps generated using LigPlot+ suggested that residues Met1, Glu3, Trp4, Ile7, Arg127, Val128, Thr131, and Asn133 are important for binding. The seven compounds were further profiled as potential antivirals using PASS predictions and structural similarity searches. All seven potential lead compounds had Pa > Pi for antiviral activity, and ZINC000001632866 and ZINC000015151344 were predicted to be poxvirus inhibitors with Pa values of 0.315 and 0.215 and Pi values of 0.052 and 0.136, respectively. Further experimental validation is required to corroborate the predicted activity of the identified lead compounds. These seven compounds provide a solid starting point for developing antivirals against MPXV and other orthopoxviruses.

  2. Metabotropic glutamate receptors (mGluRs) play an important role in regulating glutamate signaling pathways, which are involved in neuropathy and peripheral homeostasis. mGluR4, which belongs to Group III mGluRs, is the most widely distributed mGluR in the periphery. Regulation of this receptor has been shown to be involved in diabetes, colorectal carcinoma and many other diseases. However, structure-based design of small molecules that regulate mGluR4 has been limited by the absence of a resolved mGluR4 protein structure. In this work, we first built a homology model of mGluR4 based on a crystal structure of mGluR8 and then conducted hierarchical virtual screening (HVS) to identify possible active ligands for mGluR4. The HVS protocol consists of three hierarchical filters: Glide docking, molecular dynamics (MD) simulation and binding free energy calculation. Using HVS, we successfully prioritized active ligands of mGluR4 from a set of screening compounds. The active ligands predicted from binding affinities covered all but one of the experimentally determined active ligands. The correlation between the measured and predicted binding affinities is significantly better for the MM-PB/GBSA-WSAS methods than for the Glide docking method. More importantly, we identified hotspots for ligand binding: SER157 and GLY158 tend to contribute to the selectivity of mGluR4 ligands, while ALA154 and ALA155 could account for ligand selectivity toward mGluR8. We also identified five other key residues that are critical for ligand potency. The differences between the binding profiles of mGluR4 and mGluR8 can guide the development of more potent and selective modulators. Moreover, we evaluated the performance of IPSF, a novel type of scoring function trained by a machine learning algorithm on residue–ligand interaction profiles, in guiding drug lead optimization.
The cross-validation root-mean-square errors (RMSEs) are much smaller than those of the endpoint methods, and the correlation coefficients are comparable to those of the best endpoint methods for both mGluRs. Thus, machine learning-based IPSF can be applied to guide lead optimization even when the total number of actives and inactives is small, a typical scenario in drug discovery projects.
  3. Structure-based virtual screenings (SBVSs) play an important role in drug discovery projects. However, it is still a challenge to accurately predict the binding affinity of an arbitrary molecule to a drug target and to prioritize top ligands from an SBVS. In this study, we developed a novel method that uses ligand-residue interaction profiles (IPs) to construct machine learning (ML)-based prediction models and significantly improves the screening performance of SBVSs. Such a prediction model is called an IP scoring function (IP-SF). We systematically investigated how to improve the performance of IP-SFs from many perspectives, including the sampling methods used before the interaction energy calculation and different ML algorithms. Using six drug targets, each with hundreds of known ligands, we conducted a critical evaluation of the developed IP-SFs. The IP-SFs employing a gradient boosting decision tree (GBDT) algorithm in conjunction with the MIN + GB simulation protocol achieved the best overall performance. Their scoring power, ranking power and screening power significantly outperformed those of the Glide SF. First, compared with Glide, the average mean absolute error and root mean square error of GBDT/MIN + GB decreased by about 38% and 36%, respectively. Second, the mean squared correlation coefficient and predictive index increased by about 225% and 73%, respectively. Third, and more encouragingly, the average area under the receiver operating characteristic curve (AUC) for the six targets was 0.87 for GBDT, significantly better than the 0.71 obtained with Glide. Thus, we expect IP-SFs to have broad and promising applications in SBVSs.
  4. Small molecules that bind the SARS-CoV-2 nonstructural protein 3 Mac1 domain in place of ADP-ribose could be useful as molecular probes or scaffolds for COVID-19 antiviral drug discovery, because Mac1 has been linked to the ability of coronaviruses to evade cellular detection. A high-throughput assay based on differential scanning fluorimetry (DSF) was therefore optimized and used to identify possible Mac1 ligands in small libraries of drugs and drug-like compounds. Promising compounds included nucleotides, steroids, β-lactams, and benzimidazoles. The main drawback of this approach was that a high percentage of compounds in some libraries were found to influence the observed Mac1 melting temperature. To prioritize DSF screening hits, the shapes of the observed melting curves and the initial assay fluorescence were examined, and the results were compared with virtual screens performed using AutoDock Vina. The molecular basis for alternate ligand binding was also examined by determining a structure with one of the hits, cyclic adenosine monophosphate, at atomic resolution.
  5. While the COVID-19 pandemic continues to worsen, effective medicines that target the life cycle of SARS-CoV-2 are still under development. As more infectious and dangerous variants of the coronavirus emerge, the protective power of vaccines will decrease or vanish. Thus, drugs that do not suffer from drug resistance are urgently needed. The aim of this study is to identify allosteric binding modulators from a large compound library that inhibit the binding between the Spike protein of the SARS-CoV-2 virus and human angiotensin-converting enzyme 2 (hACE2). The binding of the Spike protein to hACE2 is the first step in the infection of host cells by the coronavirus. We first built a compound library containing 77,448 antiviral compounds. Molecular docking was then conducted to preliminarily screen for compounds that can potently bind to the Spike protein at two allosteric binding sites. Next, molecular dynamics simulations were performed to accurately calculate the binding affinity between the Spike protein and each compound identified in the docking screen, and to investigate whether the compound can interfere with the binding between the Spike protein and hACE2. We identified two possible drug binding sites on the Spike protein and discovered a series of antiviral compounds that can weaken the interaction between the Spike protein and the hACE2 receptor: ligand binding at the allosteric site induces conformational changes in key Spike residues at the Spike–hACE2 binding interface. We also applied our screening protocol to another library of 3407 compounds whose inhibitory activities against Spike/hACE2 binding had been measured. Encouragingly, the in vitro data support that the identified compounds can inhibit Spike–ACE2 binding.
Thus, we developed a promising computational protocol for discovering allosteric inhibitors of the binding of the SARS-CoV-2 Spike protein to the hACE2 receptor, and identified several candidate allosteric modulators.
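Several of the related abstracts report regression and screening metrics. Item 2 compares RMSEs and correlation coefficients. Item 3 reports mean absolute error, root mean square error, the squared correlation coefficient and the ROC AUC. The following is a minimal, dependency-free Python sketch of those standard definitions; it is not code from any of these studies. `roc_auc` uses the rank-probability interpretation, with ties counted as half a win. It assumes that a higher score means "more likely active", so docking scores, where more negative is better, must be negated first.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random decoy.

    Ties count as half a win. Higher scores must mean 'more likely active'.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    print(roc_auc([3.0, 4.0], [1.0, 2.0]))  # 1.0: every active outscores every decoy
    print(r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0: perfectly linear
```

The pairwise loop in `roc_auc` is O(n·m) and is kept for readability. For large screening sets, library routines such as scikit-learn's `roc_auc_score`, `mean_absolute_error` and `mean_squared_error` compute the same quantities more efficiently.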