Title: Machine learning classification can reduce false positives in structure-based virtual screening
With the recent explosion in the size of libraries available for screening, virtual screening is positioned to assume a more prominent role in early drug discovery’s search for active chemical matter. In typical virtual screens, however, only about 12% of the top-scoring compounds actually show activity when tested in biochemical assays. We argue that most scoring functions used for this task have been developed with insufficient thoughtfulness into the datasets on which they are trained and tested, leading to overly simplistic models and/or overtraining. These problems are compounded in the literature because studies reporting new scoring methods have not validated their models prospectively within the same study. Here, we report a strategy for building a training dataset (D-COID) that aims to generate highly compelling decoy complexes that are individually matched to available active complexes. Using this dataset, we train a general-purpose classifier for virtual screening (vScreenML) that is built on the XGBoost framework. In retrospective benchmarks, our classifier shows outstanding performance relative to other scoring functions. In a prospective context, nearly all candidate inhibitors from a screen against acetylcholinesterase show detectable activity; beyond this, 10 of 23 compounds have IC50 better than 50 μM. Without any medicinal chemistry optimization, the most potent hit has IC50 280 nM, corresponding to Ki of 173 nM. These results support using the D-COID strategy for training classifiers in other computational biology tasks, and for vScreenML in virtual screening campaigns against other protein targets. Both D-COID and vScreenML are freely distributed to facilitate such efforts.
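The abstract quotes both an IC50 (280 nM) and a Ki (173 nM) for the most potent hit, and a prospective hit count of 10 of 23. The record does not say how the Ki was derived. The sketch below assumes a competitive inhibitor and the standard Cheng-Prusoff relation, Ki = IC50 / (1 + [S]/Km), and back-calculates the substrate-to-Km ratio that would make the two numbers consistent. The function names are illustrative and not taken from the paper.

```python
def cheng_prusoff_ki(ic50_nm: float, substrate_over_km: float) -> float:
    """Ki from IC50 for a competitive inhibitor (Cheng-Prusoff relation).

    Assumption: competitive inhibition; the source record does not state
    which relation the authors used.
    """
    return ic50_nm / (1.0 + substrate_over_km)


def implied_substrate_over_km(ic50_nm: float, ki_nm: float) -> float:
    """[S]/Km ratio implied by a reported IC50 and Ki under the same relation."""
    return ic50_nm / ki_nm - 1.0


# Values quoted in the abstract for the most potent hit.
IC50_NM = 280.0
KI_NM = 173.0

ratio = implied_substrate_over_km(IC50_NM, KI_NM)
ki_check = cheng_prusoff_ki(IC50_NM, ratio)

# Fraction of tested compounds with IC50 better than 50 uM (10 of 23).
prospective_hit_rate = 10 / 23

print(f"implied [S]/Km = {ratio:.3f}")
print(f"Ki recovered   = {ki_check:.1f} nM")
print(f"hit rate       = {prospective_hit_rate:.1%}")
```

Under this assumption the implied [S]/Km is roughly 0.62. The 10-of-23 rate (about 43%) is the figure to set against the roughly 12% hit rate the abstract cites for typical virtual screens.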
Award ID(s):
1836950
PAR ID:
10207648
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
117
Issue:
31
ISSN:
0027-8424
Page Range / eLocation ID:
18477 to 18488
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Melanoma and nonmelanoma skin cancers are among the most prevalent and most lethal forms of skin cancers. To identify new lead compounds with potential anticancer properties for further optimization, in vitro assays combined with in‐silico target fishing and docking have been used to identify and further map out the antiproliferative activity and potential mode of action of molecules from a small library of compounds previously prepared in our laboratory. From screening these compounds in vitro against A375, SK‐MEL‐28, A431, and SCC‐12 skin cancer cell lines, 35 displayed antiproliferative activities at the micromolar level, with the majority being primarily potent against the A431 and SCC‐12 squamous carcinoma cell lines. The most active compounds, 11 (A431: IC50 = 5.0 μM, SCC‐12: IC50 = 2.9 μM, SKMEL‐28: IC50 = 4.9 μM, A375: IC50 = 6.7 μM) and 13 (A431: IC50 = 5.0 μM, SCC‐12: IC50 = 3.3 μM, SKMEL‐28: IC50 = 13.8 μM, A375: IC50 = 17.1 μM), significantly and dose‐dependently induced apoptosis of SCC‐12 and SK‐MEL‐28 cells, as evidenced by the suppression of Bcl‐2 and upregulation of Bax, cleaved caspase‐3, caspase‐9, and PARP protein expression levels. Both agents significantly reduced scratch wound healing, colony formation, and expression levels of deregulated cancer molecular targets including RSK/Akt/ERK1/2 and S6K1. In silico target prediction and docking studies using the SwissTargetPrediction web‐based tool suggested that CDK8, CLK4, nuclear receptor ROR, tyrosine protein‐kinase Fyn/LCK, ROCK1/2, and PARP, all of which are dysregulated in skin cancers, might be prospective targets for the two most active compounds. Further validation of these targets by western blot analyses revealed that ROCK/Fyn and its associated Hedgehog (Hh) pathways were downregulated or modulated by the two lead compounds.
In aggregate, these results provide a strong framework for further validation of the observed activities and the development of a more comprehensive structure–activity relationship through the preparation and biological evaluation of analogs. 
  2. In this study, we developed a novel algorithm to improve the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with our new algorithm, and similar improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets, especially when CSE = S^4 (S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to reference ligands. Encouragingly, we found our hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems, and we found the PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively. In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the drug lead optimization and identification phases at negligible computational cost.
  3. Electrophilic fluorine-mediated dearomative spirocyclization has been developed to synthesize a range of fluoro-substituted spiro-isoxazoline ethers and lactones. The synthesized compounds were assayed in vitro for anti-viral activity against human cytomegalovirus (HCMV) and for cytotoxicity against glioblastoma (GBM6) and triple-negative breast cancer (MDA MB 231) cells. Interestingly, compounds 4d and 4n showed significant activity against HCMV (IC50 ∼ 10 μM), while 4l and 5f revealed the highest cytotoxicity, with IC50 = 36 to 80 μM. The synthetic efficacy and biological relevance offer an opportunity to further drug-discovery development of fluoro-spiro-isoxazolines as novel anti-viral and anti-cancer agents.
  4. Three triorganotin(IV) cyclopentane carboxylates were synthesized and structurally characterized in the solid state by Fourier‐transform infrared spectroscopy and single crystal diffraction, and in solution by NMR (1H, 13C, and 119Sn) spectroscopy. The complexes were tested for their anticancer activity against MCF‐7 and HeLa cells along with normal BHK‐21 cells. As revealed by MTT assay, complex 2 was identified as the most potent derivative, with IC50 values of 2.59 and 0.051 μM against HeLa and MCF‐7 cells, respectively. The results were compared with cisplatin as the reference drug. Fluorescence microscopy studies using 4′,6‐diamidino‐2‐phenylindole (DAPI) and propidium iodide (PI) staining confirmed the occurrence of apoptosis in HeLa cells treated with the most active complex, 2. Complex 2 also triggered the release of lactate dehydrogenase (LDH) in treated HeLa and MCF‐7 cells, whereas a luminescence assay displayed a remarkable increase in the activity of caspase‐9 and ‐3. Moreover, flow cytometric results revealed that complex 2 caused G0/G1 arrest in the treated HeLa cells. The complexes were further screened in DNA binding studies by UV‐vis spectroscopy and cyclic voltammetry. The high activity of complex 2 was attributed to its higher Lewis acidity, as indicated by natural bond orbital (NBO) analysis. Theoretical modelling and molecular docking studies were also conducted to study the reactivity of the complexes against VEGFR 2 kinase.
  5. Virtual screening is a cost- and time-effective alternative to traditional high-throughput screening in the drug discovery process. Both virtual screening approaches, structure-based molecular docking and ligand-based cheminformatics, suffer from computational cost, low accuracy, and/or reliance on prior knowledge of a ligand that binds to a given target. Here, we propose a neural network framework, NeuralDock, which accelerates the process of high-quality computational docking by a factor of 10^6, and does not require prior knowledge of a ligand that binds to a given target. By approximating both protein-small molecule conformational sampling and energy-based scoring, NeuralDock accurately predicts the binding energy and affinity of a protein-small molecule pair, based on protein pocket 3D structure and small molecule topology. We use NeuralDock and 25 GPUs to dock 937 million molecules from the ZINC database against superoxide dismutase-1 in 21 h, which we validate with physical docking using MedusaDock. Due to its speed and accuracy, NeuralDock may be useful in brute-force virtual screening of massive chemical libraries and training of generative drug models.
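Item 2 in the list above describes recalibrating a docking score using the Tanimoto similarity S between a query compound and training compounds, with the best results reported for a compound similarity effect of S^4. The sketch below shows how such a weight could be computed from fingerprints represented as sets of "on" bit indices. The Tanimoto formula is standard. The blending rule in `recalibrated_score` and all names are hypothetical illustrations, since the abstract does not give the published formula.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # convention for two empty fingerprints
    return len(fp_a & fp_b) / len(union)


def cse_weight(similarity: float, power: int = 4) -> float:
    """Compound similarity effect as a power of S (S^4 in the quoted abstract)."""
    return similarity ** power


def recalibrated_score(docking_score: float,
                       query_fp: set[int],
                       training: list[tuple[set[int], float]]) -> float:
    """HYPOTHETICAL blend, not the published algorithm.

    Pulls the docking score toward the reference value of the most similar
    training compound, weighted by S^4, so dissimilar compounds are left
    almost unchanged.
    """
    best_s, best_ref = max(
        ((tanimoto(query_fp, fp), ref) for fp, ref in training),
        key=lambda pair: pair[0],
    )
    w = cse_weight(best_s)
    return (1.0 - w) * docking_score + w * best_ref


# Toy data: made-up fingerprints and reference scores, for illustration only.
query = {1, 2, 3}
training_set = [({2, 3, 4}, -9.0), ({7, 8}, -5.0)]
print(recalibrated_score(-6.0, query, training_set))
```

Raising S to the fourth power means a training compound with S = 0.5 contributes a weight of only about 0.06, so the correction matters mainly for close analogs.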