Title: An artificial intelligence accelerated virtual screening platform for drug discovery
Abstract: Structure-based virtual screening is a key tool in early drug discovery, with growing interest in screening multi-billion-compound chemical libraries. However, the success of virtual screening depends crucially on the accuracy of the binding poses and binding affinities predicted by computational docking. Here we develop a highly accurate structure-based virtual screening method, RosettaVS, for predicting docking poses and binding affinities. Our approach outperforms other state-of-the-art methods on a wide range of benchmarks, partly because of its ability to model receptor flexibility. We incorporate it into a new open-source, artificial intelligence accelerated virtual screening platform for drug discovery. Using this platform, we screen multi-billion-compound libraries against two unrelated targets: the ubiquitin ligase target KLHDC2 and the human voltage-gated sodium channel NaV1.7. For both targets we discover hit compounds, including seven hits (14% hit rate) to KLHDC2 and four hits (44% hit rate) to NaV1.7, all with single-digit micromolar binding affinities. In both cases, screening was completed in less than seven days. Finally, a high-resolution X-ray crystal structure of the KLHDC2-ligand complex validates the predicted docking pose, demonstrating the effectiveness of our method in lead discovery.
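AI acceleration of ultra-large screens is often achieved with active learning. In this approach, a small sample of the library is docked, a fast surrogate model is trained on those scores, and only the compounds the surrogate ranks best are docked in the next round. The sketch below is a generic, hypothetical illustration of that loop and is not the RosettaVS platform's code. It rests on toy assumptions: each compound is a single float descriptor, `dock` is any scoring callable where lower is better, and the surrogate is 1-nearest-neighbour regression. The function name `active_learning_screen` is invented for this example.

```python
import random
from typing import Callable, Dict, Sequence, Tuple


def active_learning_screen(
    library: Sequence[float],
    dock: Callable[[float], float],
    rounds: int = 3,
    batch_size: int = 10,
    seed: int = 0,
) -> Dict[int, float]:
    """Dock only a small, adaptively chosen part of `library` (hypothetical sketch).

    Each entry is a toy 1-D descriptor; lower docking scores mean better
    predicted binding. Returns {library index: score} for every docked compound.
    """
    rng = random.Random(seed)
    indices = range(len(library))
    # Round 0: dock a random batch to seed the surrogate model.
    docked = {i: dock(library[i]) for i in rng.sample(indices, batch_size)}

    for _ in range(rounds):
        def predict(x: float) -> Tuple[float, float]:
            # Toy surrogate: score of the nearest already-docked compound,
            # with ties broken by the distance to that neighbour.
            nearest = min(docked, key=lambda j: abs(library[j] - x))
            return (docked[nearest], abs(library[nearest] - x))

        # Rank undocked compounds with the surrogate, then dock the best batch.
        # The surrogate "retrains" simply by seeing the enlarged `docked` dict.
        candidates = [i for i in indices if i not in docked]
        candidates.sort(key=lambda i: predict(library[i]))
        for i in candidates[:batch_size]:
            docked[i] = dock(library[i])
    return docked


# Toy usage: 1,000 "compounds" whose score is minimised at descriptor 3.0.
library = [i / 100 for i in range(1000)]
scores = active_learning_screen(library, dock=lambda x: (x - 3.0) ** 2)
best = min(scores, key=scores.get)
print(f"docked {len(scores)} of {len(library)}; best descriptor {library[best]}")
```

The design point is that the expensive `dock` call runs only `batch_size * (rounds + 1)` times, here 40 of 1,000. In a real platform, the surrogate would be a learned model over molecular features, and the library would contain billions of entries.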
Award ID(s):
2203513
PAR ID:
10540087
Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Communications
Volume:
15
Issue:
1
ISSN:
2041-1723
Sponsoring Org:
National Science Foundation
More like this
  1. In this study, we developed a novel algorithm that improves the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structural similarity to a set of training compounds, at negligible extra computational cost. Two popular docking methods, Glide and AutoDock Vina, were adopted as the original scoring functions to be processed with the new algorithm, and similar improvements were achieved for both. Predicted binding affinities were compared against experimental data from the ChEMBL and DUD-E databases. Eleven representative drug receptors from diverse drug target categories were used to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and four different compound similarity effect (CSE) functions were explored. Encouragingly, screening performance improved significantly for all 11 drug targets, especially when CSE = S^4 (where S is the Tanimoto structural similarity) and the FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and from 0.39 to 0.71 for the Glide and AutoDock Vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on structural similarity to mimic the real scenario of screening diverse libraries, in which query ligands are general-purpose screening compounds that are not necessarily structurally similar to the reference ligands. Encouragingly, the hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated on external datasets for two systems, where PI values increased from 0.24 to 0.46 and from 0.14 to 0.42 for the A2AR and CFX systems, respectively. In conclusion, our calibration algorithm can significantly improve virtual screening performance in both the lead optimization and lead identification phases at negligible computational cost.
  2. As fragment-based drug discovery has become mainstream, the number of screening methodologies has grown. Protein-observed ¹⁹F (PrOF) NMR and ¹H CPMG NMR are two fragment screening assays with complementary advantages. Here, we sought to combine these two NMR-based assays into a new screening workflow. This combination of protein- and ligand-observed experiments allows a time- and resource-efficient multiplexed screen of mixtures of fragments and proteins. PrOF NMR is first used to screen fragment mixtures against two proteins. Hit mixtures for each protein are identified and then deconvoluted using ¹H CPMG NMR. We demonstrate the benefit of this fragment screening method by conducting the first reported fragment screens against the bromodomains of BPTF and Plasmodium falciparum (Pf) GCN5 using 467 3D-enriched fragments. The hit rates were 6%, 5%, and 4% for fragments binding BPTF, fragments binding PfGCN5, and fragments binding both proteins, respectively. Selected hits were characterized, revealing a broad range of dissociation constants from low µM to mM. Follow-up experiments supported a low-affinity second binding site on PfGCN5. This approach can be used to bias fragment screens toward more selective hits at the outset of inhibitor development in a resource- and time-efficient manner.
  3. The CACHE challenges are a series of prospective benchmarking exercises to evaluate progress in the field of computational hit finding. Here we report the results of the inaugural CACHE challenge, in which 23 computational teams each selected up to 100 commercially available compounds that they predicted would bind to the WDR domain of the Parkinson's disease target LRRK2, a domain with no known ligand and only an apo structure in the PDB. The lack of known binding data and the presumably low druggability of the target pose a challenge to computational hit-finding methods. Of the 1,955 molecules predicted by participants in Round 1 of the challenge, 73 were found to bind LRRK2 in an SPR assay with a KD lower than 150 μM. These 73 molecules were advanced to the Round 2 hit-expansion phase, in which computational teams each selected up to 50 analogs. Binding was observed in two orthogonal assays for seven chemically diverse series, with affinities ranging from 18 to 140 μM. The seven successful computational workflows varied in their screening strategies and techniques: three used molecular dynamics to produce a conformational ensemble of the targeted site, three included a fragment docking step, three implemented a generative design strategy, and five used one or more deep learning steps. CACHE #1 reflects a highly exploratory phase in computational drug design, in which participants adopted strikingly divergent screening strategies. Machine learning-accelerated methods achieved results similar to those of brute-force (i.e., exhaustive) docking. First-in-class, experimentally confirmed compounds were rare and weakly potent, indicating that recent advances are not sufficient to address challenging targets effectively.
  4. Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a worldwide pandemic, creating an urgent need for antiviral drug discovery. One potential protein target for coronavirus treatment is RNA-dependent RNA polymerase (RdRp). It is the key enzyme of the viral replication machinery and is absent in humans, so targeting it has been considered a strategic approach. Here we describe the identification of potential hits from Indonesian herbal and ZINC databases. Pharmacophore modeling was employed, followed by molecular docking and 40 ns molecular dynamics simulations. In total, 151 and 14,480 hit molecules were retrieved from the Indonesian herbal and ZINC databases, respectively. Three hits selected on the basis of structural analysis remained stable over 40 ns, and binding energy predictions further implied that ZINC1529045114, ZINC169730811, and 9-Ribosyl-trans-zeatin bind more tightly than remdesivir. ZINC169730811 had the strongest affinity toward RdRp, exceeding both other hits and remdesivir, and its binding was supported by electrostatic, van der Waals, and nonpolar solvation energy contributions. The present study offers three hits that bind RdRp more tightly than remdesivir according to MM-PBSA binding energy predictions, for further experimental verification.
  5. Virtual screening is a cost- and time-effective alternative to traditional high-throughput screening in the drug discovery process. Both virtual screening approaches, structure-based molecular docking and ligand-based cheminformatics, suffer from high computational cost, low accuracy, and/or reliance on prior knowledge of a ligand that binds to the given target. Here, we propose a neural network framework, NeuralDock, which accelerates high-quality computational docking by a factor of 10^6 and does not require prior knowledge of a ligand that binds to the target. By approximating both protein-small molecule conformational sampling and energy-based scoring, NeuralDock accurately predicts the binding energy and affinity of a protein-small molecule pair from the 3D structure of the protein pocket and the topology of the small molecule. We use NeuralDock and 25 GPUs to dock 937 million molecules from the ZINC database against superoxide dismutase-1 in 21 h, and validate the results with physical docking using MedusaDock. Owing to its speed and accuracy, NeuralDock may be useful for brute-force virtual screening of massive chemical libraries and for training generative drug models.
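The first related abstract recalibrates a docking score using the Tanimoto similarity S between a query compound and training compounds, with CSE = S^4 working best. The abstract does not give the recalibration formula. The sketch below is therefore a hypothetical stand-in: it shifts the query's docking score by the S^4-weighted average docking error (experimental minus docked) of the training compounds. It assumes that fingerprints are sets of "on" bit indices, that experimental affinities are already on the docking score's scale, and that an optional `max_similarity` cap mimics the upper similarity limit described in the abstract. The names `tanimoto` and `recalibrate` are invented for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity S = |A & B| / |A | B| of two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recalibrate(
    query_fp: Fingerprint,
    query_dock_score: float,
    training: List[Tuple[Fingerprint, float, float]],
    power: int = 4,
    max_similarity: Optional[float] = None,
) -> float:
    """Hypothetical similarity-weighted recalibration of a docking score.

    `training` holds (fingerprint, docking score, experimental score) tuples.
    Weights are S**power (power=4 mirrors CSE = S^4). Training compounds more
    similar than `max_similarity` are skipped to mimic diverse-library screening.
    """
    num = den = 0.0
    for fp, dock_score, exp_score in training:
        s = tanimoto(query_fp, fp)
        if max_similarity is not None and s > max_similarity:
            continue
        w = s ** power
        num += w * (exp_score - dock_score)  # weighted docking error
        den += w
    # With no informative neighbours, fall back to the raw docking score.
    return query_dock_score + num / den if den > 0 else query_dock_score


query = frozenset({1, 2, 3, 4})
train = [
    (frozenset({1, 2, 3, 4}), -7.0, -9.0),  # identical compound, error -2.0
    (frozenset({10, 11}), -6.0, -5.0),      # no shared bits, weight 0
]
print(recalibrate(query, -7.5, train))                      # -9.5
print(recalibrate(query, -7.5, train, max_similarity=0.8))  # -7.5
```

Raising S to the fourth power makes the correction dominated by close analogs. This is consistent with the abstract's observation that the gains shrink, though they persist, once highly similar training compounds are excluded.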
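For reference, the MM-PBSA estimate used to rank the RdRp hits in the fourth related abstract is conventionally decomposed as shown below, with each term averaged over MD snapshots. This is the standard textbook form, not a statement of the exact terms those authors computed. Many studies omit the entropy term $-TS$, and in the common single-trajectory protocol the internal-energy term cancels. The electrostatic, van der Waals, and nonpolar contributions cited in that abstract correspond to $E_{\mathrm{elec}}$, $E_{\mathrm{vdW}}$, and $G_{\mathrm{nonpolar}}$.

```latex
\begin{aligned}
\Delta G_{\mathrm{bind}} &= \langle G_{\mathrm{complex}} \rangle - \langle G_{\mathrm{receptor}} \rangle - \langle G_{\mathrm{ligand}} \rangle \\
G &= E_{\mathrm{MM}} + G_{\mathrm{solv}} - TS \\
E_{\mathrm{MM}} &= E_{\mathrm{internal}} + E_{\mathrm{elec}} + E_{\mathrm{vdW}} \\
G_{\mathrm{solv}} &= G_{\mathrm{PB}} + G_{\mathrm{nonpolar}}
\end{aligned}
```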
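The NeuralDock figures in the fifth related abstract (937 million molecules docked in 21 h on 25 GPUs) can be converted into per-second rates with simple arithmetic. This sketch only restates the reported numbers and assumes the work was spread evenly across the GPUs. The helper name `throughput` is invented for illustration.

```python
from typing import Tuple


def throughput(molecules: float, hours: float, gpus: int) -> Tuple[float, float]:
    """Return (molecules per second overall, molecules per second per GPU)."""
    per_second = molecules / (hours * 3600)  # convert hours to seconds
    return per_second, per_second / gpus     # assumes an even split across GPUs


total, per_gpu = throughput(937e6, 21, 25)
print(f"{total:,.0f} molecules/s overall, {per_gpu:,.0f} per GPU")
# prints "12,394 molecules/s overall, 496 per GPU"
```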