The lack of biologically relevant protein structures can hinder rational design of small molecules to target G protein-coupled receptors (GPCRs). While ensemble docking using multiple models of the protein target is a promising technique for structure-based drug discovery, model clustering and selection still need further investigations to achieve both high accuracy and efficiency. In this work, we have developed an original ensemble docking approach, which identifies the most relevant conformations based on the essential dynamics of the protein pocket. This approach is applied to the study of small-molecule antagonists for the PAC1 receptor, a class B GPCR and a regulator of stress. As few as four representative PAC1 models are selected from simulations of a homology model and then used to screen three million compounds from the ZINC database and 23 experimentally validated compounds for PAC1 targeting. Our essential dynamics ensemble docking (EDED) approach can effectively reduce the number of false negatives in virtual screening and improve the accuracy to seek potent compounds. Given the cost and difficulties to determine membrane protein structures for all the relevant states, our methodology can be useful for future discovery of small molecules to target more other GPCRs, either with or without experimental structures.
more »
« less
Bayes Optimal Informer Sets for Early-Stage Drug Discovery
Abstract An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer-based ranking (IBR) methods address the prioritization problem when the compounds have provided bioactivity data on other potentially relevant targets. An IBR method selects an informer set of compounds, and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-step design problem. We evaluate BOISE and compare it to other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anticancer drug sensitivity. In both empirical settings BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance.
more »
« less
- Award ID(s):
- 2023239
- PAR ID:
- 10364241
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biometrics
- Volume:
- 79
- Issue:
- 2
- ISSN:
- 0006-341X
- Format(s):
- Medium: X Size: p. 642-654
- Size(s):
- p. 642-654
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study embarked on an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We initiated our approach by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation to identify key genes from a protein–protein interaction network derived from DEGs. This method utilizes persistent Laplacians to accurately single out pivotal nodes within the network, conducting this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5 and NMDAR, for drug repurposing from DrugBank. We crafted machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging binding affinities of DrugBank compounds to selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multi-faceted and comprehensive analytical framework, amalgamating bioinformatics, topological data analysis and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows for applications across a range of diseases and transcriptomic datasets.more » « less
-
null (Ed.)Abstract In this study, we developed a novel algorithm to improve the screening performance of an arbitrary docking scoring function by recalibrating the docking score of a query compound based on its structure similarity with a set of training compounds, while the extra computational cost is neglectable. Two popular docking methods, Glide and AutoDock Vina were adopted as the original scoring functions to be processed with our new algorithm and similar improvement performance was achieved. Predicted binding affinities were compared against experimental data from ChEMBL and DUD-E databases. 11 representative drug receptors from diverse drug target categories were applied to evaluate the hybrid scoring function. The effects of four different fingerprints (FP2, FP3, FP4, and MACCS) and the four different compound similarity effect (CSE) functions were explored. Encouragingly, the screening performance was significantly improved for all 11 drug targets especially when CSE = S 4 (S is the Tanimoto structural similarity) and FP2 fingerprint were applied. The average predictive index (PI) values increased from 0.34 to 0.66 and 0.39 to 0.71 for the Glide and AutoDock vina scoring functions, respectively. To evaluate the performance of the calibration algorithm in drug lead identification, we also imposed an upper limit on the structural similarity to mimic the real scenario of screening diverse libraries for which query ligands are general-purpose screening compounds and they are not necessarily structurally similar to reference ligands. Encouragingly, we found our hybrid scoring function still outperformed the original docking scoring function. The hybrid scoring function was further evaluated using external datasets for two systems and we found the PI values increased from 0.24 to 0.46 and 0.14 to 0.42 for A2AR and CFX systems, respectively. In a conclusion, our calibration algorithm can significantly improve the virtual screening performance in both drug lead optimization and identification phases with neglectable computational cost.more » « less
-
Abstract MotivationAccurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks. ResultsInspired by recent advance of information passing and aggregation techniques that generalize the convolution neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial prediction performance improvement over other state-of-the-art DTI prediction methods as well as several novel predicted DTIs with evidence supports from previous studies have demonstrated the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning. Availability and implementationThe source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI. Supplementary informationSupplementary data are available at Bioinformatics online.more » « less
-
Abstract MotivationAccurately predicting the likelihood of interaction between two objects (compound–protein sequence, user–item, author–paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multi-views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects. ResultsWe present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound–protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug–protein interaction prediction), metabolic engineering, and synthetic biology (compound–enzyme interaction prediction) applications. Comparing CSI with a baseline model that does not utilize data stratification and contrastive learning, and show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug–target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets. Availability and implementationCode and dataset available at https://github.com/HassounLab/CSI.more » « less
An official website of the United States government
