skip to main content


Title: GraphDTI: A robust deep learning predictor of drug-target interactions from multiple heterogeneous data
Abstract Traditional techniques to identify macromolecular targets for drugs utilize solely the information on a query drug and a putative target. Nonetheless, the mechanisms of action of many drugs depend not only on their binding affinity toward a single protein, but also on the signal transduction through cascades of molecular interactions leading to certain phenotypes. Although using protein-protein interaction networks and drug-perturbed gene expression profiles can facilitate system-level investigations of drug-target interactions, utilizing such large and heterogeneous data poses notable challenges. To improve the state-of-the-art in drug target identification, we developed GraphDTI, a robust machine learning framework integrating the molecular-level information on drugs, proteins, and binding sites with the system-level information on gene expression and protein-protein interactions. In order to properly evaluate the performance of GraphDTI, we compiled a high-quality benchmarking dataset and devised a new cluster-based cross-validation protocol. Encouragingly, GraphDTI not only yields an AUC of 0.996 against the validation dataset, but it also generalizes well to unseen data with an AUC of 0.939, significantly outperforming other predictors. Finally, selected examples of identified drugtarget interactions are validated against the biomedical literature. Numerous applications of GraphDTI include the investigation of drug polypharmacological effects, side effects through offtarget binding, and repositioning opportunities.  more » « less
Award ID(s):
1852454 2020446
NSF-PAR ID:
10321850
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
Journal of Cheminformatics
Volume:
13
Issue:
1
ISSN:
1758-2946
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study embarked on an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We initiated our approach by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation to identify key genes from a protein–protein interaction network derived from DEGs. This method utilizes persistent Laplacians to accurately single out pivotal nodes within the network, conducting this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5 and NMDAR, for drug repurposing from DrugBank. We crafted machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging binding affinities of DrugBank compounds to selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multi-faceted and comprehensive analytical framework, amalgamating bioinformatics, topological data analysis and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows for applications across a range of diseases and transcriptomic datasets.

     
    more » « less
  2. There is an urgent need to repurpose drugs against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Recent computational-experimental screenings have identified several existing drugs that could serve as effective inhibitors of the virus’ main protease, M pro , which is involved in gene expression and replication. Among these, ebselen (2-phenyl-1,2-benzoselenazol-3-one) appears to be particularly promising. Here, we examine, at a molecular level, the potential of ebselen to decrease M pro activity. We find that it exhibits a distinct affinity for the catalytic region. Our results reveal a higher-affinity, previously unknown binding site localized between the II and III domains of the protein. A detailed strain analysis indicates that, on such a site, ebselen exerts a pronounced allosteric effect that regulates catalytic site access through surface-loop interactions, thereby inducing a reconfiguration of water hotspots. Together, these findings highlight the promise of ebselen as a repurposed drug against SARS-CoV-2. 
    more » « less
  3. null (Ed.)
    Abstract The ability to predict the efficacy of cancer treatments is a longstanding goal of precision medicine that requires improved understanding of molecular interactions with drugs and the discovery of biomarkers of drug response. Identifying genes whose expression influences drug sensitivity can help address both of these needs, elucidating the molecular pathways involved in drug efficacy and providing potential ways to predict new patients’ response to available therapies. In this study, we integrated cancer type, drug treatment, and survival data with RNA-seq gene expression data from The Cancer Genome Atlas to identify genes and gene sets whose expression levels in patient tumor biopsies are associated with drug-specific patient survival using a log-rank test comparing survival of patients with low vs. high expression for each gene. This analysis was successful in identifying thousands of such gene–drug relationships across 20 drugs in 14 cancers, several of which have been previously implicated in the respective drug’s efficacy. We then clustered significant genes based on their expression patterns across patients and defined gene sets that are more robust predictors of patient outcome, many of which were significantly enriched for target genes of one or more transcription factors, indicating several upstream regulatory mechanisms that may be involved in drug efficacy. We identified a large number of genes and gene sets that were potentially useful as transcript-level biomarkers for predicting drug-specific patient survival outcome. Our gene sets were robust predictors of drug-specific survival and our results included both novel and previously reported findings, suggesting that the drug-specific survival marker genes reported herein warrant further investigation for insights into drug mechanisms and for validation as biomarkers to aid cancer therapy decisions. 
    more » « less
  4. Skolnick, Jeffrey (Ed.)
    Systematically discovering protein-ligand interactions across the entire human and pathogen genomes is critical in chemical genomics, protein function prediction, drug discovery, and many other areas. However, more than 90% of gene families remain “dark”—i.e., their small-molecule ligands are undiscovered due to experimental limitations or human/historical biases. Existing computational approaches typically fail when the dark protein differs from those with known ligands. To address this challenge, we have developed a deep learning framework, called PortalCG, which consists of four novel components: (i) a 3-dimensional ligand binding site enhanced sequence pre-training strategy to encode the evolutionary links between ligand-binding sites across gene families; (ii) an end-to-end pretraining-fine-tuning strategy to reduce the impact of inaccuracy of predicted structures on function predictions by recognizing the sequence-structure-function paradigm; (iii) a new out-of-cluster meta-learning algorithm that extracts and accumulates information learned from predicting ligands of distinct gene families (meta-data) and applies the meta-data to a dark gene family; and (iv) a stress model selection step, using different gene families in the test data from those in the training and development data sets to facilitate model deployment in a real-world scenario. In extensive and rigorous benchmark experiments, PortalCG considerably outperformed state-of-the-art techniques of machine learning and protein-ligand docking when applied to dark gene families, and demonstrated its generalization power for target identifications and compound screenings under out-of-distribution (OOD) scenarios. Furthermore, in an external validation for the multi-target compound screening, the performance of PortalCG surpassed the rational design from medicinal chemists. Our results also suggest that a differentiable sequence-structure-function deep learning framework, where protein structural information serves as an intermediate layer, could be superior to conventional methodology where predicted protein structures were used for the compound screening. We applied PortalCG to two case studies to exemplify its potential in drug discovery: designing selective dual-antagonists of dopamine receptors for the treatment of opioid use disorder (OUD), and illuminating the understudied human genome for target diseases that do not yet have effective and safe therapeutics. Our results suggested that PortalCG is a viable solution to the OOD problem in exploring understudied regions of protein functional space. 
    more » « less
  5. Abstract Motivation

    Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.

    Results

    Inspired by recent advance of information passing and aggregation techniques that generalize the convolution neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial prediction performance improvement over other state-of-the-art DTI prediction methods as well as several novel predicted DTIs with evidence supports from previous studies have demonstrated the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.

    Availability and implementation

    The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less