
Title: KGML-xDTD: a knowledge graph–based machine learning framework for drug treatment prediction and mechanism description
Abstract Background

Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) for existing drugs/compounds. It is especially critical for emerging and/or orphan diseases because it requires less investment and has a shorter research cycle than traditional wet-lab drug discovery. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a major obstacle to the wide adoption of computational drug repurposing methods in clinical settings.


In this work, we propose KGML-xDTD: a Knowledge Graph–based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a two-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable MOAs. We leverage knowledge-and-publication–based information to extract biologically meaningful "demonstration paths" as intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance both in predicting drug repurposing and in recapitulating human-curated drug MOA paths.
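
KGML-xDTD explains each prediction with KG paths that link a drug to a disease. The minimal sketch below is not the framework's GRL path finder and does not use demonstration paths; it simply enumerates every simple directed path of bounded length between a drug node and a disease node in a tiny hypothetical KG, to make concrete what a candidate path-based MOA looks like. All node and relation names (`drug:D1`, `gene:G1`, `inhibits`, and so on) and the helpers `build_adjacency` and `enumerate_paths` are illustrative placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
Path = List[Triple]

# Hypothetical toy KG; these triples are placeholders, not real biomedical facts.
TRIPLES: List[Triple] = [
    ("drug:D1", "inhibits", "gene:G1"),
    ("gene:G1", "regulates", "gene:G2"),
    ("gene:G2", "associated_with", "disease:Y"),
    ("drug:D1", "interacts_with", "gene:G3"),
    ("gene:G3", "associated_with", "disease:Y"),
    ("gene:G3", "regulates", "gene:G1"),
]


def build_adjacency(triples: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each head node to its outgoing (relation, tail) pairs."""
    adj: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def enumerate_paths(adj, source: str, target: str, max_hops: int = 3) -> List[Path]:
    """Return all simple directed paths from source to target with at most max_hops edges."""
    results: List[Path] = []

    def dfs(node: str, path: Path, visited: set) -> None:
        if node == target:
            results.append(list(path))  # store a copy of the completed path
            return
        if len(path) == max_hops:  # hop budget exhausted without reaching target
            return
        for rel, nxt in adj.get(node, []):
            if nxt in visited:  # keep paths simple (no repeated nodes)
                continue
            visited.add(nxt)
            path.append((node, rel, nxt))
            dfs(nxt, path, visited)
            path.pop()  # backtrack
            visited.remove(nxt)

    dfs(source, [], {source})
    return results


if __name__ == "__main__":
    adjacency = build_adjacency(TRIPLES)
    for p in enumerate_paths(adjacency, "drug:D1", "disease:Y", max_hops=3):
        print(" -> ".join([p[0][0]] + [f"[{rel}] {tail}" for _, rel, tail in p]))
```

Exhaustive enumeration grows quickly with path length and node degree on a large biomedical KG, so this brute-force version is only suitable for illustration on small graphs.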


KGML-xDTD is the first model framework that can offer KG path explanations for drug repurposing predictions by combining prediction outcomes with existing biological knowledge and publications. We believe that, by providing predicted path-based explanations, it can effectively reduce "black-box" concerns, increase confidence in drug repurposing predictions, and further accelerate drug discovery for emerging diseases.

Publisher / Repository:
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. This study builds a coronavirus knowledge graph (KG) by merging two information sources. The first source is Analytical Graph (AG), which integrates more than 20 different public datasets related to drug discovery. The second source is CORD-19, a collection of published scientific articles related to COVID-19. We combined chemogenomic entities in AG with entities extracted from CORD-19 to expand knowledge in the COVID-19 domain. Before populating the KG with those entities, we performed entity disambiguation on the CORD-19 collection using Wikidata. Our newly built KG contains at least 21,700 genes, 2500 diseases, 94,000 phenotypes, and other biological entities (e.g., compounds, species, and cell lines). We define 27 relationship types and use them to label each edge in our KG. This research presents two cases to evaluate the KG's usability: analyzing a subgraph (ego-centered network) around angiotensin-converting enzyme (ACE) and revealing paths between biological entities (hydroxychloroquine and the IL-6 receptor; chloroquine and STAT1). The ego-centered network captured information related to COVID-19. We also found significant COVID-19-related information in the top-ranked paths of depth three based on our path evaluation.
  2. Abstract

    The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study pursued a rigorous strategy to uncover potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We began by conducting differential gene expression analysis on addiction-related transcriptomic data. We propose a novel topological differentiation method to identify key genes from a protein–protein interaction network derived from the resulting differentially expressed genes (DEGs). This method uses persistent Laplacians to accurately single out pivotal nodes within the network, and it conducts this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5 and NMDAR, for drug repurposing from DrugBank. We built machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust ability to predict the binding affinities of DrugBank compounds to the selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multifaceted and comprehensive analytical framework, combining bioinformatics, topological data analysis and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows them to be applied across a range of diseases and transcriptomic datasets.

  3. Similar molecular and genetic aberrations among diseases can lead to the discovery of treatment options that are jointly important across biologically similar diseases. Oncologists have closely examined several hormone-dependent cancers and identified remarkable pathological and molecular similarities in their DNA repair pathway abnormalities. Although deficiencies in the homologous recombination (HR) pathway play a significant role in cancer progression, deficiencies in other DNA repair pathways may also require careful investigation. In this paper, using a biomarker-driven drug repurposing model, we identified several potential drug candidates for breast and prostate cancer patients with DNA repair deficiencies based on common specific biomarkers, irrespective of the organ in which the tumors originated. Normalized discounted cumulative gain (NDCG) and sensitivity analysis were used to assess the performance of the drug repurposing model. Our results showed that mitoxantrone and genistein were among the drugs with high therapeutic effects that significantly reverted the gene expression changes caused by the disease (FDR-adjusted p-values for prostate cancer = 1.225e-4 and 8.195e-8, respectively) in patients with deficiencies in their HR pathways. The proposed multi-cancer treatment framework, suitable for patients whose cancers share common specific biomarkers, has the potential to identify promising drug candidates by enriching the study population through the integration of multiple cancers and by targeting patients who respond poorly to organ-specific treatments.

  4. Abstract Background

    Autosomal dominant polycystic kidney disease (ADPKD) is one of the most prevalent monogenic human diseases. It is mostly caused by pathogenic variants in PKD1 or PKD2 genes that encode interacting transmembrane proteins polycystin-1 (PC1) and polycystin-2 (PC2). Among many pathogenic processes described in ADPKD, those associated with cAMP signaling, inflammation, and metabolic reprogramming appear to regulate the disease manifestations. Tolvaptan, a vasopressin receptor-2 antagonist that regulates cAMP pathway, is the only FDA-approved ADPKD therapeutic. Tolvaptan reduces renal cyst growth and kidney function loss, but it is not tolerated by many patients and is associated with idiosyncratic liver toxicity. Therefore, additional therapeutic options for ADPKD treatment are needed.


    Because repurposing FDA-approved drug candidates can significantly decrease the time and cost associated with traditional drug discovery, we used the computational approach of signature reversion to detect inversely related drug response gene expression signatures from the Library of Integrated Network-Based Cellular Signatures (LINCS) database. We identified compounds predicted to reverse disease-associated transcriptomic signatures in three publicly available Pkd2 kidney transcriptomic data sets of mouse ADPKD models. We focused on a pre-cystic model for signature reversion, as it was less affected by confounding secondary disease mechanisms in ADPKD, and then compared the differential expression of the resulting candidates' targets in the two cystic mouse models. We further prioritized these drug candidates based on their known mechanisms of action, FDA status, and targets, and by functional enrichment analysis.


    With this in silico approach, we prioritized 29 unique drug targets differentially expressed in Pkd2 ADPKD cystic models and 16 drug repurposing candidates that target them, including bromocriptine and mirtazapine, which can be further tested in vitro and in vivo.


    Collectively, these results indicate drug targets and repurposing candidates that may effectively treat pre-cystic as well as cystic ADPKD.

  5. Abstract Motivation

    Food-derived bioactive peptides (FBPs) have demonstrated their significance in pharmaceuticals, diets and nutraceuticals, benefiting public health and global ecology. While significant efforts have been made to discover FBPs and to elucidate their underlying bioactivity mechanisms, a systematic study of the sequence–structure–activity relationships of FBPs in a large dataset is still lacking.


    Here, we construct a database of food-derived bioactive peptides (DFBP), containing a total of 6276 peptide entries of 31 types from different sources. Further, we develop a series of analysis tools for function discovery/repurposing, traceability, multifunctional bioactivity exploration and physicochemical property assessment of peptides. Finally, we apply this database and data-mining techniques to discover new FBPs as potential drugs for cardiovascular diseases. The DFBP serves as a useful platform not only for the fundamental understanding of the sequence–structure–activity relationships of FBPs but also for the design, discovery, and repurposing of peptide-based drugs, vaccines, materials and food ingredients.

    Availability and implementation

    DFBP service can be accessed freely via All data are incorporated into the article and its online supplementary material.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

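
The coronavirus KG study (item 1 above) analyzes an ego-centered network around ACE. As a minimal sketch, assuming the KG is available as an undirected edge list and that an "ego-centered network" means the induced subgraph within a fixed number of hops of the center node, the extraction reduces to a breadth-first search. The toy edge list, with placeholder neighbors `n1` to `n4`, and the function `ego_network` are hypothetical and do not reflect the study's data or code.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical toy edges around ACE; n1..n4 are placeholder entities.
EDGES = [("ACE", "n1"), ("ACE", "n2"), ("n1", "n3"), ("n3", "n4"), ("n2", "n1")]


def ego_network(edges: Iterable[Edge], center: str, radius: int = 1) -> Tuple[Set[str], Set[Edge]]:
    """Return nodes within `radius` hops of `center` and the edges induced among them."""
    edges = list(edges)
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:  # treat the graph as undirected for neighborhood extraction
        adj[u].add(v)
        adj[v].add(u)

    dist = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search that stops expanding at the radius
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    nodes = set(dist)
    # Keep every edge whose endpoints both lie inside the ego network (induced subgraph).
    sub_edges = {tuple(sorted((u, v))) for u, v in edges if u in nodes and v in nodes}
    return nodes, sub_edges


if __name__ == "__main__":
    print(ego_network(EDGES, "ACE", radius=1))
```

Using the induced subgraph rather than only the star of center-incident edges keeps the relationships among ACE's neighbors, which is usually what makes an ego network informative.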
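
The biomarker-driven study (item 3 above) assesses its drug repurposing model with normalized discounted cumulative gain (NDCG). For a ranked list with graded relevances rel_i, one common form is DCG@k = Σ_{i=1..k} rel_i / log2(i + 1), and NDCG@k divides DCG@k by the DCG of the ideal, descending ordering. The sketch below uses this linear-gain variant; the study may use a different gain function or cutoff, and any relevance labels (for example, 1 for a drug known to be effective, 0 otherwise) are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1), i starting at 1."""
    top = list(relevances)[:k]  # slicing with k=None keeps the whole list
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(top, start=1))


def ndcg(relevances: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance labels for a model's top-5 ranked drug candidates.
    ranking = [0, 1, 1, 0, 1]
    print(f"NDCG@3 = {ndcg(ranking, k=3):.4f}")
```

Returning 0.0 when no item is relevant avoids a division by zero; some libraries handle that case differently, so the convention should match whatever evaluation the results are compared against.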
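
The ADPKD study (item 4 above) ranks LINCS compounds by signature reversion. LINCS-based tools typically compute their own connectivity scores, so the following is only a simplified stand-in under that caveat: it scores a drug by the Spearman rank correlation between a disease signature and a drug-response signature over their shared genes, where a strongly negative value means the drug tends to push expression in the opposite direction. The gene names, fold-change values, and the functions `average_ranks` and `reversal_score` are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def reversal_score(disease_sig: Dict[str, float], drug_sig: Dict[str, float]) -> float:
    """Spearman correlation over shared genes; values near -1 suggest signature reversal."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [disease_sig[g] for g in genes]
    y = [drug_sig[g] for g in genes]
    return pearson(average_ranks(x), average_ranks(y))  # Pearson on ranks = Spearman


if __name__ == "__main__":
    disease = {"g1": 2.0, "g2": 1.0, "g3": -1.0, "g4": -2.0}  # hypothetical log fold changes
    drug = {"g1": -1.5, "g2": -0.5, "g3": 0.7, "g4": 1.9}
    print(reversal_score(disease, drug))
```

A rank-based score is insensitive to differences in scale between disease and drug signatures, which is one reason rank statistics are common in this setting; it still ignores which genes are most strongly perturbed, something the connectivity scores used with LINCS are designed to weight.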
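
The DFBP abstract (item 5 above) mentions tools for assessing the physicochemical properties of peptides without naming them. One widely used descriptor is the grand average of hydropathy (GRAVY), the mean Kyte–Doolittle hydropathy value over a sequence's residues. The sketch below assumes plain one-letter codes for the 20 standard amino acids; it is illustrative and is not DFBP's implementation, and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    unknown = sorted(set(seq) - set(KYTE_DOOLITTLE))
    if unknown:  # reject ambiguous or non-standard codes such as X or B
        raise ValueError(f"non-standard residue codes: {unknown}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    print(gravy("IR"))  # equal and opposite hydropathy values cancel out
```

Positive GRAVY values indicate an overall hydrophobic peptide and negative values a hydrophilic one; rejecting unknown residue codes, rather than silently skipping them, keeps the average from being computed over a shorter sequence than the one supplied.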