skip to main content

Title: Comprehensive evaluation of deep and graph learning on drug–drug interactions prediction

Recent advances and achievements of artificial intelligence (AI) as well as deep and graph learning models have established their usefulness in biomedical applications, especially in drug–drug interactions (DDIs). DDIs refer to a change in the effect of one drug to the presence of another drug in the human body, which plays an essential role in drug discovery and clinical research. DDIs prediction through traditional clinical trials and experiments is an expensive and time-consuming process. To correctly apply the advanced AI and deep learning, the developer and user meet various challenges such as the availability and encoding of data resources, and the design of computational methods. This review summarizes chemical structure based, network based, natural language processing based and hybrid methods, providing an updated and accessible guide to the broad researchers and development community with different domain knowledge. We introduce widely used molecular representation and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDIs prediction.

more » « less
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Briefings in Bioinformatics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Properties of molecules are indicative of their functions and thus are useful in many applications. With the advances of deep-learning methods, computational approaches for predicting molecular properties are gaining increasing momentum. However, there lacks customized and advanced methods and comprehensive tools for this task currently.


    Here, we develop a suite of comprehensive machine-learning methods and tools spanning different computational models, molecular representations and loss functions for molecular property prediction and drug discovery. Specifically, we represent molecules as both graphs and sequences. Built on these representations, we develop novel deep models for learning from molecular graphs and sequences. In order to learn effectively from highly imbalanced datasets, we develop advanced loss functions that optimize areas under precision–recall curves (PRCs) and receiver operating characteristic (ROC) curves. Altogether, our work not only serves as a comprehensive tool, but also contributes toward developing novel and advanced graph and sequence-learning methodologies. Results on both online and offline antibiotics discovery and molecular property prediction tasks show that our methods achieve consistent improvements over prior methods. In particular, our methods achieve #1 ranking in terms of both ROC-AUC (area under curve) and PRC-AUC on the AI Cures open challenge for drug discovery related to COVID-19.

    Availability and implementation

    Our source code is released as part of the MoleculeX library ( under AdvProp.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

    more » « less
  2. Abstract Motivation Graph embedding learning that aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods are evaluated on social and information networks and are not comprehensively studied on biomedical networks under systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding methods) have shown promising results, and hence there is a need to systematically evaluate the more recent graph embedding methods (e.g. random walk-based and neural network-based) in terms of their usability and potential to further the state-of-the-art. Results We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug–drug interaction (DDI) prediction, protein–protein interaction (PPI) prediction; and 2 node classification tasks: medical term semantic type classification, protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis. Compared with three state-of-the-art methods for DDAs, DDIs and protein function predictions, the recent graph embedding methods achieve competitive performance without using any biological features and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks. Availability and implementation As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at:, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  3. Abstract Background

    Computational drug repurposing is a cost- and time-efficient approach that aims to identify new therapeutic targets or diseases (indications) of existing drugs/compounds. It is especially critical for emerging and/or orphan diseases due to its cheaper investment and shorter research cycle compared with traditional wet-lab drug discovery approaches. However, the underlying mechanisms of action (MOAs) between repurposed drugs and their target diseases remain largely unknown, which is still a main obstacle for computational drug repurposing methods to be widely adopted in clinical settings.


    In this work, we propose KGML-xDTD: a Knowledge Graph–based Machine Learning framework for explainably predicting Drugs Treating Diseases. It is a 2-module framework that not only predicts the treatment probabilities between drugs/compounds and diseases but also biologically explains them via knowledge graph (KG) path-based, testable MOAs. We leverage knowledge-and-publication–based information to extract biologically meaningful “demonstration paths” as the intermediate guidance in the Graph-based Reinforcement Learning (GRL) path-finding process. Comprehensive experiments and case study analyses show that the proposed framework can achieve state-of-the-art performance in both predictions of drug repurposing and recapitulation of human-curated drug MOA paths.


    KGML-xDTD is the first model framework that can offer KG path explanations for drug repurposing predictions by leveraging the combination of prediction outcomes and existing biological knowledge and publications. We believe it can effectively reduce “black-box” concerns and increase prediction confidence for drug repurposing based on predicted path-based explanations and further accelerate the process of drug discovery for emerging diseases.

    more » « less
  4. null (Ed.)
    Abstract Background Drug sensitivity prediction and drug responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network on drug sensitivity prediction is investigated in this study. Methods In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed and both models integrate gene network information directly into neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at . Results In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to deep neural network model; (4) the prediction performance is drug dependent and it may relate to the drug’s mechanism of action. Conclusions Network-based feature selection method and prediction models improve the performance of the drug response prediction. The relations between the genomic features are more robust and stable compared to the correlation between each individual genomic feature and the drug response in high dimension and low sample size genomic datasets. 
    more » « less
  5. With increasing trend of polypharmacy, drug‐drug interaction (DDI)‐induced adverse drug events (ADEs) are considered as a major challenge for clinical practice. As premarketing clinical trials usually have stringent inclusion/exclusion criteria, limited comedication data capture and often times small sample size have limited values in study DDIs. On the other hand, ADE reports collected by spontaneous reporting system (SRS) become an important source for DDI studies. There are two major challenges in detecting DDI signals from SRS: confounding bias and false positive rate. In this article, we propose a novel approach, propensity score‐adjusted three‐component mixture model (PS‐3CMM). This model can simultaneously adjust for confounding bias and estimate false discovery rate for all drug‐drug‐ADE combinations in FDA Adverse Event Reporting System (FAERS), which is a preeminent SRS database. In simulation studies, PS‐3CMM performs better in detecting true DDIs comparing to the existing approach. It is more sensitive in selecting the DDI signals that have nonpositive individual drug relative ADE risk (NPIRR). The application of PS‐3CMM is illustrated in analyzing the FAERS database. Compared to the existing approaches, PS‐3CMM prioritizes DDI signals differently. PS‐3CMM gives high priorities to DDI signals that have NPIRR. Both simulation studies and FAERS data analysis conclude that our new PS‐3CMM is a new method that is complement to the existing DDI signal detection methods.

    more » « less