skip to main content

Title: HINGRL: predicting drug–disease associations with graph representation learning on heterogeneous information networks

Identifying new indications for drugs plays an essential role at many phases of drug research and development. Computational methods are regarded as an effective way to associate drugs with new indications. However, most of them complete their tasks by constructing a variety of heterogeneous networks without considering the biological knowledge of drugs and diseases, which are believed to be useful for improving the accuracy of drug repositioning. To this end, a novel heterogeneous information network (HIN) based model, namely HINGRL, is proposed to precisely identify new indications for drugs based on graph representation learning techniques. More specifically, HINGRL first constructs a HIN by integrating drug–disease, drug–protein and protein–disease biological networks with the biological knowledge of drugs and diseases. Then, different representation strategies are applied to learn the features of nodes in the HIN from the topological and biological perspectives. Finally, HINGRL adopts a Random Forest classifier to predict unknown drug–disease associations based on the integrated features of drugs and diseases obtained in the previous step. Experimental results demonstrate that HINGRL achieves the best performance on two real datasets when compared with state-of-the-art models. Besides, our case studies indicate that the simultaneous consideration of network topology and biological knowledge of more » drugs and diseases allows HINGRL to precisely predict drug–disease associations from a more comprehensive perspective. The promising performance of HINGRL also reveals that the utilization of rich heterogeneous information provides an alternative view for HINGRL to identify novel drug–disease associations especially for new diseases.

« less
; ; ; ;
Publication Date:
Journal Name:
Briefings in Bioinformatics
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Genomic profiles of cancer cells provide valuable information on genetic alterations in cancer. Several recent studies employed these data to predict the response of cancer cell lines to drug treatment. Nonetheless, due to the multifactorial phenotypes and intricate mechanisms of cancer, the accurate prediction of the effect of pharmacotherapy on a specific cell line based on the genetic information alone is problematic. Emphasizing on the system-level complexity of cancer, we devised a procedure to integrate multiple heterogeneous data, including biological networks, genomics, inhibitor profiling, and gene-disease associations, into a unified graph structure. In order to construct compact, yet information-rich cancer-specific networks, we developed a novel graph reduction algorithm. Driven by not only the topological information, but also the biological knowledge, the graph reduction increases the feature-only entropy while preserving the valuable graph-feature information. Subsequent comparative benchmarking simulations employing a tissue level cross-validation protocol demonstrate that the accuracy of a graph-based predictor of the drug efficacy is 0.68, which is notably higher than those measured for more traditional, matrix-based techniques on the same data. Overall, the non-Euclidean representation of the cancer-specific data improves the performance of machine learning to predict the response of cancer to pharmacotherapy. The generated data aremore »freely available to the academic community at

    « less
  2. Off-label drug use refers to using marketed drugs for indications that are not listed in their FDA labeling information. Such uses are very common and sometimes inevitable in clinical practice. To some extent, off-label drug uses provide a pathway for clinical innovation, however, they could cause serious adverse effects due to lacking scientific research and tests. Since identifying the off-label uses can provide a clue to the stakeholders including healthcare providers, patients, and medication manufacturers to further the investigation on drug efficacy and safety, it raises the demand for a systematic way to detect off-label uses. Given data contributed by health consumers in online health communities (OHCs), we developed an automated approach to detect off-label drug uses based on heterogeneous network mining. We constructed a heterogeneous healthcare network with medical entities (e.g. disease, drug, adverse drug reaction) mined from the text corpus, which involved 50 diseases, 1,297 drugs, and 185 ADRs, and determined 13 meta paths between the drugs and diseases. We developed three metrics to represent the meta-path-based topological features. With the network features, we trained the binary classifiers built on Random Forest algorithm to recognize the known drug-disease associations. The best classification model that used lift to measuremore »path weights obtained F1-score of 0.87, based on which, we identified 1,009 candidates of off-label drug uses and examined their potential by searching evidence from PubMed and FAERS.« less
  3. Abstract Motivation

    Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.


    Inspired by recent advance of information passing and aggregation techniques that generalize the convolution neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial prediction performance improvement over other state-of-the-art DTI prediction methods as well as several novel predicted DTIs with evidence supports from previous studies have demonstrated the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of choices of hyperparameters and is ready to integrate more drug and target related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.

    Availability and implementation

    The source code and datamore »used in NeoDTI are available at:

    Supplementary information

    Supplementary data are available at Bioinformatics online.

    « less
  4. Off-label drug use is quite common in clinical practice and inevitable to some extent. Such uses might deliver effective treatment and suggest clinical innovation sometimes, however, they have the unknown risk to cause serious outcomes due to lacking scientific support. As gaining information about off-label drug use could present a clue to the stakeholders such as healthcare professionals and medication manufacturers to further the investigation on drug efficacy and safety, it raises the need to develop a systematic way to detect off-label drug uses. Considering the increasing discussions in online health communities (OHCs) among the health consumers, we proposed to harness the large volume of timely information in OHCs to develop an automated method for detecting off-label drug uses from health consumer generated data. From the text corpus, we extracted medical entities (diseases, drugs, and adverse drug reactions) with lexicon-based approaches and measured their interactions with word embedding models, based on which, we constructed a heterogeneous healthcare network. We defined several meta-path-based indicators to describe the drug-disease associations in the heterogeneous network and used them as features to train a binary classifier built on Random Forest algorithm, to recognize the known drug-disease associations. The classification model obtained better results whenmore »incorporating word embedding features and achieved the best performance when using both association rule mining features and word embedding features, with F1-score reaching 0.939, based on which, we identified 2,125 possible off-label drug uses and checked their potential by searching evidence in PubMed and FAERS.« less
  5. Abstract

    Noncoding RNAs (ncRNAs) have recently attracted considerable attention due to their key roles in biology. The ncRNA–proteins interaction (NPI) is often explored to reveal some biological activities that ncRNA may affect, such as biological traits, diseases, etc. Traditional experimental methods can accomplish this work but are often labor-intensive and expensive. Machine learning and deep learning methods have achieved great success by exploiting sufficient sequence or structure information. Graph Neural Network (GNN)-based methods consider the topology in ncRNA–protein graphs and perform well on tasks like NPI prediction. Based on GNN, some pairwise constraint methods have been developed to apply on homogeneous networks, but not used for NPI prediction on heterogeneous networks. In this paper, we construct a pairwise constrained NPI predictor based on dual Graph Convolutional Network (GCN) called NPI-DGCN. To our knowledge, our method is the first to train a heterogeneous graph-based model using a pairwise learning strategy. Instead of binary classification, we use a rank layer to calculate the score of an ncRNA–protein pair. Moreover, our model is the first to predict NPIs on the ncRNA–protein bipartite graph rather than the homogeneous graph. We transform the original ncRNA–protein bipartite graph into two homogenous graphs on which to exploremore »second-order implicit relationships. At the same time, we model direct interactions between two homogenous graphs to explore explicit relationships. Experimental results on the four standard datasets indicate that our method achieves competitive performance with other state-of-the-art methods. And the model is available at

    « less