
Title: Mining heterogeneous network for drug repositioning using phenotypic information extracted from social media and pharmaceutical databases
Drug repositioning has drawn significant attention in pharmaceutical research and industry because it costs less and takes less time than de novo drug development. The availability of biomedical databases and online health-related information, together with high-performance computing, has enabled the development of computational drug repositioning methods. In this work, we developed a systematic approach that identifies repositioning drugs through heterogeneous network mining, using both pharmaceutical databases (PharmGKB and SIDER) and an online health community (MedHelp). Using adverse drug reactions (ADRs) as the intermediate, we constructed a heterogeneous health network containing drugs, diseases, and ADRs, and developed path-based heterogeneous network mining approaches for drug repositioning. We also investigated how the data sources affect repositioning performance. Experimental results showed that combining PharmGKB and MedHelp identified 479 repositioning drugs, more than were discovered by the alternatives. In addition, 31% of the 479 discovered repositioning drugs were supported by evidence from PubMed.
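The path-based idea in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible scoring, counting drug -> ADR -> disease paths, which is not necessarily the scoring the authors used. All entity names, the toy edges, and the function names are hypothetical placeholders, not data or code from the paper.

```python
# Hypothetical toy network: placeholder names only, not data from the paper.
drug_adr = {
    "drug_a": {"adr_1", "adr_2"},
    "drug_b": {"adr_2", "adr_3"},
}
disease_adr = {
    "disease_x": {"adr_1", "adr_2"},
    "disease_y": {"adr_3"},
}
# Drug-disease pairs that are already known (assumed), so they are not novel.
known_indications = {("drug_b", "disease_y")}


def count_paths(drug_adr, disease_adr):
    """Count drug -> ADR -> disease paths for every connected pair."""
    counts = {}
    for drug, adrs in drug_adr.items():
        for disease, linked_adrs in disease_adr.items():
            shared = adrs & linked_adrs  # ADRs acting as the intermediate
            if shared:
                counts[(drug, disease)] = len(shared)
    return counts


def repositioning_candidates(drug_adr, disease_adr, known):
    """Rank connected drug-disease pairs that are not already known."""
    counts = count_paths(drug_adr, disease_adr)
    novel = [(pair, n) for pair, n in counts.items() if pair not in known]
    # More shared ADRs first; ties broken alphabetically for stable output.
    return sorted(novel, key=lambda item: (-item[1], item[0]))


candidates = repositioning_candidates(drug_adr, disease_adr, known_indications)
for (drug, disease), n_paths in candidates:
    print(drug, disease, n_paths)
```

In this toy setting, a pair scores higher the more ADRs connect the drug to the disease. A real pipeline would weight the edges and validate candidates against external evidence, as the abstract describes doing with PubMed.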
Journal Name: Artificial Intelligence in Medicine
Page Range / eLocation ID: 80 - 92
Sponsoring Org: National Science Foundation
More Like this
  1. Off-label drug use refers to using marketed drugs for indications that are not listed in their FDA labeling information. Such uses are very common and sometimes inevitable in clinical practice. To some extent, off-label drug uses provide a pathway for clinical innovation; however, they can cause serious adverse effects because they lack scientific research and testing. Identifying off-label uses can give stakeholders, including healthcare providers, patients, and medication manufacturers, a lead for further investigation of drug efficacy and safety, which creates demand for a systematic way to detect such uses. Using data contributed by health consumers in online health communities (OHCs), we developed an automated approach to detect off-label drug uses based on heterogeneous network mining. We constructed a heterogeneous healthcare network of medical entities (e.g. disease, drug, adverse drug reaction) mined from the text corpus, comprising 50 diseases, 1,297 drugs, and 185 ADRs, and determined 13 meta paths between the drugs and diseases. We developed three metrics to represent the meta-path-based topological features. With these network features, we trained binary classifiers built on the Random Forest algorithm to recognize the known drug-disease associations. The best classification model, which used lift to measure path weights, obtained an F1-score of 0.87. Based on this model, we identified 1,009 candidate off-label drug uses and examined their potential by searching for evidence in PubMed and FAERS.
  2. Off-label drug use is quite common in clinical practice and, to some extent, inevitable. Such uses can deliver effective treatment and sometimes suggest clinical innovation; however, they carry an unknown risk of serious outcomes because they lack scientific support. Because information about off-label drug use could give stakeholders such as healthcare professionals and medication manufacturers a lead for further investigation of drug efficacy and safety, a systematic way to detect off-label drug uses is needed. Given the growing volume of discussion among health consumers in online health communities (OHCs), we proposed to harness the large volume of timely information in OHCs to develop an automated method for detecting off-label drug uses from health-consumer-generated data. From the text corpus, we extracted medical entities (diseases, drugs, and adverse drug reactions) with lexicon-based approaches and measured their interactions with word embedding models, and on that basis constructed a heterogeneous healthcare network. We defined several meta-path-based indicators to describe the drug-disease associations in the heterogeneous network and used them as features to train a binary classifier, built on the Random Forest algorithm, to recognize the known drug-disease associations. The classification model obtained better results when incorporating word embedding features and achieved the best performance when using both association rule mining features and word embedding features, with the F1-score reaching 0.939. Based on this model, we identified 2,125 possible off-label drug uses and checked their potential by searching for evidence in PubMed and FAERS.
  3. Abstract

    Motivation

    Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.

    Results

    Inspired by recent advances in information passing and aggregation techniques that generalize convolutional neural networks to mine large-scale graph data and greatly improve the performance of many network-related prediction tasks, we develop a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. The substantial improvement in prediction performance over other state-of-the-art DTI prediction methods, together with several novel predicted DTIs supported by evidence from previous studies, demonstrates the superior predictive power of NeoDTI. In addition, NeoDTI is robust against a wide range of hyperparameter choices and can readily integrate more drug- and target-related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.

    Availability and implementation

    The source code and data used in NeoDTI are available at:

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  4. Abstract

    Identifying new indications for drugs plays an essential role in many phases of drug research and development. Computational methods are regarded as an effective way to associate drugs with new indications. However, most of them construct a variety of heterogeneous networks without considering the biological knowledge of drugs and diseases, which is believed to be useful for improving the accuracy of drug repositioning. To this end, a novel heterogeneous information network (HIN) based model, namely HINGRL, is proposed to precisely identify new indications for drugs based on graph representation learning techniques. More specifically, HINGRL first constructs a HIN by integrating drug–disease, drug–protein and protein–disease biological networks with the biological knowledge of drugs and diseases. Then, different representation strategies are applied to learn the features of nodes in the HIN from the topological and biological perspectives. Finally, HINGRL adopts a Random Forest classifier to predict unknown drug–disease associations based on the integrated features of drugs and diseases obtained in the previous step. Experimental results demonstrate that HINGRL achieves the best performance on two real datasets when compared with state-of-the-art models. In addition, our case studies indicate that considering network topology and the biological knowledge of drugs and diseases simultaneously allows HINGRL to precisely predict drug–disease associations from a more comprehensive perspective. The promising performance of HINGRL also reveals that rich heterogeneous information gives HINGRL an alternative view for identifying novel drug–disease associations, especially for new diseases.

  5. Off-label drug use is an important healthcare topic because it is quite common and sometimes inevitable in medical practice. Although information about off-label drug uses could benefit many healthcare stakeholders, such as patients, physicians, and pharmaceutical companies, no data repository of such information is available, so a systematic approach to detecting off-label drug uses is needed. Rather than using data sources provided by healthcare providers, such as EHRs and clinical notes, we exploited social media data, especially online health community (OHC) data, to detect off-label drug uses, in view of the growing number of social media users and the large volume of valuable and timely user-generated content. We adopted a tensor decomposition technique, CP decomposition in this work, to deal with the sparsity and missing-data problems in social media data. On the basis of the tensor decomposition results, we used two approaches to identify off-label drug use candidates: (1) ranking the components resulting from the CP decomposition, and (2) applying a heterogeneous network mining method, proposed in our previous work [9], to the dataset reconstructed by CP decomposition. The first approach identified a number of significant off-label use candidates; in case studies, we found medical explanations for 7 of the 12 identified candidates. The second approach outperformed the previous method [9], improving the F1-score by 3%. These results demonstrate the effectiveness of performing tensor decomposition on social media data for detecting off-label drug use.
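The first related abstract in the list above mentions using lift to measure path weights. As a rough illustration only, the sketch below computes the standard association-rule lift, lift(A, B) = P(A, B) / (P(A) P(B)), from co-occurrence in posts. Multiplying the lifts of consecutive edges to get a path weight is an assumption made here for illustration; the abstract does not say how edge weights are combined. The posts, entity names, and function names are hypothetical.

```python
def lift(posts, a, b):
    """Association-rule lift of entities a and b over a list of posts.

    Each post is a set of entity names. Returns 0.0 when either entity
    never occurs, since lift is undefined in that case.
    """
    n = len(posts)
    n_a = sum(1 for post in posts if a in post)
    n_b = sum(1 for post in posts if b in post)
    n_ab = sum(1 for post in posts if a in post and b in post)
    if n == 0 or n_a == 0 or n_b == 0:
        return 0.0
    return (n_ab / n) / ((n_a / n) * (n_b / n))


def meta_path_weight(posts, path):
    """Assumed combination rule: product of lifts along consecutive edges."""
    weight = 1.0
    for left, right in zip(path, path[1:]):
        weight *= lift(posts, left, right)
    return weight


# Hypothetical toy corpus: each set lists the entities mentioned in one post.
posts = [
    {"drug_a", "adr_1"},
    {"drug_a", "adr_1", "disease_x"},
    {"adr_1", "disease_x"},
    {"drug_b"},
]

edge_weight = lift(posts, "drug_a", "adr_1")
path_weight = meta_path_weight(posts, ["drug_a", "adr_1", "disease_x"])
print(round(edge_weight, 3), round(path_weight, 3))
```

A lift above 1 means two entities co-occur more often than they would if independent. Path weights of this kind could then serve as features for a classifier such as the Random Forest models the abstracts describe.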