Off-label drug use refers to the use of marketed drugs for indications that are not listed in their FDA labeling. Such uses are very common and sometimes unavoidable in clinical practice. To some extent, off-label uses provide a pathway for clinical innovation; however, they can cause serious adverse effects because they lack supporting scientific research and testing. Identifying off-label uses can give stakeholders, including healthcare providers, patients, and medication manufacturers, a starting point for further investigation of drug efficacy and safety, which creates demand for a systematic way to detect them. Using data contributed by health consumers in online health communities (OHCs), we developed an automated approach to detecting off-label drug uses based on heterogeneous network mining. We constructed a heterogeneous healthcare network from medical entities (e.g., diseases, drugs, and adverse drug reactions (ADRs)) mined from the text corpus; the network involved 50 diseases, 1,297 drugs, and 185 ADRs, and we determined 13 meta paths between drugs and diseases. We developed three metrics to represent the meta-path-based topological features. With these network features, we trained binary classifiers built on the Random Forest algorithm to recognize known drug-disease associations. The best classification model, which used lift to measure path weights, obtained an F1-score of 0.87. Based on this model, we identified 1,009 candidate off-label drug uses and examined their potential by searching for evidence in PubMed and FAERS.
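As a rough illustration of how lift can weight a meta path, the sketch below computes lift from entity co-occurrence in posts and scores one drug-ADR-disease path type. The post representation, the function names `lift_table` and `path_feature`, and the sum-of-products aggregation are assumptions made for this example; the abstract does not specify how the three metrics are defined.

```python
from collections import Counter
from itertools import combinations


def lift_table(posts):
    """Map each co-occurring entity pair to lift = P(a, b) / (P(a) * P(b)).

    `posts` is a list of entity sets, one per post (hypothetical input format).
    """
    n = len(posts)
    single = Counter()
    pair = Counter()
    for entities in posts:
        uniq = sorted(set(entities))
        single.update(uniq)
        pair.update(frozenset(p) for p in combinations(uniq, 2))
    table = {}
    for key, count in pair.items():
        a, b = tuple(key)
        table[key] = (count / n) / ((single[a] / n) * (single[b] / n))
    return table


def path_feature(lift, drug, disease, intermediates):
    """Score drug - x - disease paths by summing the product of edge lifts.

    The aggregation rule is an assumption, not the published metric.
    """
    total = 0.0
    for x in intermediates:
        total += lift.get(frozenset((drug, x)), 0.0) * lift.get(frozenset((x, disease)), 0.0)
    return total


posts = [
    {"drugA", "nausea"},
    {"drugA", "nausea", "migraine"},
    {"migraine", "nausea"},
    {"drugB"},
]
lift = lift_table(posts)
score = path_feature(lift, "drugA", "migraine", ["nausea"])
```

A feature vector for a drug-disease pair would hold one such score per meta path, and those vectors could then be passed to a classifier such as Random Forest.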
Determining Associations with Word Embedding in Heterogeneous Network for Detecting Off-label Drug Uses
Off-label drug use is quite common in clinical practice and, to some extent, unavoidable. Such uses can deliver effective treatment and sometimes suggest clinical innovation; however, they carry an unknown risk of serious outcomes because they lack scientific support. Information about off-label drug use can give stakeholders, such as healthcare professionals and medication manufacturers, a starting point for further investigation of drug efficacy and safety, which creates the need for a systematic way to detect off-label drug uses. Considering the growing volume of discussion among health consumers in online health communities (OHCs), we proposed to harness the large volume of timely information in OHCs to develop an automated method for detecting off-label drug uses from health-consumer-generated data. From the text corpus, we extracted medical entities (diseases, drugs, and adverse drug reactions) with lexicon-based approaches and measured their interactions with word embedding models; on that basis, we constructed a heterogeneous healthcare network. We defined several meta-path-based indicators to describe drug-disease associations in the heterogeneous network and used them as features to train a binary classifier, built on the Random Forest algorithm, to recognize known drug-disease associations. The classification model obtained better results when it incorporated word embedding features, and it achieved the best performance when it used both association rule mining features and word embedding features, with an F1-score of 0.939. Based on this model, we identified 2,125 possible off-label drug uses and checked their potential by searching for evidence in PubMed and FAERS.
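One common way to measure the interaction between two entities with word embeddings is the cosine similarity of their vectors. The sketch below is a minimal illustration under that assumption; the toy vectors, the threshold value, and the function names `cosine` and `embedding_edges` are hypothetical, and the abstract does not state which similarity measure or embedding model was used.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_edges(vectors, threshold=0.5):
    """Return weighted edges between entities whose similarity meets the threshold.

    `vectors` maps entity name -> embedding; the threshold is an assumed cutoff.
    """
    names = sorted(vectors)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = cosine(vectors[a], vectors[b])
            if weight >= threshold:
                edges[(a, b)] = weight
    return edges


# Toy two-dimensional embeddings, for illustration only.
vectors = {
    "drugA": [1.0, 0.0],
    "migraine": [1.0, 0.0],
    "rash": [0.0, 1.0],
}
edges = embedding_edges(vectors)
```

In practice the vectors would come from an embedding model trained on the OHC corpus, and the resulting edge weights could feed meta-path features alongside association rule mining features.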
- Award ID(s): 1650531
- PAR ID: 10048041
- Date Published:
- Journal Name: IEEE International Conference on Healthcare Informatics 2017
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Off-label drug use is an important healthcare topic because it is quite common and sometimes unavoidable in medical practice. Although information about off-label drug uses could benefit many healthcare stakeholders, such as patients, physicians, and pharmaceutical companies, no data repository of such information is available, and a systematic approach to detecting off-label drug uses is needed. Rather than using data sources supplied by healthcare providers, such as EHRs and clinical notes, we exploited social media data, especially online health community (OHC) data, to detect off-label drug uses, in view of the growing number of social media users and the large volume of valuable and timely user-generated content. We adopted a tensor decomposition technique, CP decomposition in this work, to deal with the sparsity and missing-data problems in social media data. On the basis of the tensor decomposition results, we used two approaches to identify off-label drug use candidates: (1) ranking the components produced by CP decomposition, and (2) applying a heterogeneous network mining method, proposed in our previous work [9], to the dataset reconstructed by CP decomposition. The first approach identified a number of significant off-label use candidates; in case studies, we found medical explanations for 7 of the 12 identified candidates. The second approach outperformed the previous method [9], improving the F1-score by 3%. These results demonstrate the effectiveness of performing tensor decomposition on social media data for detecting off-label drug use.
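To make the two uses of CP decomposition more concrete, the sketch below shows how a three-way tensor is rebuilt from CP factor matrices (which fills in entries that were missing or sparse in the observed data) and how the strongest entries of one component can be ranked. It assumes the factor matrices have already been fitted by some CP routine; the function names, the toy factors, and the labels are hypothetical, and the abstract does not say what the tensor modes represent.

```python
def cp_reconstruct(A, B, C):
    """Rebuild X[i][j][k] = sum_r A[i][r] * B[j][r] * C[k][r] from CP factors.

    A is I x R, B is J x R, C is K x R (lists of rows).
    """
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank)) for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


def top_entries(factor, labels, r, n=3):
    """Labels with the largest loadings in component r of one factor matrix."""
    ranked = sorted(zip(labels, (row[r] for row in factor)), key=lambda t: t[1], reverse=True)
    return [label for label, _ in ranked[:n]]


# Toy rank-2 factors, for illustration only.
A = [[1, 0], [0, 2]]
B = [[1, 1], [0, 1]]
C = [[2, 0], [1, 3]]
X = cp_reconstruct(A, B, C)
strongest = top_entries(A, ["d1", "d2"], r=1, n=1)
```

The reconstructed tensor could then serve as the denser dataset on which a network mining method is run, while the per-component rankings suggest candidate associations to inspect by hand.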
-
Motivation: Graph embedding learning, which aims to automatically learn low-dimensional node representations, has drawn increasing attention in recent years. To date, most recent graph embedding methods have been evaluated on social and information networks and have not been comprehensively studied on biomedical networks through systematic experiments and analyses. On the other hand, for a variety of biomedical network analysis tasks, traditional techniques such as matrix factorization (which can be seen as a type of graph embedding method) have shown promising results. Hence, there is a need to systematically evaluate the more recent graph embedding methods (e.g., random walk-based and neural network-based) in terms of their usability and their potential to advance the state of the art.
Results: We select 11 representative graph embedding methods and conduct a systematic comparison on 3 important biomedical link prediction tasks: drug-disease association (DDA) prediction, drug-drug interaction (DDI) prediction, and protein-protein interaction (PPI) prediction; and on 2 node classification tasks: medical term semantic type classification and protein function prediction. Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis. Compared with three state-of-the-art methods for DDA, DDI, and protein function prediction, the recent graph embedding methods achieve competitive performance without using any biological features, and the learned embeddings can be treated as complementary representations for the biological features. By summarizing the experimental results, we provide general guidelines for properly selecting graph embedding methods and setting their hyper-parameters for different biomedical tasks.
Availability and implementation: As part of our contributions in the paper, we develop an easy-to-use Python package with detailed instructions, BioNEV, available at: https://github.com/xiangyue9607/BioNEV, including all source code and datasets, to facilitate studying various graph embedding methods on biomedical tasks.
Supplementary information: Supplementary data are available at Bioinformatics online.
-
Opioid addiction constitutes a significant contemporary health crisis that is multifarious in its complexity. Modeling the epidemiology of any addiction is challenging in its own right. For opioid addiction, the challenge is exacerbated by the difficulty of collecting real-time data and by the limited information that opioid users may disclose, owing to the stigma associated with prescription misuse. Given this context, identifying the progression of individuals through the stages of opioid addiction is one of the more acute problems in epidemiological modeling, and its solution is crucial for designing specific interventions at both the personal and population levels. We describe a computational approach for determining and characterizing the addiction stages of opioid users from their social media posts. The proposed approach combines recurrent neural network learning with information-theoretic analysis of word associations and context-based word embedding to determine addiction-stage-specific language usage. Users who have a high likelihood of relapsing into drug use are identified and characterized using propensity score matching and logistic regression. Experimental evaluations indicate that the proposed approach can distinguish between various addiction stages and identify users prone to relapse with high accuracy, as evidenced by F1 scores of 0.88 and 0.79, respectively.
-
Although combination antiretroviral therapy (ART) with three or more drugs is highly effective in suppressing viral load for people with HIV (human immunodeficiency virus), many ART agents may exacerbate mental health-related adverse effects, including depression. Understanding the effects of combination ART on mental health can therefore help clinicians personalize medicine with fewer adverse effects and avoid undesirable health outcomes. The emergence of electronic health records offers researchers unprecedented access to HIV data, including individuals' mental health records, drug prescriptions, and clinical information over time. However, modeling such data is challenging because of the high dimensionality of the drug combination space, individual heterogeneity, and the sparseness of the observed drug combinations. To address these challenges, we develop a Bayesian nonparametric approach to learn the effect of drug combinations on mental health in people with HIV, adjusting for sociodemographic, behavioral, and clinical factors. The proposed method is built upon the subset-tree kernel, which represents drug combinations in a way that synthesizes known regimen structure into a single mathematical representation. It also uses a distance-dependent Chinese restaurant process to cluster heterogeneous populations while considering individuals' treatment histories. We evaluate the proposed approach through simulation studies and apply the method to a dataset from the Women's Interagency HIV Study, showing the clinical utility of our model in guiding clinicians to prescribe informed and effective personalized treatment based on individuals' treatment histories and clinical characteristics.

