skip to main content


Title: Predicting serious rare adverse reactions of novel chemicals
Abstract Motivation

Adverse drug reactions (ADRs) are one of the main causes of death and a major financial burden on the world’s economy. Due to the limitations of the animal model, computational prediction of serious and rare ADRs is invaluable. However, current state-of-the-art computational methods do not yield significantly better predictions of rare ADRs than random guessing.

Results

We present a novel method, based on the theory of ‘compressed sensing’ (CS), which can accurately predict serious side-effects of candidate and market drugs. Not only is our method able to infer new chemical-ADR associations using existing noisy, biased and incomplete databases, but our data also demonstrate that the accuracy of CS in predicting a serious ADR for a candidate drug increases with increasing knowledge of other ADRs associated with the drug. In practice, this means that as the candidate drug moves up the different stages of clinical trials, the prediction accuracy of our method will increase accordingly.

Availability and implementation

The program is available at https://github.com/poleksic/side-effects.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
more » « less
NSF-PAR ID:
10393452
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
34
Issue:
16
ISSN:
1367-4803
Page Range / eLocation ID:
p. 2835-2842
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Computational methods for compound–protein affinity and contact (CPAC) prediction aim at facilitating rational drug discovery by simultaneous prediction of the strength and the pattern of compound–protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often makes structure-free methods rely on protein sequence inputs alone. The scarcity of compound–protein pairs with affinity and contact labels further limits the accuracy and the generalizability of CPAC models.

    Results

    To overcome the aforementioned challenges of structure naivety and labeled-data scarcity, we introduce cross-modality and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in both modalities of 1D amino-acid sequences and predicted 2D contact maps that are separately embedded with recurrent and graph neural networks, respectively, as well as jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pre-trained under various self-supervised learning strategies, by leveraging massive amount of unlabeled protein data. Our results indicate that individual protein modalities differ in their strengths of predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins.

    Availability and implementation

    Data and source codes are available at https://github.com/Shen-Lab/CPAC.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  2. Abstract Motivation

    The use of drug combinations, termed polypharmacy, is common to treat patients with complex diseases or co-existing conditions. However, a major consequence of polypharmacy is a much higher risk of adverse side effects for the patient. Polypharmacy side effects emerge because of drug–drug interactions, in which activity of one drug may change, favorably or unfavorably, if taken with another drug. The knowledge of drug interactions is often limited because these complex relationships are rare, and are usually not observed in relatively small clinical testing. Discovering polypharmacy side effects thus remains an important challenge with significant implications for patient mortality and morbidity.

    Results

    Here, we present Decagon, an approach for modeling polypharmacy side effects. The approach constructs a multimodal graph of protein–protein interactions, drug–protein target interactions and the polypharmacy side effects, which are represented as drug–drug interactions, where each side effect is an edge of a different type. Decagon is developed specifically to handle such multimodal graphs with a large number of edge types. Our approach develops a new graph convolutional neural network for multirelational link prediction in multimodal networks. Unlike approaches limited to predicting simple drug–drug interaction values, Decagon can predict the exact side effect, if any, through which a given drug combination manifests clinically. Decagon accurately predicts polypharmacy side effects, outperforming baselines by up to 69%. We find that it automatically learns representations of side effects indicative of co-occurrence of polypharmacy in patients. Furthermore, Decagon models particularly well polypharmacy side effects that have a strong molecular basis, while on predominantly non-molecular side effects, it achieves good performance because of effective sharing of model parameters across edge types. Decagon opens up opportunities to use large pharmacogenomic and patient population data to flag and prioritize polypharmacy side effects for follow-up analysis via formal pharmacological studies.

    Availability and implementation

    Source code and preprocessed datasets are at: http://snap.stanford.edu/decagon.

     
    more » « less
  3. Abstract Motivation

    Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design.

    Results

    We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost has obtained satisfactory prediction results by the leave-one-out-cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy of the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases.

    Availability and implementation

    The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  4. Abstract Motivation

    Accurate prediction and interpretation of ligand bioactivities are essential for virtual screening and drug discovery. Unfortunately, many important drug targets lack experimental data about the ligand bioactivities; this is particularly true for G protein-coupled receptors (GPCRs), which account for the targets of about a third of drugs currently on the market. Computational approaches with the potential of precise assessment of ligand bioactivities and determination of key substructural features which determine ligand bioactivities are needed to address this issue.

    Results

    A new method, SED, was proposed to predict ligand bioactivities and to recognize key substructures associated with GPCRs through the coupling of screening for Lasso of long extended-connectivity fingerprints (ECFPs) with deep neural network training. The SED pipeline contains three successive steps: (i) representation of long ECFPs for ligand molecules, (ii) feature selection by screening for Lasso of ECFPs and (iii) bioactivity prediction through a deep neural network regression model. The method was examined on a set of 16 representative GPCRs that cover most subfamilies of human GPCRs, where each has 300–5000 ligand associations. The results show that SED achieves excellent performance in modelling ligand bioactivities, especially for those in the GPCR datasets without sufficient ligand associations, where SED improved the baseline predictors by 12% in correlation coefficient (r2) and 19% in root mean square error. Detail data analyses suggest that the major advantage of SED lies on its ability to detect substructures from long ECFPs which significantly improves the predictive performance.

    Availability and implementation

    The source code and datasets of SED are freely available at https://zhanglab.ccmb.med.umich.edu/SED/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract Motivation

    Finding driver genes that are responsible for the aberrant proliferation rate of cancer cells is informative for both cancer research and the development of targeted drugs. The established experimental and computational methods are labor-intensive. To make algorithms feasible in real clinical settings, methods that can predict driver genes using less experimental data are urgently needed.

    Results

    We designed an effective feature selection method and used Support Vector Machines (SVM) to predict the essentiality of the potential driver genes in cancer cell lines with only 10 genes as features. The accuracy of our predictions was the highest in the Broad-DREAM Gene Essentiality Prediction Challenge. We also found a set of genes whose essentiality could be predicted much more accurately than others, which we called Accurately Predicted (AP) genes. Our method can serve as a new way of assessing the essentiality of genes in cancer cells.

    Availability and implementation

    The raw data that support the findings of this study are available at Synapse. https://www.synapse.org/#! Synapse: syn2384331/wiki/62825. Source code is available at GitHub. https://github.com/GuanLab/DREAM-Gene-Essentiality-Challenge.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less