Title: Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data
Abstract

Drug screening data from massive bulk gene expression databases can be analyzed to determine the optimal clinical application of cancer drugs. The growing amount of single-cell RNA sequencing (scRNA-seq) data also provides insights into improving therapeutic effectiveness by revealing the heterogeneity of drug responses across cancer cell subpopulations. Computational approaches that predict and interpret cancer drug response in single-cell data collected from clinical samples would therefore be very useful. We propose scDEAL, a deep transfer learning framework for cancer drug response prediction at the single-cell level by integrating large-scale bulk cell-line data. The core of scDEAL is harmonizing drug-related bulk RNA-seq data with scRNA-seq data and transferring the model trained on bulk RNA-seq data to predict drug responses in scRNA-seq data. scDEAL also uses integrated gradient feature interpretation to infer the signature genes of drug resistance mechanisms. We benchmark scDEAL on six scRNA-seq datasets and demonstrate its model interpretability via three case studies focusing on drug response label prediction, gene signature identification, and pseudotime analysis. We believe that scDEAL could help study cell reprogramming, drug selection, and repurposing for improving therapeutic efficacy.
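
The abstract mentions integrated gradient feature interpretation without giving details. The following is a minimal sketch of the general integrated gradients technique on a toy logistic model, not the scDEAL implementation. The three-gene model, its weights, the expression values, the all-zero baseline, and every function name here are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Toy response model (an assumption): logistic regression on expression."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(weights, bias, x):
    """Analytic gradient of predict() with respect to each input feature."""
    p = predict(weights, bias, x)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(weights, bias, x, baseline, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from baseline to x and
    scaled by (x - baseline) for each feature.
    """
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(gradient(weights, bias, point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical three-gene cell; all numbers are made up for illustration.
weights = [1.5, -2.0, 0.0]
bias = 0.1
cell = [2.0, 0.5, 3.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(weights, bias, cell, baseline)
print(attributions)
```

In this sketch the attributions sum approximately to the difference between the model output for the cell and for the baseline, and a gene with zero weight receives zero attribution. Ranking genes by attribution magnitude is one plausible way to nominate candidate signature genes.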

 
Award ID(s):
1945971
NSF-PAR ID:
10377774
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
13
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Pancreatic cancer is a complex disease with a desmoplastic stroma, extreme hypoxia, and inherent resistance to therapy. Understanding the signaling and adaptive response of such an aggressive cancer is key to making advances in therapeutic efficacy. Redox factor-1 (Ref-1), a redox signaling protein, regulates the conversion of several transcription factors (TFs), including HIF-1α, STAT3 and NFκB, from an oxidized to a reduced state, leading to enhancement of their DNA binding. In our previously published work, knockdown of Ref-1 under normoxia resulted in altered gene expression patterns on pathways including EIF2, protein kinase A, and mTOR. In this study, single-cell RNA sequencing (scRNA-seq) and proteomics were used to explore the effects of Ref-1 on metabolic pathways under hypoxia.

    Methods

    scRNA-seq data comparing pancreatic cancer cells expressing less than 20% of the Ref-1 protein were analyzed using a left truncated mixture Gaussian model and validated using proteomics and qRT-PCR. The identified role of Ref-1 in mitochondrial function was confirmed using mitochondrial function assays, qRT-PCR, western blotting and an NADP assay. Further, the effect of Ref-1 redox function inhibition on pancreatic cancer metabolism was assayed using 3D co-culture in vitro and xenograft studies in vivo.

    Results

    Distinct transcriptional variation in central metabolism, cell cycle, apoptosis, immune response, and genes downstream of a series of signaling pathways and transcriptional regulatory factors was identified in Ref-1 knockdown vs Scrambled control from the scRNA-seq data. Mitochondrial DEG subsets downregulated with Ref-1 knockdown were significantly reduced following Ref-1 redox inhibition and more dramatically in combination with Devimistat in vitro. Mitochondrial function assays demonstrated that Ref-1 knockdown and Ref-1 redox signaling inhibition decreased utilization of TCA cycle substrates and slowed the growth of pancreatic cancer co-culture spheroids. In Ref-1 knockdown cells, a higher flux rate of NADP+-consuming reactions was observed, suggesting lower availability of NADP+ and a higher level of oxidative stress in these cells. In vivo xenograft studies demonstrated that tumor reduction with the Ref-1 redox inhibitor was potent, similar to that with Devimistat.

    Conclusion

    Ref-1 redox signaling inhibition conclusively alters cancer cell metabolism by causing TCA cycle dysfunction while also reducing pancreatic tumor growth in vitro as well as in vivo.
  2. Abstract Background

    Drug sensitivity prediction and drug-responsive biomarker selection on high-throughput genomic data is a critical step in drug discovery. Many computational methods have been developed to serve this purpose, including several deep neural network models. However, the modular relations among genomic features have been largely ignored in these methods. To overcome this limitation, the role of the gene co-expression network in drug sensitivity prediction is investigated in this study.

    Methods

    In this paper, we first introduce a network-based method to identify representative features for drug response prediction by using the gene co-expression network. Then, two graph-based neural network models are proposed, both of which integrate gene network information directly into the neural network for outcome prediction. Next, we present a large-scale comparative study among the proposed network-based methods, canonical prediction algorithms (i.e., Elastic Net, Random Forest, Partial Least Squares Regression, and Support Vector Regression), and deep neural network models for drug sensitivity prediction. All the source code and processed datasets in this study are available at https://github.com/compbiolabucf/drug-sensitivity-prediction.

    Results

    In the comparison of different feature selection methods and prediction methods on a non-small cell lung cancer (NSCLC) cell line RNA-seq gene expression dataset with 50 different drug treatments, we found that (1) the network-based feature selection method improves the prediction performance compared to Pearson correlation coefficients; (2) Random Forest outperforms all the other canonical prediction algorithms and deep neural network models; (3) the proposed graph-based neural network models show better prediction performance compared to the deep neural network model; (4) the prediction performance is drug dependent and may relate to the drug’s mechanism of action.

    Conclusions

    The network-based feature selection method and prediction models improve the performance of drug response prediction. The relations between the genomic features are more robust and stable than the correlation between each individual genomic feature and the drug response in high-dimension, low-sample-size genomic datasets.
  3. Abstract Background

    Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions.

    Results

    We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to an scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses.

    Conclusion

    We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.

     
  4. Abstract

    To understand phenotypic variations and key factors that affect disease susceptibility of complex traits, it is important to decipher cell‐type tissue compositions. To study cellular compositions of bulk tissue samples, one can evaluate cellular abundances and cell‐type‐specific gene expression patterns from the tissue transcriptome profiles. We develop both fixed and mixed models to reconstruct cellular expression fractions for bulk‐profiled samples by using single‐cell (sc) RNA‐sequencing (RNA‐seq) reference data. In benchmark evaluations of estimating cellular expression fractions, the mixed‐effect models provide results similar to those of cell‐type identification by estimating relative subsets of RNA transcripts (CIBERSORTx), a well‐known and reliable machine learning procedure to reconstruct cell‐type abundances and cell‐type‐specific gene expression profiles. In real data analysis, the mixed‐effect models outperform or perform similarly to CIBERSORTx. The mixed models perform better than the fixed models in both benchmark evaluations and data analysis. In simulation studies, we show that if heterogeneity exists in scRNA‐seq data, it is better to use mixed models with heterogeneous mean and variance–covariance. As a byproduct, the mixed models provide fractions of covariance between subject‐specific gene expression and cell types to measure their correlations. The proposed mixed models provide a complementary tool to dissect bulk tissues using scRNA‐seq data.

     
  5. Inferring gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data is an important computational problem for finding regulatory mechanisms involved in fundamental cellular processes. Although many computational methods have been designed to predict GRNs from scRNA-seq data, they usually have high false positive rates, and none infer GRNs by directly using the paired datasets of case-versus-control experiments. Here we present a novel deep-learning-based method, named scTIGER, for GRN detection by using the co-differential relationships of gene expression profiles in paired scRNA-seq datasets. scTIGER employs cell-type-based pseudotiming, an attention-based convolutional neural network method and permutation-based significance testing for inferring GRNs among gene modules. As state-of-the-art applications, we first applied scTIGER to scRNA-seq datasets of prostate cancer cells, and successfully identified the dynamic regulatory networks of AR, ERG, PTEN and ATF3 for the same cell type between prostatic cancerous and normal conditions, and for two cell types within the prostatic cancerous environment. We then applied scTIGER to scRNA-seq data from neurons with and without fear memory and detected specific regulatory networks for BDNF, CREB1 and MAPK4. Additionally, scTIGER demonstrates robustness against high levels of dropout noise in scRNA-seq data.

     
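
Item 5 above names permutation-based significance testing as one component of scTIGER but does not describe it. As a rough illustration of the general idea only, the sketch below runs a two-sided permutation test on the Pearson correlation between two expression vectors. It is not the scTIGER procedure: the test statistic, the function names, the number of permutations, and the example values are all assumptions.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant input)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def permutation_p_value(x, y, n_permutations=999, seed=0):
    """Two-sided permutation p-value for the correlation between x and y.

    y is shuffled repeatedly to break any association with x; the p-value is
    the share of shuffles whose |correlation| reaches the observed value, with
    the usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Made-up expression values for two hypothetical genes across ten cells.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
gene_b = [2.1, 3.9, 6.2, 8.1, 9.8, 12.3, 13.9, 16.2, 18.0, 20.1]

p_value = permutation_p_value(gene_a, gene_b)
print(p_value)
```

Because the null distribution comes from shuffling the data itself, this kind of test makes no distributional assumption about expression values, which is one reason permutation tests are commonly paired with learned scores.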