Drug resistance poses a crucial challenge in healthcare, with response rates to chemotherapy and targeted therapy remaining low. Resistance in individual patients is exacerbated by the intricate heterogeneity of tumor cells, presenting significant obstacles to effective treatment. To address this challenge, DrugFormer, a novel graph‐augmented large language model designed to predict drug resistance at the single‐cell level, is proposed. DrugFormer integrates both serialized gene tokens and gene‐based knowledge graphs for accurate prediction of drug response. After training on comprehensive single‐cell data with drug response information, DrugFormer achieves superior performance, with higher F1, precision, and recall in predicting drug response. Based on scRNA‐seq data from refractory multiple myeloma (MM) and acute myeloid leukemia (AML) patients, DrugFormer demonstrates high efficacy in identifying resistant cells and uncovering underlying molecular mechanisms. Through pseudotime trajectory analysis, unique drug‐resistant cellular states associated with poor patient outcomes are revealed. Furthermore, DrugFormer identifies potential therapeutic targets, such as COX8A, for overcoming drug resistance across different cancer types. In conclusion, DrugFormer represents a significant advancement in the field of drug resistance prediction, offering a powerful tool for unraveling the heterogeneity of cellular response to drugs and guiding personalized treatment strategies.
Drug screening data from massive bulk gene expression databases can be analyzed to determine the optimal clinical application of cancer drugs. The growing amount of single-cell RNA sequencing (scRNA-seq) data also provides insights into improving therapeutic effectiveness by helping to study the heterogeneity of drug responses across cancer cell subpopulations. Computational approaches that predict and interpret cancer drug response in single-cell data collected from clinical samples can therefore be very useful. We propose scDEAL, a deep transfer learning framework for cancer drug response prediction at the single-cell level that integrates large-scale bulk cell-line data. A highlight of scDEAL is that it harmonizes drug-related bulk RNA-seq data with scRNA-seq data and transfers the model trained on bulk RNA-seq data to predict drug responses in scRNA-seq data. Another feature of scDEAL is its integrated gradient feature interpretation, which infers the signature genes of drug resistance mechanisms. We benchmark scDEAL on six scRNA-seq datasets and demonstrate its model interpretability via three case studies focusing on drug response label prediction, gene signature identification, and pseudotime analysis. We believe that scDEAL could help study cell reprogramming, drug selection, and repurposing for improving therapeutic efficacy.
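To illustrate the integrated gradient idea mentioned in the abstract, here is a minimal sketch on a toy logistic classifier. It is not scDEAL's implementation: the model, the all-zero baseline, the five-gene input, and the names `predict` and `integrated_gradients` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Toy stand-in for a drug-response model: probability of one label."""
    return sigmoid(x @ w + b)

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients of the model output are averaged along the straight path
    from `baseline` to `x`, then scaled by (x - baseline) per feature.
    """
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, genes)
    p = predict(path, w, b)                              # shape (steps,)
    grads = (p * (1.0 - p))[:, None] * w[None, :]        # analytic d(output)/d(input)
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical expression vector for one cell over five genes.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
b = 0.1
x = rng.random(5)
baseline = np.zeros(5)  # assumed reference input: no expression

attributions = integrated_gradients(x, baseline, w, b)
# Genes with the largest absolute attribution are candidate signature genes.
ranking = np.argsort(-np.abs(attributions))
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model output at `x` and at the baseline.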
- Award ID(s):
- 1945971
- NSF-PAR ID:
- 10377774
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Nature Communications
- Volume:
- 13
- Issue:
- 1
- ISSN:
- 2041-1723
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract One important characteristic of single-cell RNA sequencing (scRNA-seq) data is its high sparsity: the gene-cell count data matrix contains a high proportion of zeros. The sparsity has motivated widespread discussion of dropouts and missing data, as well as imputation algorithms for scRNA-seq analysis. Here, we aim to investigate whether there exist genes that are more prone to be under-detected in scRNA-seq, and if so, what commonalities those genes may share. From public data sources, we gathered paired bulk RNA-seq and scRNA-seq data from 53 human samples, which were generated in diverse biological contexts. We derived pseudo-bulk gene expression by averaging the scRNA-seq data across cells. Comparisons of the paired bulk and pseudo-bulk gene expression profiles revealed that there indeed exists a collection of genes that are frequently under-detected in scRNA-seq compared to bulk RNA-seq. This result was robust to randomization when unpaired bulk and pseudo-bulk gene expression profiles were compared. We performed a motif search on the last 350 bp of the identified genes and observed an enrichment of the poly(T) motif. The poly(T) motif toward the tails of those genes may be able to form hairpin structures with the poly(A) tails of their mRNA transcripts, making it difficult for those transcripts to be captured during scRNA-seq library preparation; this is a mechanistic conjecture for why certain genes may be more prone to under-detection in scRNA-seq.
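The pseudo-bulk step described above can be sketched as follows. The averaging across cells comes from the abstract; the log2 ratio with a pseudocount is only one assumed way to compare paired profiles, and the toy counts and the names `pseudo_bulk` and `log2_ratio` are hypothetical.

```python
import numpy as np

def pseudo_bulk(counts):
    """Average a cells x genes expression matrix across cells."""
    return counts.mean(axis=0)

def log2_ratio(bulk, pseudo, pseudocount=1.0):
    """Per-gene log2 ratio of pseudo-bulk to bulk expression.

    Strongly negative values mark genes that look under-detected in
    scRNA-seq relative to bulk. This comparison is an illustrative
    assumption, not the statistic used in the study.
    """
    return np.log2((pseudo + pseudocount) / (bulk + pseudocount))

# Toy data: 3 cells x 3 genes; the first gene is never detected in single cells.
counts = np.array([[0.0, 4.0, 0.0],
                   [0.0, 6.0, 2.0],
                   [0.0, 2.0, 4.0]])
bulk = np.array([7.0, 4.0, 2.0])  # paired bulk profile on the same scale (assumed)

pb = pseudo_bulk(counts)
ratio = log2_ratio(bulk, pb)
```

In practice the bulk and pseudo-bulk profiles would need to be normalized to a comparable scale before any such ratio is meaningful.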
-
Abstract The transcriptional plasticity of cancer cells promotes intercellular heterogeneity in response to anticancer drugs and facilitates the generation of surviving cell subpopulations. Characterizing single-cell transcriptional heterogeneity after drug treatment can provide mechanistic insights into drug efficacy. Here, we used single-cell RNA-seq to examine the transcriptomic profiles of cancer cells treated with paclitaxel, celecoxib, and the combination of the two drugs. By normalizing the expression of endogenous genes to spike-in molecules, we found that cellular mRNA abundance shows dynamic regulation after drug treatment. Using a random forest model, we identified gene signatures classifying single cells into three states: transcriptional repression, amplification, and control-like. Treatment with paclitaxel or celecoxib alone generally repressed gene transcription across single cells. Interestingly, the drug combination resulted in transcriptional amplification and hyperactivation of the mitochondrial oxidative phosphorylation pathway, linked to enhanced cell killing efficiency. Finally, we identified a regulatory module enriched with metabolism- and inflammation-related genes that was activated in a subpopulation of paclitaxel-treated cells and whose expression predicted paclitaxel efficacy across cancer cell lines and in vivo patient samples. Our study highlights the dynamic global transcriptional activity driving single-cell heterogeneity during drug response and emphasizes the importance of adding spike-in molecules to study gene expression regulation using single-cell RNA-seq.
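As a rough illustration of normalizing endogenous genes to spike-in molecules, the sketch below scales each cell by a size factor computed from its spike-in counts only. The exact procedure used in the study is not given in the abstract, so the size-factor definition, the toy counts, and the function names are assumptions.

```python
import numpy as np

def spike_in_size_factors(spike_counts):
    """Per-cell size factors from total spike-in counts (cells x spike-ins)."""
    totals = spike_counts.sum(axis=1)
    return totals / totals.mean()

def normalize_to_spike_ins(endogenous, spike_counts):
    """Divide each cell's endogenous counts by its spike-in size factor."""
    factors = spike_in_size_factors(spike_counts)
    return endogenous / factors[:, None]

# Toy data: 2 cells, 2 spike-in species, 2 endogenous genes.
spike = np.array([[4.0, 6.0],
                  [8.0, 12.0]])   # the second cell captured twice the spike-in
endo = np.array([[10.0, 20.0],
                 [20.0, 40.0]])

normalized = normalize_to_spike_ins(endo, spike)
total_mrna = normalized.sum(axis=1)  # spike-in-scaled mRNA abundance per cell
```

Because the size factors depend only on the spike-ins, which are added in equal amounts to every cell, differences in total endogenous mRNA between cells survive the normalization instead of being scaled away.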
-
Abstract While single cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.
-
Inferring gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data is an important computational problem for finding the regulatory mechanisms involved in fundamental cellular processes. Although many computational methods have been designed to predict GRNs from scRNA-seq data, they usually have high false positive rates, and none infers GRNs by directly using the paired datasets of case-versus-control experiments. Here we present a novel deep-learning-based method, named scTIGER, for GRN detection that uses the co-differential relationships of gene expression profiles in paired scRNA-seq datasets. scTIGER employs cell-type-based pseudotiming, an attention-based convolutional neural network method, and permutation-based significance testing to infer GRNs among gene modules. We first applied scTIGER to scRNA-seq datasets of prostate cancer cells and successfully identified the dynamic regulatory networks of AR, ERG, PTEN, and ATF3 for the same cell type between prostatic cancerous and normal conditions, and for two cell types within the prostatic cancerous environment. We then applied scTIGER to scRNA-seq data from neurons with and without fear memory and detected specific regulatory networks for BDNF, CREB1, and MAPK4. Additionally, scTIGER demonstrates robustness against high levels of dropout noise in scRNA-seq data.