Abstract Background: Crohn’s disease is a lifelong disease characterized by chronic inflammation of the gastrointestinal tract. Defining the cellular and transcriptional composition of the mucosa at different stages of disease progression is needed for personalized therapy in Crohn’s. Methods: Ileal biopsies were obtained from (1) control subjects (n = 6), (2) treatment-naïve patients (n = 7), and (3) established (n = 14) Crohn’s patients along with remission (n = 3) and refractory (n = 11) treatment groups. The biopsies, processed using 10x Genomics single cell 5', yielded 139 906 cells. Gene expression count matrices of all samples were analyzed by reciprocal principal component integration, followed by clustering analysis. Manual annotations of the clusters were performed using canonical gene markers. Cell type proportions, differential expression analysis, and gene ontology enrichment were carried out for each cell type. Results: We identified 3 cellular compartments with 9 epithelial, 1 stromal, and 5 immune cell subtypes. We observed differences in the cellular composition between control, treatment-naïve, and established groups, with significant changes in the epithelial subtypes of the treatment-naïve patients, including microfold, tuft, goblet, enterocytes, and BEST4+ cells. Surprisingly, fewer changes in the composition of the immune compartment were observed; however, gene expression in the epithelial and immune compartments differed between Crohn’s phenotypes, indicating changes in cellular activity. Conclusions: Our study identified cellular and transcriptional signatures associated with treatment-naïve Crohn’s disease that collectively point to dysfunction of the intestinal barrier with an increase in inflammatory cellular activity. Our analysis also highlights the heterogeneity among patients within the same disease phenotype, shining a new light on personalized treatment responses and strategies.
Integrating temporal single-cell gene expression modalities for trajectory inference and disease prediction
Abstract Background: Current methods for analyzing single-cell datasets have relied primarily on static gene expression measurements to characterize the molecular state of individual cells. However, capturing temporal changes in cell state is crucial for the interpretation of dynamic phenotypes such as the cell cycle, development, or disease progression. RNA velocity infers the direction and speed of transcriptional changes in individual cells, yet it is unclear how these temporal gene expression modalities may be leveraged for predictive modeling of cellular dynamics. Results: Here, we present the first task-oriented benchmarking study that investigates integration of temporal sequencing modalities for dynamic cell state prediction. We benchmark ten integration approaches on ten datasets spanning different biological contexts, sequencing technologies, and species. We find that integrated data more accurately infers biological trajectories and achieves increased performance on classifying cells according to perturbation and disease states. Furthermore, we show that simple concatenation of spliced and unspliced molecules performs consistently well on classification tasks and can be used in place of more memory-intensive and computationally expensive methods. Conclusions: This work illustrates how integrated temporal gene expression modalities may be leveraged for predicting cellular trajectories and sample-associated perturbation and disease phenotypes. Additionally, this study provides users with practical recommendations for task-specific integration of single-cell gene expression modalities.
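The abstract reports that simple concatenation of spliced and unspliced counts works well for classification. A minimal sketch of that idea follows, using only the Python standard library. The function names, the toy count matrices, the log transform, and the nearest-centroid classifier are all illustrative assumptions and are not taken from the study, which benchmarks its own set of integration approaches and classifiers.

```python
from math import log1p


def concatenate_modalities(spliced, unspliced):
    """Join spliced and unspliced counts feature-wise, one row per cell.

    Both inputs are lists of per-cell count vectors in the same cell order.
    Counts are log1p-transformed (an assumed, common preprocessing choice).
    """
    if len(spliced) != len(unspliced):
        raise ValueError("both matrices must describe the same cells")
    return [
        [log1p(value) for value in s_row + u_row]
        for s_row, u_row in zip(spliced, unspliced)
    ]


def fit_centroids(features, labels):
    """Compute the mean feature vector of each class (hypothetical classifier)."""
    sums, counts = {}, {}
    for row, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Assign a cell to the class whose centroid is closest (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], row))


# Toy data: 4 cells x 2 genes. Spliced counts alone barely separate the
# classes; the unspliced counts carry the distinguishing signal.
spliced = [[10, 0], [9, 1], [10, 1], [9, 0]]
unspliced = [[0, 5], [1, 6], [5, 0], [6, 1]]
labels = ["control", "control", "treated", "treated"]

integrated = concatenate_modalities(spliced, unspliced)  # 4 cells x 4 features
centroids = fit_centroids(integrated, labels)

new_cell = concatenate_modalities([[10, 0]], [[0, 6]])[0]
predicted = predict(centroids, new_cell)
```

Concatenation keeps every feature from both modalities, so the feature count doubles while the number of cells stays the same; any downstream classifier can then be trained on the joined matrix.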
- Award ID(s): 1845796
- PAR ID: 10370978
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Genome Biology
- Volume: 23
- Issue: 1
- ISSN: 1474-760X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: Current biotechnologies can simultaneously measure multiple high-dimensional modalities (e.g., RNA, DNA accessibility, and protein) from the same cells. A combination of different analytical tasks (e.g., multi-modal integration and cross-modal analysis) is required to comprehensively understand such data, inferring how gene regulation drives biological diversity and functions. However, current analytical methods are designed to perform a single task, only providing a partial picture of the multi-modal data. Here, we present UnitedNet, an explainable multi-task deep neural network capable of integrating different tasks to analyze single-cell multi-modality data. Applied to various multi-modality datasets (e.g., Patch-seq, multiome ATAC + gene expression, and spatial transcriptomics), UnitedNet demonstrates similar or better accuracy in multi-modal integration and cross-modal prediction compared with state-of-the-art methods. Moreover, by dissecting the trained UnitedNet with an explainable machine learning algorithm, we can directly quantify the relationship between gene expression and other modalities with cell-type specificity. UnitedNet is a comprehensive end-to-end framework that could be broadly applicable to single-cell multi-modality biology. This framework has the potential to facilitate the discovery of cell-type-specific regulation kinetics across transcriptomics and other modalities.
- Abstract: Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes, which allows for the reconstruction of the cell division tree and makes it possible to reconstruct ancestral cell types and trace the origin of each cell type. Meanwhile, trajectory inference methods are widely used to infer cell trajectories and pseudotime in a dynamic process using gene expression data of present-day cells. Here, we present TedSim (single-cell temporal dynamics simulator), which simulates the cell division events from the root cell to present-day cells, simultaneously generating two data modalities for each single cell: the lineage barcode and gene expression data. TedSim is a framework that connects the two problems: lineage tracing and trajectory inference. Using TedSim, we conducted analyses to show that (i) TedSim generates realistic gene expression and barcode data, as well as realistic relationships between these two data modalities; (ii) trajectory inference methods can recover the underlying cell state transition mechanism with balanced cell type compositions; and (iii) integrating gene expression and barcode data can provide more insights into the temporal dynamics in cell differentiation compared to using only one type of data, but better integration methods need to be developed.
- Abstract Motivation: Integrative analysis of large-scale single-cell data collected from diverse cell populations promises an improved understanding of complex biological systems. While several algorithms have been developed for single-cell RNA-sequencing data integration, many lack the scalability to handle large numbers of datasets and/or millions of cells due to their memory and run time requirements. The few tools that can handle large data do so by reducing the computational burden through strategies such as subsampling the data or selecting a reference dataset. Such shortcuts, however, hamper the accuracy of downstream analyses, especially those requiring quantitative gene expression information. Results: We present SCEMENT, a SCalablE and Memory-Efficient iNTegration method, to overcome these limitations. Our new parallel algorithm builds upon and extends the linear regression model previously applied in ComBat to an unsupervised sparse matrix setting to enable accurate integration of diverse and large collections of single-cell RNA-sequencing data. Using tens to hundreds of real single-cell RNA-seq datasets, we show that SCEMENT outperforms ComBat as well as FastIntegration and Scanorama in runtime (up to 214× faster) and memory usage (up to 17.5× less). It not only performs batch correction and integration of millions of cells in under 25 min, but also facilitates the discovery of new rare cell types and more robust reconstruction of gene regulatory networks with full quantitative gene expression information. Availability and implementation: Source code is freely available for download at https://github.com/AluruLab/scement, implemented in C++ and supported on Linux.
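To give a rough intuition for regression-based batch correction of the kind the abstract attributes to ComBat, the sketch below removes a per-gene, per-batch mean shift and restores the overall gene mean. This is a deliberately simplified, location-only illustration: it omits the scale adjustment and empirical Bayes shrinkage used by ComBat, it is not the SCEMENT algorithm, and the function name and toy values are assumptions made for the example.

```python
def center_batches(expression, batches):
    """Remove additive per-batch shifts from each gene (simplified sketch).

    expression: list of per-cell expression vectors (one value per gene).
    batches:    list of batch labels, one per cell, in the same order.
    Returns a new matrix in which every batch shares the overall gene mean.
    """
    n_cells = len(expression)
    n_genes = len(expression[0])

    # Overall mean of each gene across all cells.
    grand_mean = [
        sum(row[g] for row in expression) / n_cells for g in range(n_genes)
    ]

    # Mean of each gene within each batch.
    sums, counts = {}, {}
    for row, batch in zip(expression, batches):
        acc = sums.setdefault(batch, [0.0] * n_genes)
        for g, value in enumerate(row):
            acc[g] += value
        counts[batch] = counts.get(batch, 0) + 1
    batch_mean = {b: [v / counts[b] for v in acc] for b, acc in sums.items()}

    # Subtract the batch-specific mean and add back the overall mean.
    return [
        [value - batch_mean[batch][g] + grand_mean[g] for g, value in enumerate(row)]
        for row, batch in zip(expression, batches)
    ]


# Toy data: gene 0 is shifted by +10 in batch "b"; gene 1 has no batch effect.
expression = [[1.0, 2.0], [3.0, 4.0], [11.0, 2.0], [13.0, 4.0]]
batches = ["a", "a", "b", "b"]
corrected = center_batches(expression, batches)
```

After correction the two batches overlap on gene 0 while the within-batch differences between cells are preserved, which is the property that lets downstream analyses keep quantitative expression values.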
- Abstract Background: Analyzing single-cell RNA sequencing (scRNAseq) data plays an important role in understanding the intrinsic and extrinsic cellular processes in biological and biomedical research. One significant effort in this area is the identification of cell types. With the growing volume of single-cell sequencing data and the discovery of ever more cell types, classifying cells into known cell types has become a priority. Several methods have been introduced to classify cells utilizing gene expression data. However, incorporating biological gene interaction networks has proved valuable in cell classification procedures. Results: In this study, we propose a multimodal end-to-end deep learning model, named sigGCN, for cell classification that combines a graph convolutional network (GCN) and a neural network to exploit gene interaction networks. We used standard classification metrics to evaluate the performance of the proposed method on within-dataset and cross-dataset classification. We compared the performance of the proposed method with that of existing cell classification tools and traditional machine learning classification methods. Conclusions: Results indicate that the proposed method outperforms other commonly used methods in terms of classification accuracy and F1 scores. This study shows that integrating prior knowledge about gene interactions with gene expression using GCN methodologies can extract effective features that improve the performance of cell classification.
