Abstract Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes, which allows for the reconstruction of the cell division tree and makes it possible to reconstruct ancestral cell types and trace the origin of each cell type. Meanwhile, trajectory inference methods are widely used to infer cell trajectories and pseudotime in a dynamic process using gene expression data of present-day cells. Here, we present TedSim (single-cell temporal dynamics simulator), which simulates the cell division events from the root cell to present-day cells, simultaneously generating two data modalities for each single cell: the lineage barcode and gene expression data. TedSim is a framework that connects the two problems: lineage tracing and trajectory inference. Using TedSim, we conducted analyses showing that (i) TedSim generates realistic gene expression and barcode data, as well as realistic relationships between these two data modalities; (ii) trajectory inference methods can recover the underlying cell state transition mechanism with balanced cell type compositions; and (iii) integrating gene expression and barcode data can provide more insights into the temporal dynamics in cell differentiation compared to using only one type of data, but better integration methods need to be developed.
                            LinRace: cell division history reconstruction of single cells using paired lineage barcode and gene expression data
                        
                    
    
            Abstract Lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes in single cells, which allows for inference of cell lineage and cell types at the whole organism level. While most state-of-the-art methods for lineage reconstruction utilize only the lineage barcode data, methods that incorporate gene expression are emerging. Effectively incorporating the gene expression data requires a reasonable model of how gene expression changes along generations of divisions. Here, we present LinRace (Lineage Reconstruction with asymmetric cell division model), which integrates lineage barcode and gene expression data using an asymmetric cell division model and infers cell lineages and ancestral cell states using Neighbor-Joining and maximum-likelihood heuristics. On both simulated and real data, LinRace outputs more accurate cell division trees than existing methods. With inferred ancestral states, LinRace can also show how a progenitor cell generates a large population of cells with various functionalities.
- Award ID(s): 2019771
- PAR ID: 10480130
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Nature Communications
- Volume: 14
- Issue: 1
- ISSN: 2041-1723
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Summary The left–right (L–R) axis of most bilateral animals is established during gastrulation when a transient ciliated structure creates a directional flow of signaling molecules that establish asymmetric gene expression in the lateral plate mesoderm. However, in some animals, an earlier differential distribution of molecules and cell division patterns initiate or at least influence L–R patterning. Using single‐cell high‐resolution mass spectrometry, we previously reported a limited number of small molecule (metabolite) concentration differences between left and right dorsal‐animal blastomeres of the eight‐cell Xenopus embryo. Herein, we examined whether altering the distribution of some of these molecules influenced early events in L–R patterning. Using lineage tracing, we found that injecting right‐enriched metabolites into the left cell caused its descendant cells to disperse in patterns that varied from those in control gastrulae; this did not occur when left‐enriched metabolites were injected into the right cell. At later stages, injecting left‐enriched metabolites into the right cell perturbed the expression of genes known to: (a) be required for the formation of the gastrocoel roof plate (foxj1); (b) lead to the asymmetric expression of Nodal (dand5/coco); or (c) result from asymmetrical nodal expression (pitx2). Despite these perturbations in gene expression, we did not observe heterotaxy in heart or gut looping at tadpole stages. These studies indicate that altering metabolite distribution at cleavage stages at the concentrations tested in this study impacts the earliest steps of L–R gene expression that then can be compensated for during organogenesis.
- 
            Introduction: Many current healthcare challenges, including cancer and infectious diseases, are controlled by the evolutionary dynamics of heterogeneous cell populations. In cancer, evidence shows that intratumoral heterogeneity is a contributor to chemoresistance and metastasis driven by rare mutations and epigenetic changes. High-diversity DNA barcode libraries stably integrated into cells have been used to track these populations over time. However, uncovering these lineage dynamics has been a primarily destructive process. Recently, our lab has developed a lineage-tracing platform, Control of Lineage by Barcode Enabled Recombinant Transcription (COLBERT), to precisely monitor heterogeneous subpopulations within a tumor. Importantly, this platform also affords us the ability to isolate specific subpopulations through activation of a lineage-specific gene expression circuit. Here we demonstrate successful isolation of subpopulations within MDA-MB-231 breast carcinoma cells treated with doxorubicin.
- 
            Abstract Background: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. Results: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. Conclusion: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
- 
            Abstract Background: Analyzing single-cell RNA sequencing (scRNAseq) data plays an important role in understanding the intrinsic and extrinsic cellular processes in biological and biomedical research. One significant effort in this area is the identification of cell types. With the availability of a huge amount of single-cell sequencing data and the discovery of more and more cell types, classifying cells into known cell types has become a priority. Several methods have been introduced to classify cells utilizing gene expression data. However, incorporating biological gene interaction networks has proved valuable in cell classification procedures. Results: In this study, we propose a multimodal end-to-end deep learning model, named sigGCN, for cell classification that combines a graph convolutional network (GCN) and a neural network to exploit gene interaction networks. We used standard classification metrics to evaluate the performance of the proposed method on within-dataset and cross-dataset classification. We compared the performance of the proposed method with that of existing cell classification tools and traditional machine learning classification methods. Conclusions: Results indicate that the proposed method outperforms other commonly used methods in terms of classification accuracy and F1 scores. This study shows that integrating prior knowledge about gene interactions with gene expression using GCN methodologies can extract effective features that improve the performance of cell classification.
 An official website of the United States government
