Abstract
Recently, lineage tracing technology using CRISPR/Cas9 genome editing has enabled simultaneous readouts of gene expression and lineage barcodes. This allows reconstruction of the cell division tree, inference of ancestral cell types, and tracing of the origin of each cell type. Meanwhile, trajectory inference methods are widely used to infer cell trajectories and pseudotime in a dynamic process from the gene expression data of present-day cells. Here, we present TedSim (single-cell temporal dynamics simulator), which simulates cell division events from the root cell to present-day cells and simultaneously generates two data modalities for each single cell: the lineage barcode and gene expression data. TedSim is a framework that connects the two problems of lineage tracing and trajectory inference. Using TedSim, we show that (i) TedSim generates realistic gene expression and barcode data, as well as realistic relationships between these two data modalities; (ii) trajectory inference methods can recover the underlying cell state transition mechanism when cell type compositions are balanced; and (iii) integrating gene expression and barcode data can provide more insight into the temporal dynamics of cell differentiation than using either type of data alone, although better integration methods need to be developed.
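To make the two modalities concrete, here is a minimal Python sketch of the kind of process the abstract describes: binary divisions from a root cell, with each daughter inheriting an irreversibly edited barcode and a noisy copy of its parent's expression state. The function name `simulate_division_tree` and every parameter (`p_edit`, `n_states`, the Gaussian expression drift) are hypothetical assumptions; this is not TedSim's model or API, whose details the abstract does not give.

```python
import random

def simulate_division_tree(depth, n_sites=8, n_states=5, p_edit=0.1,
                           n_genes=3, noise=0.2, seed=0):
    """Hypothetical sketch: `depth` rounds of binary division from one root cell.

    Each cell carries a CRISPR-style barcode (0 = unedited site, 1..n_states =
    edited outcome) and a toy expression vector. Returns the leaves, i.e. the
    present-day cells.
    """
    rng = random.Random(seed)
    generation = [{"barcode": [0] * n_sites, "expr": [0.0] * n_genes}]
    for _ in range(depth):
        children = []
        for parent in generation:
            for _ in range(2):  # each division yields two daughter cells
                barcode = list(parent["barcode"])
                for i, site in enumerate(barcode):
                    # Edits are irreversible: only unedited sites can mutate.
                    if site == 0 and rng.random() < p_edit:
                        barcode[i] = rng.randint(1, n_states)
                # Expression is inherited from the parent with Gaussian drift.
                expr = [x + rng.gauss(0.0, noise) for x in parent["expr"]]
                children.append({"barcode": barcode, "expr": expr})
        generation = children
    return generation

cells = simulate_division_tree(depth=4)
print(len(cells))  # 16 present-day cells (2**4)
```

In this toy, sister cells share every edit made before their final division and have correlated expression, so both modalities carry lineage information. A realistic simulator would additionally need cell-state-dependent expression programs and measurement noise, which this sketch omits.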
Experimental guidance for discovering genetic networks through hypothesis reduction on time series
Large programs of dynamic gene expression, like cell cycles and circadian rhythms, are controlled by a relatively small “core” network of transcription factors and post-translational modifiers working in concerted mutual regulation. Recent work suggests that system-independent, quantitative features of the dynamics of gene expression can be used to identify core regulators. We introduce an approach of iterative network hypothesis reduction from time-series data, in which increasingly complex features of the dynamic expression of individual genes, pairs of genes, and entire collections of genes are used to infer functional network models that can produce the observed transcriptional program. The culmination of our work is a computational pipeline, Iterative Network Hypothesis Reduction from Temporal Dynamics (Inherent dynamics pipeline), that provides a priority list of targets for genetic perturbation to experimentally infer network structure. We demonstrate the capability of this integrated computational pipeline on synthetic and yeast cell-cycle data.
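The reduction idea (filter on simple single-gene features first, then spend more expensive pairwise analysis only on the survivors) can be sketched in a few lines of Python. The periodicity score (normalized spectral power at the expected period), the circular-lag pairwise feature, and all function names are illustrative assumptions; they are not the features or code of the Inherent dynamics pipeline.

```python
import math

def periodicity_score(series, period):
    """Fraction of mean-removed power at one frequency (a single DFT term).

    Equals 1.0 for a pure sinusoid sampled over a whole number of periods
    (period > 2 samples).
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    total = sum(v * v for v in x)
    if total == 0:
        return 0.0
    w = 2 * math.pi / period
    re = sum(v * math.cos(w * t) for t, v in enumerate(x))
    im = sum(v * math.sin(w * t) for t, v in enumerate(x))
    return 2 * (re * re + im * im) / (n * total)

def best_lag(a, b, max_lag):
    """Lag (in samples) by which b trails a, maximizing circular covariance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    def cov(lag):
        return sum((a[t] - ma) * (b[(t + lag) % n] - mb) for t in range(n))
    return max(range(max_lag + 1), key=cov)

def reduce_hypotheses(data, period, keep):
    """Stage 1: keep the `keep` most periodic genes.
    Stage 2: estimate pairwise timing only among the survivors."""
    ranked = sorted(data, key=lambda g: periodicity_score(data[g], period),
                    reverse=True)
    survivors = ranked[:keep]
    pairs = [(a, b, best_lag(data[a], data[b], period - 1))
             for i, a in enumerate(survivors) for b in survivors[i + 1:]]
    return survivors, pairs

# Toy data: 24 samples covering two 12-step cycles.
T, P = 24, 12
data = {
    "geneA": [math.cos(2 * math.pi * t / P) for t in range(T)],
    "geneB": [math.cos(2 * math.pi * (t - 3) / P) for t in range(T)],  # trails A by 3
    "geneC": [t / T for t in range(T)],  # monotone trend, not periodic
}
survivors, pairs = reduce_hypotheses(data, P, keep=2)
print(sorted(survivors))  # ['geneA', 'geneB']
print(best_lag(data["geneA"], data["geneB"], P - 1))  # 3
```

On real data, pairwise timing features like these would feed a further stage that scores candidate network models against the whole collection of genes; that stage is not sketched here.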
- Award ID(s):
- 1839299
- PAR ID:
- 10378433
- Editor(s):
- Csikász-Nagy, Attila
- Date Published:
- Journal Name:
- PLOS Computational Biology
- Volume:
- 18
- Issue:
- 10
- ISSN:
- 1553-7358
- Page Range / eLocation ID:
- e1010145
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract
Background: Cell and circadian cycles, despite performing distinct functions, control a large fraction of cell and organismal physiology by regulating large periodic transcriptional programs that encompass anywhere from 15 to 80% of the genome. In each case, these programs are controlled by gene regulatory networks (GRNs), and genetics and chromosome mapping approaches in model systems have shown that at the core of these GRNs are small sets of genes that drive the transcript dynamics of the GRNs. However, it is unlikely that all of these core genes have been identified, even in model organisms. Moreover, large periodic transcriptional programs controlling a variety of processes certainly exist in important non-model organisms, where genetic approaches to identifying networks are expensive, time-consuming, or intractable. Ideally, the core network components could be identified with data-driven approaches applied to the transcriptome dynamics data already available.
Results: This study shows that dynamic features from a unified, quantified feature set, computed on high-throughput time-series gene expression data, are more prominent in the core transcriptional regulators of cell and circadian cycles than in their outputs, across multiple organisms and even in the presence of external periodic stimuli. Additionally, we observe that the power to discriminate between core and non-core genes is largely insensitive to the particular choice of quantification of these features.
Conclusions: The approach has practical applications for network inference, because it produces a ranking of genes that is enriched for core regulatory elements driving a periodic phenotype. In this way, the method provides a prioritization of follow-up genetic experiments. Furthermore, these findings reveal something unexpected: the transcript abundances of core components of unrelated GRNs that control disparate periodic phenotypes share common dynamic features.
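Because the output is a ranking enriched for core regulators, one simple way to quantify that enrichment is to combine per-gene feature scores into a single ranking and compute the area under the ROC curve (AUC) against a set of known core genes. The feature names, the mean-rank aggregation, and the toy scores below are hypothetical assumptions; the abstract does not specify how features are combined or how discrimination is scored.

```python
def mean_rank_order(scores_by_feature):
    """Order genes by mean rank across features (higher score = better rank)."""
    genes = list(next(iter(scores_by_feature.values())))
    totals = dict.fromkeys(genes, 0)
    for scores in scores_by_feature.values():
        for rank, g in enumerate(sorted(genes, key=scores.get, reverse=True)):
            totals[g] += rank
    # Sorting by the rank total gives the same order as sorting by the mean.
    return sorted(genes, key=lambda g: totals[g])

def auc(ordering, positives):
    """Probability that a random positive is ranked above a random negative."""
    n_pos = sum(g in positives for g in ordering)
    n_neg = len(ordering) - n_pos
    seen_pos = wins = 0
    for g in ordering:  # best to worst
        if g in positives:
            seen_pos += 1
        else:
            wins += seen_pos  # positives ranked above this negative
    return wins / (n_pos * n_neg)

# Toy, made-up feature scores for five genes; g1 and g2 play the "core" role.
scores = {
    "periodicity": {"g1": 0.9, "g2": 0.6, "g3": 0.7, "g4": 0.2, "g5": 0.1},
    "amplitude": {"g1": 3.0, "g2": 2.5, "g3": 0.5, "g4": 1.0, "g5": 0.8},
}
core = {"g1", "g2"}
combined = mean_rank_order(scores)
print(combined)  # ['g1', 'g2', 'g3', 'g4', 'g5']
print(auc(combined, core))  # 1.0
print(round(auc(mean_rank_order({"periodicity": scores["periodicity"]}), core), 3))  # 0.833
```

In the toy, the non-core gene g3 outranks the core gene g2 on periodicity alone, while the combined ranking separates the two sets completely. With real data, comparing AUCs across alternative feature quantifications is one way to examine the insensitivity the abstract reports.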
-
Mathelier, Anthony (Ed.)
Abstract
Motivation: Methods to model dynamic changes in gene expression at a genome-wide level are not currently sufficient for large (temporally rich or single-cell) datasets. Variational autoencoders offer a means to characterize large datasets and have been used effectively to characterize features of single-cell datasets. Here, we extend these methods for use with gene expression time-series data.
Results: We present RVAgene: a recurrent variational autoencoder to model gene expression dynamics. RVAgene learns to accurately and efficiently reconstruct temporal gene profiles. It also learns a low-dimensional representation of the data via a recurrent encoder network; this representation can be used for biological feature discovery, and new gene expression data can be generated by sampling its latent space. We test RVAgene on simulated and real biological datasets, including embryonic stem cell differentiation and kidney injury response dynamics. In all cases, RVAgene accurately reconstructed complex temporal gene expression profiles. Via cross-validation, we show that a low-error latent space representation can be learnt using only a fraction of the data. Through clustering and gene ontology term enrichment analysis on the latent space, we demonstrate the potential of RVAgene for unsupervised discovery. In particular, RVAgene identifies new programs of shared gene regulation of Lox family genes in response to kidney injury.
Availability and implementation: All datasets analyzed in this manuscript are publicly available and have been published previously. RVAgene is available in Python at GitHub: https://github.com/maclean-lab/RVAgene; Zenodo archive: http://doi.org/10.5281/zenodo.4271097.
Supplementary information: Supplementary data are available at Bioinformatics online.
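The moving parts of a recurrent variational autoencoder can be illustrated with a deliberately tiny, untrained stand-in: a one-unit tanh recurrent encoder that maps a gene's time profile to the mean and log-variance of a one-dimensional Gaussian latent, the reparameterization step, the closed-form KL penalty against a standard normal prior, and a recurrent decoder that unrolls a profile from a latent sample. All weights and function names are hypothetical, training is omitted, and the loss shown is the generic VAE form (reconstruction error plus KL); this is not RVAgene's architecture, loss, or code.

```python
import math
import random

def encode(series, w_in=0.8, w_rec=0.5, w_mu=1.0, w_lv=-0.5):
    """Toy recurrent encoder: run a 1-unit tanh RNN over the profile and map the
    final hidden state to (mu, logvar) of a 1-D Gaussian latent."""
    h = 0.0
    for x in series:
        h = math.tanh(w_in * x + w_rec * h)
    return w_mu * h, w_lv * h

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1), so z is differentiable in mu, sigma."""
    return mu + math.exp(0.5 * logvar) * rng.gauss(0.0, 1.0)

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for one latent dimension."""
    return -0.5 * (1.0 + logvar - mu * mu - math.exp(logvar))

def decode(z, length, w_z=1.0, w_rec=0.3, w_out=2.0):
    """Toy recurrent decoder: unroll a 1-unit RNN driven by z for `length` steps."""
    h, out = 0.0, []
    for _ in range(length):
        h = math.tanh(w_z * z + w_rec * h)
        out.append(w_out * h)
    return out

def vae_loss(series, rng):
    """Generic VAE objective: mean squared reconstruction error + KL penalty."""
    mu, logvar = encode(series)
    recon = decode(reparameterize(mu, logvar, rng), len(series))
    mse = sum((a - b) ** 2 for a, b in zip(series, recon)) / len(series)
    return mse + kl_to_standard_normal(mu, logvar)

profile = [0.0, 0.4, 0.8, 0.6, 0.2]
rng = random.Random(0)
print(vae_loss(profile, rng))
# Generating a new profile: sample z from the prior and decode it.
new_profile = decode(rng.gauss(0.0, 1.0), len(profile))
```

Training would adjust the weights by gradient descent on this loss over many gene profiles; with more latent dimensions, the encoder outputs are the low-dimensional representation that can be clustered.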
-
Abstract
Single-cell technologies can measure the expression of thousands of molecular features in individual cells undergoing dynamic biological processes. While examining cells along a computationally ordered pseudotime trajectory can reveal how changes in gene or protein expression impact cell fate, identifying such dynamic features is challenging due to the inherent noise in single-cell data. Here, we present DELVE, an unsupervised feature selection method for identifying a representative subset of molecular features that robustly recapitulate cellular trajectories. In contrast to previous work, DELVE uses a bottom-up approach to mitigate the effects of confounding sources of variation, modeling cell states from dynamic gene or protein modules based on core regulatory complexes. Using simulations, single-cell RNA sequencing, and iterative immunofluorescence imaging data in the context of cell cycle and cellular differentiation, we demonstrate how DELVE selects features that better define cell types and cell-type transitions. DELVE is available as an open-source Python package: https://github.com/jranek/delve.
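The goal of keeping features that vary smoothly along a cellular trajectory is commonly formalized with the Laplacian score: build a k-nearest-neighbor graph over cells and, for each feature, divide its roughness on the graph by its degree-weighted variance, where lower is better. The sketch below implements that generic criterion on a toy one-dimensional trajectory; the graph construction (binary, symmetrized kNN), the value of k, and the names are assumptions, and it is not claimed to reproduce DELVE's implementation.

```python
def knn_graph(points, k):
    """Symmetric 0/1 adjacency: i~j if j is among i's k nearest neighbours or vice versa."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            W[i][j] = W[j][i] = 1.0
    return W

def laplacian_score(feature, W):
    """(f~' L f~) / (f~' D f~) with L = D - W, where f~ is f centred by the
    degree-weighted mean. Uses f~' L f~ = 0.5 * sum_ij W_ij (f_i - f_j)^2."""
    n = len(feature)
    d = [sum(row) for row in W]
    mu = sum(di * fi for di, fi in zip(d, feature)) / sum(d)
    f = [fi - mu for fi in feature]
    rough = 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))
    spread = sum(di * fi * fi for di, fi in zip(d, f))
    return rough / spread if spread > 0 else float("inf")

# Toy: 10 cells ordered along a 1-D trajectory (a pseudotime-like coordinate).
cells = [[float(t)] for t in range(10)]
W = knn_graph(cells, k=2)
features = {
    "smooth": [float(t) for t in range(10)],           # changes gradually along the trajectory
    "alternating": [float(t % 2) for t in range(10)],  # flips between neighbouring cells
}
ranking = sorted(features, key=lambda name: laplacian_score(features[name], W))
print(ranking)  # ['smooth', 'alternating']
```

Here the graph is built directly from the trajectory coordinate to keep the example small; per the abstract, DELVE instead derives cell states from dynamic gene or protein modules.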
-
Gene expression is inherently dynamic, due to complex regulation and stochastic biochemical events. However, the effects of these dynamics on cell phenotypes can be difficult to determine. Researchers have historically been limited to passive observations of natural dynamics, which can preclude studies of elusive and noisy cellular events where large amounts of data are required to reveal statistically significant effects. Here, using recent advances in the fields of machine learning and control theory, we train a deep neural network to accurately predict the response of an optogenetic system in Escherichia coli cells. We then use the network in a deep model predictive control framework to impose arbitrary and cell-specific gene expression dynamics on thousands of single cells in real time, applying the framework to generate complex time-varying patterns. We also showcase the framework’s ability to link expression patterns to dynamic functional outcomes by controlling expression of the tetA antibiotic resistance gene. This study highlights how deep learning-enabled feedback control can be used to tailor distributions of gene expression dynamics with high accuracy and throughput without expert knowledge of the biological system.
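The control loop described above follows the usual receding-horizon pattern of model predictive control: at each step, a predictive model scores candidate input sequences over a short horizon, only the first input of the best sequence is applied, and the loop repeats from the newly measured state. The minimal Python sketch below swaps the paper's deep neural network for a hypothetical linear model of a reporter level driven by binary light input, searches all 2^H on/off sequences by brute force, and simulates the "cell" with the same model; all names and parameters are illustrative assumptions.

```python
import itertools

def predict(x, inputs, a=0.8, b=1.0):
    """Hypothetical stand-in model (not a neural network): x[t+1] = a*x[t] + b*u[t]."""
    traj = []
    for u in inputs:
        x = a * x + b * u
        traj.append(x)
    return traj

def mpc_step(x, targets, horizon):
    """Return the first input of the on/off sequence that best tracks `targets`."""
    best_u, best_cost = 0, float("inf")
    for seq in itertools.product((0, 1), repeat=horizon):  # 2**horizon candidates
        cost = sum((xi - ti) ** 2 for xi, ti in zip(predict(x, seq), targets))
        if cost < best_cost:
            best_u, best_cost = seq[0], cost
    return best_u

def run_closed_loop(target, x0=0.0, horizon=3):
    """Simulate one cell; here the 'plant' is the model itself, so there is no model error."""
    x, xs, us = x0, [], []
    for t in range(len(target)):
        window = target[t:t + horizon]
        window += [window[-1]] * (horizon - len(window))  # hold the last target near the end
        u = mpc_step(x, window, horizon)
        x = predict(x, [u])[0]
        xs.append(x)
        us.append(u)
    return xs, us

xs, us = run_closed_loop([2.5] * 20)
print([round(v, 2) for v in xs[-4:]])
```

With a constant target of 2.5 (half the model's fully-on steady state of 5), the controller settles into toggling the light on and off so the level hovers around the target. Per the abstract, the real framework runs this kind of loop on thousands of individual cells in real time with a learned model; replacing `predict` with a trained network changes the model but not the structure of the loop, although brute-force search grows as 2^H and limits the horizon.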