Trajectory inference methods are essential for analyzing the developmental paths of cells in single-cell sequencing datasets. They provide insights into cellular differentiation, transitions, and lineage hierarchies, helping unravel the dynamic processes underlying development and disease progression. However, many existing tools lack a coherent statistical model and reliable uncertainty quantification, limiting their utility and robustness. In this paper, we introduce VITAE (Variational Inference for Trajectory by AutoEncoder), a statistical approach that integrates a latent hierarchical mixture model with variational autoencoders to infer trajectories. The statistical hierarchical model enhances the interpretability of our framework, while the posterior approximations generated by our variational autoencoder ensure computational efficiency and provide uncertainty quantification of cell projections along trajectories. Specifically, VITAE enables simultaneous trajectory inference and data integration, improving the accuracy of learning a joint trajectory structure in the presence of biological and technical heterogeneity across datasets. We show that VITAE outperforms other state-of-the-art trajectory inference methods on both real and synthetic data under various trajectory topologies. Furthermore, we apply VITAE to jointly analyze three distinct single-cell RNA sequencing datasets of the mouse neocortex, unveiling comprehensive developmental lineages of projection neurons. VITAE effectively reduces batch effects within and across datasets and uncovers finer structures that might be overlooked in individual datasets. Additionally, we showcase VITAE’s efficacy in integrative analyses of multiomic datasets with continuous cell population structures.
Analysis of Variability of Functionals of Recombinant Protein Production Trajectories Based on Limited Data
Making statistical inference on quantities defining various characteristics of a temporally measured biochemical process and analyzing its variability across different experimental conditions is a core challenge in various branches of science. This problem is particularly difficult when the amount of data that can be collected is limited in terms of both the number of replicates and the number of time points per process trajectory. We propose a method for analyzing the variability of smooth functionals of the growth or production trajectories associated with such processes across different experimental conditions. Our modeling approach is based on a spline representation of the mean trajectories. We also develop a bootstrap-based inference procedure for the parameters while accounting for possible multiple comparisons. This methodology is applied to study two types of quantities—the “time to harvest” and “maximal productivity”—in the context of an experiment on the production of recombinant proteins. We complement the findings with extensive numerical experiments comparing the effectiveness of different types of bootstrap procedures for various tests of hypotheses. These numerical experiments convincingly demonstrate that the proposed method yields reliable inference on complex characteristics of the processes even in a data-limited environment where more traditional methods for statistical inference are typically not reliable.
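The general idea of bootstrapping a functional of a mean trajectory can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract specifies a spline representation and a bootstrap that accounts for multiple comparisons, whereas this sketch swaps in finite-difference slopes for the spline derivative and a plain percentile bootstrap over replicates. The function names, the time grid, and the toy measurements are all hypothetical.

```python
import random
import statistics

def mean_trajectory(replicates):
    """Pointwise mean across replicates measured on a shared time grid."""
    return [statistics.fmean(vals) for vals in zip(*replicates)]

def maximal_productivity(times, values):
    """Largest finite-difference slope of a trajectory.

    Assumption: a crude stand-in for the maximum of a spline derivative.
    """
    return max(
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )

def bootstrap_ci(times, replicates, functional, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling whole replicates with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(replicates) for _ in replicates]
        stats.append(functional(times, mean_trajectory(sample)))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data, invented purely for illustration: three replicates, five time points (hours).
times = [0, 24, 48, 72, 96]
replicates = [
    [0.0, 1.1, 3.9, 6.2, 6.8],
    [0.0, 0.9, 4.3, 6.0, 7.1],
    [0.0, 1.3, 3.6, 6.5, 6.9],
]

point = maximal_productivity(times, mean_trajectory(replicates))
lo, hi = bootstrap_ci(times, replicates, maximal_productivity)
print(f"estimate={point:.4f}, 95% interval=({lo:.4f}, {hi:.4f})")
```

With only three replicates the resampled means take few distinct values, which illustrates why the data-limited setting described in the abstract calls for more careful bootstrap designs than this naive one.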
- PAR ID: 10349722
- Date Published:
- Journal Name: International Journal of Molecular Sciences
- Volume: 23
- Issue: 14
- ISSN: 1422-0067
- Page Range / eLocation ID: 7628
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Linear mixed models are widely used for analyzing longitudinal datasets, and the inference for variance component parameters relies on the bootstrap method. However, health systems and technology companies routinely generate massive longitudinal datasets that make the traditional bootstrap method infeasible. To solve this problem, we extend the highly scalable bag of little bootstraps method for independent data to longitudinal data and develop a highly efficient Julia package, MixedModelsBLB.jl. Simulation experiments and real data analysis demonstrate the favorable statistical performance and computational advantages of our method compared to the traditional bootstrap method. For the statistical inference of variance components, it achieves a 200-fold speedup on the scale of 1 million subjects (20 million total observations), and is the only currently available tool that can handle more than 10 million subjects (200 million total observations) using desktop computers.
Brownian motion in one or more dimensions is extensively used as a stochastic process to model natural and engineering signals, as well as financial data. Most works dealing with multidimensional Brownian motion consider the different dimensions as independent components. In this article, we investigate a model of correlated Brownian motion in ℝ², where the individual components are not necessarily independent. We explore various statistical properties of the process under consideration, going beyond the conventional analysis of the second moment. Our particular focus lies on investigating the distribution of turning angles. This distribution reveals particularly interesting characteristics for processes with dependent components that are relevant to applications in diverse physical systems. Theoretical considerations are supported by numerical simulations and analysis of two real-world datasets: the financial data of the Dow Jones Industrial Average and the Standard and Poor’s 500, and trajectories of polystyrene beads in water. Finally, we show that the model can be readily extended to trajectories with correlations that change over time.
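A turning angle is the signed angle between two consecutive displacement vectors of a trajectory. The sketch below shows one way to simulate correlated two-dimensional increments and collect their turning angles. It assumes a standard construction in which the second component mixes the first Gaussian draw with an independent one, so that the per-step correlation is `rho`; the abstract does not give the paper's parametrization, so this construction, the function names, and the chosen `rho` are illustrative assumptions.

```python
import math
import random

def correlated_steps(n, rho, seed=0):
    """Draw n 2D Gaussian increments whose components have correlation rho.

    Assumption: unit-variance components built from two independent normals.
    """
    rng = random.Random(seed)
    steps = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        dx = z1
        dy = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
        steps.append((dx, dy))
    return steps

def turning_angles(steps):
    """Signed angle in [-pi, pi] between each pair of consecutive increments."""
    angles = []
    for (ax, ay), (bx, by) in zip(steps, steps[1:]):
        cross = ax * by - ay * bx  # proportional to sin of the angle
        dot = ax * bx + ay * by    # proportional to cos of the angle
        angles.append(math.atan2(cross, dot))
    return angles

steps = correlated_steps(20000, rho=0.8)
angles = turning_angles(steps)

# Fraction of near-straight or near-reversed turns (within 45 degrees of 0 or pi).
aligned = sum(1 for a in angles if abs(a) < math.pi / 4 or abs(a) > 3 * math.pi / 4)
print(f"fraction of turns near 0 or pi: {aligned / len(angles):.3f}")
```

Setting `rho=0.0` gives independent components and serves as a baseline against which to compare the printed fraction for dependent components.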
Variability in gene expression causes genetically identical cells to exhibit different phenotypes. One probable cause of this variability is transcriptional bursting, where the synthesis of RNA molecules randomly alternates with periods of silence in the transfer of genetic information. Yet, the molecular mechanisms behind this variability remain unclear. Experiments indicate that multiple biochemical states might be involved in the production of RNA molecules. Stimulated by these observations, we developed a theoretical framework to investigate the mechanisms of transcriptional bursting. It is based on a multi-state stochastic approach that provides a full quantitative description of the dynamic properties in the system. We found that the degree of stochastic fluctuations during transcription directly correlates with the number of biochemical states. This explains experimentally observed variability and fluctuations in the quantities of the produced RNA molecules. The procedure to estimate the number of relevant biochemical states participating in the transcription is outlined and applied to the analysis of experimental results. We also developed a general dynamic phase diagram for the transcription process. The presented theoretical method clarifies physical-chemical aspects of the transcriptional bursting and presents a minimal chemical-kinetic description of the process.
The analysis of live-cell single-molecule imaging experiments can reveal valuable information about the heterogeneity of transport processes and interactions between cell components. These characteristics are seen as motion changes in the particle trajectories. Despite the existence of multiple approaches to carry out this type of analysis, no objective assessment of these methods has been performed so far. Here, we report the results of a competition to characterize and rank the performance of these methods when analyzing the dynamic behavior of single molecules. To run this competition, we implemented a software library that simulates realistic data corresponding to widespread diffusion and interaction models, both in the form of trajectories and videos obtained in typical experimental conditions. The competition constitutes the first assessment of these methods, providing insights into the current limitations of the field, fostering the development of new approaches, and guiding researchers to identify optimal tools for analyzing their experiments.

