skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Joint trajectory inference for single-cell genomics using deep learning with a mixture prior
Trajectory inference methods are essential for analyzing the developmental paths of cells in single-cell sequencing datasets. It provides insights into cellular differentiation, transitions, and lineage hierarchies, helping unravel the dynamic processes underlying development and disease progression. However, many existing tools lack a coherent statistical model and reliable uncertainty quantification, limiting their utility and robustness. In this paper, we introduce VITAE (Variational Inference for Trajectory by AutoEncoder), a statistical approach that integrates a latent hierarchical mixture model with variational autoencoders to infer trajectories. The statistical hierarchical model enhances the interpretability of our framework, while the posterior approximations generated by our variational autoencoder ensure computational efficiency and provide uncertainty quantification of cell projections along trajectories. Specifically, VITAE enables simultaneous trajectory inference and data integration, improving the accuracy of learning a joint trajectory structure in the presence of biological and technical heterogeneity across datasets. We show that VITAE outperforms other state-of-the-art trajectory inference methods on both real and synthetic data under various trajectory topologies. Furthermore, we apply VITAE to jointly analyze three distinct single-cell RNA sequencing datasets of the mouse neocortex, unveiling comprehensive developmental lineages of projection neurons. VITAE effectively reduces batch effects within and across datasets and uncovers finer structures that might be overlooked in individual datasets. Additionally, we showcase VITAE’s efficacy in integrative analyses of multiomic datasets with continuous cell population structures.  more » « less
Award ID(s):
2113646 2238656
PAR ID:
10553794
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Proceedings of the National Academy of Sciences
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
37
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Cells progressing from an early state to a developed state give rise to lineages in cell differentiation. Knowledge of these lineages is central to developmental biology. Each biological lineage corresponds to a trajectory in a dynamical system. Emerging single-cell technologies such as single-cell RNA sequencing can capture molecular abundance in diverse cell types in a developing tissue. Many computational methods have been developed to infer trajectories from single-cell data. However, to our knowledge, none of the existing methods address the problem of determining the existence of a trajectory in observed data before attempting trajectory inference. Results We introduce a method to identify the existence of a trajectory using three graph-based statistics. A permutation test is utilized to calculate the empirical distribution of the test statistic under the null hypothesis that a trajectory does not exist. Finally, a p -value is calculated to quantify the statistical significance for the presence of trajectory in the data. Conclusions Our work contributes new statistics to assess the level of uncertainty in trajectory inference to increase the understanding of biological system dynamics. 
    more » « less
  2. apid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold. GAGA shows competitive performance in simulated and real-world datasets, including a 30% improvement over SOTA in single-cell population-level trajectory inference. 
    more » « less
  3. Learning explicit and implicit patterns in human trajectories plays an important role in many Location-Based Social Networks (LBSNs) applications, such as trajectory classification (e.g., walking, driving, etc.), trajectory-user linking, friend recommendation, etc. A particular problem that has attracted much attention recently – and is the focus of our work – is the Trajectory-based Social Circle Inference (TSCI), aiming at inferring user social circles (mainly social friendship) based on motion trajectories and without any explicit social networked information. Existing approaches addressing TSCI lack satisfactory results due to the challenges related to data sparsity, accessibility and model efficiency. Motivated by the recent success of machine learning in trajectory mining, in this paper we formulate TSCI as a novel multi-label classification problem and develop a Recurrent Neural Network (RNN)-based framework called DeepTSCI to use human mobility patterns for inferring corresponding social circles. We propose three methods to learn the latent representations of trajectories, based on: (1) bidirectional Long Short-Term Memory (LSTM); (2) Autoencoder; and (3) Variational autoencoder. Experiments conducted on real-world datasets demonstrate that our proposed methods perform well and achieve significant improvement in terms of macro-R, macro-F1 and accuracy when compared to baselines. 
    more » « less
  4. Abstract Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates. 
    more » « less
  5. Abstract SummaryCellular processes like development, differentiation, and disease progression are highly complex and dynamic (e.g. gene expression). These processes often undergo cell population changes driven by cell birth, proliferation, and death. Single-cell sequencing enables gene expression measurement at the cellular resolution, allowing us to decipher cellular and molecular dynamics underlying these processes. However, the high costs and destructive nature of sequencing restrict observations to snapshots of unaligned cells at discrete timepoints, limiting our understanding of these processes and complicating the reconstruction of cellular trajectories. To address this challenge, we propose ARTEMIS, a generative model integrating a variational autoencoder (VAE) with unbalanced Diffusion Schrödinger Bridge to model cellular processes by reconstructing cellular trajectories, reveal gene expression dynamics, and recover cell population changes. The VAE maps input time-series single-cell data to a continuous latent space, where trajectories are reconstructed by solving the Schrödinger bridge problem using forward-backward stochastic differential equations (SDEs). A drift function in the SDEs captures deterministic gene expression trends. An additional neural network estimates time-varying kill rates for single cells along trajectories, enabling recovery of cell population changes. Using three scRNA-seq datasets—pancreatic β-cell differentiation, zebrafish embryogenesis, and epithelial-mesenchymal transition (EMT) in cancer cells—we demonstrate that ARTEMIS: (i) outperforms state-of-art methods to predict held-out timepoints, (ii) recovers relative cell population changes over time, and (iii) identifies “drift” genes driving deterministic expression trends in cell trajectories. Furthermore, in silico perturbations show that these genes influence processes like EMT. Availability and implementationThe code for ARTEMIS: https://github.com/daifengwanglab/ARTEMIS. 
    more » « less