Title: Inference of trajectory presence by tree dimension and subset specificity by subtree cover
The complexity of biological processes such as cell differentiation is reflected in dynamic transitions between cellular states. Trajectory inference arranges the states into a progression using methodologies propelled by single-cell biology. However, current methods, all returning a best trajectory, do not adequately assess the statistical significance of noisy patterns, leading to uncertainty in inferred trajectories. We introduce a tree dimension test for trajectory presence in multivariate data, comprising a dimension measure of the Euclidean minimum spanning tree, a test statistic, and a null distribution. Computable in time linear in tree size, the tree dimension measure summarizes the extent of branching more effectively than the number of leaves, which is globally insensitive, or the tree diameter, which is indifferent to secondary branches. The test statistic quantifies trajectory presence, and its null distribution is estimated under the null hypothesis of no trajectory in the data. On simulated and real single-cell datasets, the test outperformed the intuitive number of leaves and tree diameter statistics. Next, we developed a measure for the tissue specificity of the dynamics of a subset, based on the minimum subtree cover of the subset in a minimum spanning tree. We found that tissue specificity of pathway gene expression dynamics is conserved in human and mouse development: several signal transduction pathways, including calcium and Wnt signaling, are the most tissue specific, while genetic information processing pathways such as ribosome and mismatch repair are the least so. Neither the tree dimension test nor the subset specificity measure has any user parameter to tune. Our work opens a window to prioritize cellular dynamics and pathways in development and other multivariate dynamical systems.
Author(s) / Creator(s):
Iakoucheva, Lilia M.
Journal Name:
PLOS Computational Biology
Sponsoring Org:
National Science Foundation
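As a rough illustration of the two baseline statistics named in the abstract above, the following sketch builds a Euclidean minimum spanning tree with SciPy and computes its number of leaves and its diameter. It does not implement the tree dimension measure itself, whose definition is not given on this page. The function names are hypothetical, the diameter is counted in edges (an assumption), and the dense distance matrix assumes all points are distinct.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path
from scipy.spatial.distance import pdist, squareform


def euclidean_mst(points):
    """Return the Euclidean MST of the rows of `points` as a symmetric
    dense weight matrix (0 means "no edge"). Assumes distinct points,
    because zero distances would be read as missing edges."""
    dist = squareform(pdist(points))              # pairwise Euclidean distances
    tree = minimum_spanning_tree(dist).toarray()  # each edge stored once
    return np.maximum(tree, tree.T)               # make the matrix symmetric


def number_of_leaves(tree):
    """Count the vertices of degree 1."""
    degree = (tree > 0).sum(axis=1)
    return int((degree == 1).sum())


def tree_diameter(tree):
    """Length, in edges, of the longest path in the tree."""
    adjacency = (tree > 0).astype(float)
    hops = shortest_path(adjacency, directed=False, unweighted=True)
    return int(hops.max())


# Illustrative data: four points on a line and a five-point star.
path_points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
star_points = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])

for name, pts in [("path", path_points), ("star", star_points)]:
    t = euclidean_mst(pts)
    print(name, "leaves:", number_of_leaves(t), "diameter:", tree_diameter(t))
```

The path and the star differ in both statistics, but neither number alone describes how much secondary branching a larger tree contains, which is the gap the abstract says the tree dimension measure addresses.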
More Like this
  1. Abstract Background

    Cells progressing from an early state to a developed state give rise to lineages in cell differentiation. Knowledge of these lineages is central to developmental biology. Each biological lineage corresponds to a trajectory in a dynamical system. Emerging single-cell technologies such as single-cell RNA sequencing can capture molecular abundance in diverse cell types in a developing tissue. Many computational methods have been developed to infer trajectories from single-cell data. However, to our knowledge, none of the existing methods address the problem of determining the existence of a trajectory in observed data before attempting trajectory inference.


    We introduce a method to identify the existence of a trajectory using three graph-based statistics. A permutation test is used to calculate the empirical distribution of the test statistic under the null hypothesis that a trajectory does not exist. Finally, a p-value is calculated to quantify the statistical significance of the presence of a trajectory in the data.


    Our work contributes new statistics to assess the level of uncertainty in trajectory inference to increase the understanding of biological system dynamics.

  2. High-throughput microfluidics-based assays can potentially increase the speed and quality of yeast replicative lifespan measurements. One major challenge is to efficiently convert large volumes of time-lapse images into quantitative measurements of cellular lifespans. Here, we address this challenge by prototyping an algorithm that can track cellular division events through family trees of cells. We generated a null distribution using single cells inside microfluidic traps. Based on this null distribution, we prototyped a maximum likelihood algorithm for cell tracking between images at different time points. We inferred cell family trees through a likelihood-based trace-back method. The branching patterns of the cell family trees are then used to infer the replicative lifespan of the yeast mother cells. The longest branch of a cell family tree represents the full trajectory of a yeast mother cell. The replicative lifespan of this mother cell can be counted as the number of bifurcating branches of this family tree. In addition, we prototyped a different approach based on summing cell areas, which improved the replicative lifespan estimation significantly. These generic methods have the potential to improve the efficiency and expand the range of quantitative measurements in yeast replicative aging experiments.
  3. Kelso, Janet (Ed.)
    Abstract Motivation: Genetic or epigenetic events can rewire molecular networks to induce extraordinary phenotypical divergences. Among the many network rewiring approaches, no model-free statistical methods can differentiate gene-gene pattern changes not attributed to marginal changes. This may obscure fundamental rewiring from superficial changes. Results: Here we introduce a model-free Sharma-Song test to determine if patterns differ in the second order, meaning that the deviation of the joint distribution from the product of marginal distributions is unequal across conditions. We prove an asymptotic chi-squared null distribution for the test statistic. Simulation studies demonstrate its advantage over alternative methods in detecting second-order differential patterns. Applying the test on three independent mammalian developmental transcriptome datasets, we report a lower frequency of co-expression network rewiring between human and mouse for the same tissue group than the frequency of rewiring between tissue groups within the same species. We also find second-order differential patterns between microRNA promoters and genes contrasting cerebellum and liver development in mice. These patterns are enriched in the spliceosome pathway regulating tissue specificity. Complementary to previous mammalian comparative studies mostly driven by first-order effects, our findings contribute an understanding of system-wide second-order gene network rewiring within and across mammalian systems. Second-order differential patterns constitute evidence for fundamentally rewired biological circuitry due to evolution, environment, or disease. Availability: The generic Sharma-Song test is available from the R package ‘DiffXTables’. Other code and data are described in Methods. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. Abstract

    Water availability influences all aspects of plant growth and development; however, most studies of plant responses to drought have focused on vegetative organs, notably roots and leaves. Far less is known about the molecular bases of drought acclimation responses in fruits, which are complex organs with distinct tissue types. To obtain a more comprehensive picture of the molecular mechanisms governing fruit development under drought, we profiled the transcriptomes of a spectrum of fruit tissues from tomato (Solanum lycopersicum), spanning early growth through ripening and collected from plants grown under varying intensities of water stress. In addition, we compared transcriptional changes in fruit with those in leaves to highlight different and conserved transcriptome signatures in vegetative and reproductive organs. We observed extensive and diverse genetic reprogramming in different fruit tissues and leaves, each associated with a unique response to drought acclimation. These included major transcriptional shifts in the placenta of growing fruit and in the seeds of ripe fruit related to cell growth and epigenetic regulation, respectively. Changes in metabolic and hormonal pathways, such as those related to starch, carotenoids, jasmonic acid, and ethylene metabolism, were associated with distinct fruit tissues and developmental stages. Gene coexpression network analysis provided further insights into the tissue-specific regulation of distinct responses to water stress. Our data highlight the spatiotemporal specificity of drought responses in tomato fruit and indicate known and unrevealed molecular regulatory mechanisms involved in drought acclimation, during both vegetative and reproductive stages of development.

  5. A conditional sampling oracle for a probability distribution D returns samples from the conditional distribution of D restricted to a specified subset of the domain. A recent line of work (Chakraborty et al. 2013 and Cannone et al. 2014) has shown that, with access to such a conditional sampling oracle, only a polylogarithmic or even constant number of samples is required to solve distribution testing problems like identity and uniformity. This significantly improves over the standard sampling model, where polynomially many samples are necessary. Inspired by these results, we introduce a computational model based on conditional sampling to develop sublinear algorithms with exponentially faster runtimes compared to standard sublinear algorithms. We focus on geometric optimization problems over points in high-dimensional Euclidean space. Access to these points is provided via a conditional sampling oracle that takes as input a succinct representation of a subset of the domain and outputs a uniformly random point in that subset. We study two well-studied problems: k-means clustering and estimating the weight of the minimum spanning tree. In contrast to prior algorithms for the classic model, our algorithms have time, space, and sample complexity that is polynomial in the dimension and polylogarithmic in the number of points. Finally, we comment on the applicability of the model and compare it with existing ones like streaming, parallel, and distributed computational models.
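The permutation test outlined in item 1 of this list can be sketched as follows. This is only an illustration under stated assumptions: the statistic (number of leaves of the Euclidean minimum spanning tree), the null model (shuffling each feature column independently), and the lower-tail direction are choices made here for concreteness, not the three statistics of that work. All function names are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_leaf_count(points):
    """Number of degree-1 vertices in the Euclidean MST of `points`.
    Assumes distinct points (zero distances would be read as missing edges)."""
    dist = squareform(pdist(points))
    tree = minimum_spanning_tree(dist).toarray()
    degree = ((tree + tree.T) > 0).sum(axis=1)
    return int((degree == 1).sum())


def permutation_p_value(points, statistic, n_perm=200, seed=0):
    """Empirical lower-tail p-value of `statistic` under a null obtained
    by permuting every column of `points` independently (an assumed null
    model that breaks the joint structure between features)."""
    rng = np.random.default_rng(seed)
    observed = statistic(points)
    null = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in points.T])
        null[b] = statistic(shuffled)
    # Assumption: fewer leaves means a more path-like tree, so small
    # values count as evidence for a trajectory. The +1 terms keep the
    # estimate away from exactly zero.
    p_value = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, float(p_value)


# Illustrative data: 60 points along a noiseless curve.
t = np.linspace(0.0, 1.0, 60)
curve = np.column_stack([t, t ** 2])
observed, p_value = permutation_p_value(curve, mst_leaf_count)
print("observed leaves:", observed, "p-value:", p_value)
```

Any other graph-based statistic can be passed in place of `mst_leaf_count`, provided the tail used for the p-value matches the direction in which that statistic indicates a trajectory.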