Title: Inference of trajectory presence by tree dimension and subset specificity by subtree cover
The complexity of biological processes such as cell differentiation is reflected in dynamic transitions between cellular states. Trajectory inference arranges the states into a progression using methodologies propelled by single-cell biology. However, current methods, all of which return a best trajectory, do not adequately assess the statistical significance of noisy patterns, leading to uncertainty in inferred trajectories. We introduce a tree dimension test for trajectory presence in multivariate data, comprising a dimension measure of the Euclidean minimum spanning tree, a test statistic, and a null distribution. Computable in time linear in tree size, the tree dimension measure summarizes the extent of branching more effectively than the number of leaves, which is globally insensitive, or the tree diameter, which is indifferent to secondary branches. The test statistic quantifies trajectory presence, and its null distribution is estimated under the null hypothesis of no trajectory in the data. On simulated and real single-cell datasets, the test outperformed the intuitive number-of-leaves and tree-diameter statistics. Next, we developed a measure for the tissue specificity of the dynamics of a subset, based on the minimum subtree cover of the subset in a minimum spanning tree. We found that tissue specificity of pathway gene expression dynamics is conserved in human and mouse development: several signal transduction pathways, including calcium and Wnt signaling, are most tissue specific, while genetic information processing pathways such as ribosome and mismatch repair are least so. Neither the tree dimension test nor the subset specificity measure has any user parameter to tune. Our work opens a window to prioritize cellular dynamics and pathways in development and other multivariate dynamical systems.
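The abstract does not define the tree dimension measure itself, so the sketch below does not attempt to reproduce it. It only illustrates, under stated assumptions, the objects the abstract names: a Euclidean minimum spanning tree, the two baseline statistics (number of leaves and tree diameter), and the minimum subtree cover of a subset of vertices. All function names are hypothetical, SciPy is an assumed dependency, and the input points are assumed to be distinct (SciPy treats zero distances as missing edges).

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def euclidean_mst_adjacency(points):
    """Build the Euclidean MST of the rows of `points` as adjacency lists.

    Assumes distinct points: a zero distance would be read as "no edge".
    """
    points = np.asarray(points, dtype=float)
    mst = minimum_spanning_tree(squareform(pdist(points))).tocoo()
    adj = [[] for _ in range(len(points))]
    for i, j, w in zip(mst.row, mst.col, mst.data):
        adj[int(i)].append((int(j), float(w)))
        adj[int(j)].append((int(i), float(w)))
    return adj


def count_leaves(adj):
    """Number of degree-one vertices in the tree."""
    return sum(1 for nbrs in adj if len(nbrs) == 1)


def _farthest(adj, start):
    """Return the vertex farthest from `start` and its path length."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                stack.append(v)
    far = max(dist, key=dist.get)
    return far, dist[far]


def tree_diameter(adj):
    """Longest weighted path in the tree, found by two sweeps."""
    end, _ = _farthest(adj, 0)
    _, length = _farthest(adj, end)
    return length


def subtree_cover(adj, subset):
    """Vertices of the smallest subtree that contains every vertex in `subset`.

    Repeatedly prunes leaves that are not in the subset.
    """
    keep = set(subset)
    degree = [len(nbrs) for nbrs in adj]
    removed = [False] * len(adj)
    queue = [v for v in range(len(adj)) if degree[v] <= 1 and v not in keep]
    while queue:
        u = queue.pop()
        removed[u] = True
        for v, _ in adj[u]:
            if not removed[v]:
                degree[v] -= 1
                if degree[v] == 1 and v not in keep:
                    queue.append(v)
    return {v for v in range(len(adj)) if not removed[v]}
```

Each function visits every tree edge a constant number of times, so the statistics are linear in tree size once the tree is built; the dense distance matrix used to build it here is quadratic in the number of points and is chosen only for brevity.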
Award ID(s):
1661331
NSF-PAR ID:
10334204
Author(s) / Creator(s):
;
Editor(s):
Iakoucheva, Lilia M.
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
18
Issue:
2
ISSN:
1553-7358
Page Range / eLocation ID:
e1009829
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Cells progressing from an early state to a developed state give rise to lineages in cell differentiation. Knowledge of these lineages is central to developmental biology. Each biological lineage corresponds to a trajectory in a dynamical system. Emerging single-cell technologies such as single-cell RNA sequencing can capture molecular abundance in diverse cell types in a developing tissue. Many computational methods have been developed to infer trajectories from single-cell data. However, to our knowledge, none of the existing methods address the problem of determining the existence of a trajectory in observed data before attempting trajectory inference.

    Results

    We introduce a method to identify the existence of a trajectory using three graph-based statistics. A permutation test is used to calculate the empirical distribution of the test statistic under the null hypothesis that a trajectory does not exist. Finally, a p-value is calculated to quantify the statistical significance of the presence of a trajectory in the data.

    Conclusions

    Our work contributes new statistics to assess the level of uncertainty in trajectory inference to increase the understanding of biological system dynamics.

     
  2. High-throughput microfluidics-based assays can potentially increase the speed and quality of yeast replicative lifespan measurements. One major challenge is to efficiently convert large volumes of time-lapse images into quantitative measurements of cellular lifespans. Here, we address this challenge by prototyping an algorithm that can track cellular division events through family trees of cells. We generated a null distribution using single cells inside microfluidic traps. Based on this null distribution, we prototyped a maximum likelihood algorithm for cell tracking between images at different time points. We inferred cell family trees through a likelihood-based trace-back method. The branching patterns of the cell family trees are then used to infer the replicative lifespan of the yeast mother cells. The longest branch of a cell family tree represents the full trajectory of a yeast mother cell. The replicative lifespan of this mother cell can be counted as the number of bifurcating branches of this family tree. In addition, we prototyped a different approach based on summing cell areas, which improved the replicative lifespan estimation significantly. These generic methods have the potential to improve the efficiency and expand the range of quantitative measurements in yeast replicative aging experiments.
  3. Summary

    We introduce an L2-type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed on the basis of the pairwise distance covariance, and it accounts for the non-linear and non-monotone dependences among the data, which cannot be fully captured by the existing tests based on either Pearson correlation or rank correlation. Our test can be conveniently implemented in practice, as the limiting null distribution of the test statistic is shown to be standard normal. It exhibits excellent finite sample performance in our simulation studies, even when the sample size is small and the dimension is high, and is shown to identify non-linear dependence successfully in empirical data analysis. On the theory side, asymptotic normality of our test statistic is shown under quite mild moment assumptions and with little restriction on the growth rate of the dimension as a function of sample size. As a demonstration of the good power properties of our distance-covariance-based test, we further show that an infeasible version of our test statistic is rate optimal in the class of Gaussian distributions with equal correlation.

     
  4. Summary In this paper, we develop a systematic theory for high-dimensional analysis of variance in multivariate linear regression, where the dimension and the number of coefficients can both grow with the sample size. We propose a new U-type statistic to test linear hypotheses and establish a high-dimensional Gaussian approximation result under fairly mild moment assumptions. Our general framework and theory can be used to deal with the classical one-way multivariate analysis of variance, and the nonparametric one-way multivariate analysis of variance in high dimensions. To implement the test procedure, we introduce a sample-splitting-based estimator of the second moment of the error covariance and discuss its properties. A simulation study shows that our proposed test outperforms some existing tests in various settings. 
  5. Abstract

    Water availability influences all aspects of plant growth and development; however, most studies of plant responses to drought have focused on vegetative organs, notably roots and leaves. Far less is known about the molecular bases of drought acclimation responses in fruits, which are complex organs with distinct tissue types. To obtain a more comprehensive picture of the molecular mechanisms governing fruit development under drought, we profiled the transcriptomes of a spectrum of fruit tissues from tomato (Solanum lycopersicum), spanning early growth through ripening and collected from plants grown under varying intensities of water stress. In addition, we compared transcriptional changes in fruit with those in leaves to highlight different and conserved transcriptome signatures in vegetative and reproductive organs. We observed extensive and diverse genetic reprogramming in different fruit tissues and leaves, each associated with a unique response to drought acclimation. These included major transcriptional shifts in the placenta of growing fruit and in the seeds of ripe fruit related to cell growth and epigenetic regulation, respectively. Changes in metabolic and hormonal pathways, such as those related to starch, carotenoids, jasmonic acid, and ethylene metabolism, were associated with distinct fruit tissues and developmental stages. Gene coexpression network analysis provided further insights into the tissue-specific regulation of distinct responses to water stress. Our data highlight the spatiotemporal specificity of drought responses in tomato fruit and indicate known and unrevealed molecular regulatory mechanisms involved in drought acclimation, during both vegetative and reproductive stages of development.

     
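The permutation test described in the first related abstract above can be sketched as follows. This is a minimal illustration, not that paper's procedure: the abstract does not name its three graph-based statistics or its permutation scheme, so the sketch assumes the number of leaves of the Euclidean minimum spanning tree as a stand-in statistic, assumes that each feature column is shuffled independently to simulate "no trajectory", and assumes that fewer leaves count as more trajectory-like (a lower-tail test). The names `mst_leaf_count` and `permutation_p_value` are hypothetical, and SciPy is an assumed dependency.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def mst_leaf_count(points):
    """Number of leaves in the Euclidean MST (stand-in statistic).

    Assumes distinct points: a zero distance would be read as "no edge".
    """
    mst = minimum_spanning_tree(squareform(pdist(points))).tocoo()
    # Each MST edge contributes one to the degree of both endpoints.
    degree = np.bincount(np.concatenate([mst.row, mst.col]),
                         minlength=len(points))
    return int(np.sum(degree == 1))


def permutation_p_value(points, statistic=mst_leaf_count, n_perm=99, seed=0):
    """Empirical lower-tail p-value under a column-shuffling null (assumed)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    observed = statistic(points)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle every feature independently to break joint structure
        # while keeping each marginal distribution unchanged.
        shuffled = np.column_stack([rng.permutation(col) for col in points.T])
        null[b] = statistic(shuffled)
    # The +1 terms count the observed value among the permutations,
    # so the p-value is never exactly zero.
    p_value = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, null, p_value
```

For example, points placed along a straight line give an MST that is a single path with two leaves, while the shuffled copies scatter into a cloud with many leaves, so the resulting p-value is small. A different statistic or a different null model would change both the tail direction and the interpretation.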