Abstract
Background: Cells progressing from an early state to a developed state give rise to lineages in cell differentiation. Knowledge of these lineages is central to developmental biology. Each biological lineage corresponds to a trajectory in a dynamical system. Emerging single-cell technologies such as single-cell RNA sequencing can capture molecular abundance in diverse cell types in a developing tissue. Many computational methods have been developed to infer trajectories from single-cell data. However, to our knowledge, none of the existing methods address the problem of determining whether a trajectory exists in observed data before attempting trajectory inference.
Results: We introduce a method to identify the existence of a trajectory using three graph-based statistics. A permutation test is used to calculate the empirical distribution of the test statistic under the null hypothesis that a trajectory does not exist. Finally, a p-value is calculated to quantify the statistical significance of the presence of a trajectory in the data.
Conclusions: Our work contributes new statistics for assessing the level of uncertainty in trajectory inference, to increase the understanding of biological system dynamics.
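The abstract does not name its three graph-based statistics, so the following is only a minimal sketch of the general shape of such a permutation test, under stated assumptions. Assumed (not from the source): the statistic is the total length of the Euclidean minimum spanning tree (MST), and the "no trajectory" null is simulated by shuffling each coordinate independently, which keeps every feature's marginal distribution but destroys the joint structure a trajectory would create. The function names `mst_length`, `permute_columns`, and `permutation_p_value` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mst_length(points: Sequence[Point]) -> float:
    """Total edge length of the Euclidean minimum spanning tree (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from each vertex into the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        total += best[v]
        for u in range(n):
            if not in_tree[u]:
                best[u] = min(best[u], math.dist(points[v], points[u]))
    return total


def permute_columns(points: Sequence[Point], rng: random.Random) -> List[Point]:
    """Shuffle each coordinate independently (assumed null: no joint structure)."""
    cols = [list(c) for c in zip(*points)]
    for c in cols:
        rng.shuffle(c)
    return [tuple(row) for row in zip(*cols)]


def permutation_p_value(points: Sequence[Point], n_perm: int = 99, seed: int = 0) -> float:
    """Lower-tail empirical p-value: a trajectory should give a shorter tree."""
    rng = random.Random(seed)
    observed = mst_length(points)
    null = [mst_length(permute_columns(points, rng)) for _ in range(n_perm)]
    as_extreme = sum(1 for s in null if s <= observed)
    # counting the observed data as one permutation keeps p strictly positive
    return (1 + as_extreme) / (1 + n_perm)


if __name__ == "__main__":
    line = [(float(t), float(t)) for t in range(30)]  # points on a 1-D trajectory
    print(permutation_p_value(line))
```

For points on a line, each shuffled copy keeps the same distinct x and y values but pairs them at random, so its tree is almost always much longer and the p-value should sit at or near its floor of 1/(n_perm + 1).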
Inference of trajectory presence by tree dimension and subset specificity by subtree cover
The complexity of biological processes such as cell differentiation is reflected in dynamic transitions between cellular states. Trajectory inference arranges these states into a progression, using methodologies propelled by single-cell biology. However, current methods, which all return a single best trajectory, do not adequately assess the statistical significance of noisy patterns, leading to uncertainty in inferred trajectories. We introduce a tree dimension test for trajectory presence in multivariate data, consisting of a dimension measure of the Euclidean minimum spanning tree, a test statistic, and a null distribution. Computable in time linear in the tree size, the tree dimension measure summarizes the extent of branching more effectively than the number of leaves, which is globally insensitive, or the tree diameter, which is indifferent to secondary branches. The test statistic quantifies trajectory presence, and its null distribution is estimated under the null hypothesis that the data contain no trajectory. On simulated and real single-cell datasets, the test outperformed the intuitive number-of-leaves and tree-diameter statistics. Next, we developed a measure of the tissue specificity of the dynamics of a subset, based on the minimum subtree cover of the subset in a minimum spanning tree. We found that the tissue specificity of pathway gene expression dynamics is conserved in human and mouse development: several signal transduction pathways, including calcium and Wnt signaling, are the most tissue specific, while genetic information processing pathways such as ribosome and mismatch repair are the least. Neither the tree dimension test nor the subset specificity measure has any user parameter to tune. Our work opens a window to prioritize cellular dynamics and pathways in development and other multivariate dynamical systems.
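The abstract contrasts its tree dimension measure with two simpler summaries of a Euclidean minimum spanning tree: the number of leaves and the tree diameter. Because the abstract does not define tree dimension itself, the sketch below does not attempt it; it computes only those two baselines plus the smallest subtree connecting a vertex subset, which is one reading of "minimum subtree cover". How the paper turns that subtree into a specificity score is not stated in the abstract and is not reproduced here. All function names are hypothetical, and taking the diameter as the longest weighted path is an assumption.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]
Adjacency = List[List[Tuple[int, float]]]  # adj[v] = [(neighbor, edge length), ...]


def euclidean_mst(points: Sequence[Point]) -> Adjacency:
    """Euclidean minimum spanning tree by Prim's algorithm, O(n^2)."""
    n = len(points)
    adj: Adjacency = [[] for _ in range(n)]
    if n == 0:
        return adj
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[v] = True
        if parent[v] >= 0:  # attach v to the tree through its cheapest edge
            adj[v].append((parent[v], best[v]))
            adj[parent[v]].append((v, best[v]))
        for u in range(n):
            if not in_tree[u]:
                d = math.dist(points[v], points[u])
                if d < best[u]:
                    best[u], parent[u] = d, v
    return adj


def num_leaves(adj: Adjacency) -> int:
    return sum(1 for nbrs in adj if len(nbrs) == 1)


def _farthest(adj: Adjacency, start: int) -> Tuple[int, float]:
    """Vertex farthest from start along tree paths, and its distance."""
    dist = {start: 0.0}
    stack = [start]
    while stack:
        v = stack.pop()
        for u, w in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + w
                stack.append(u)
    far = max(dist, key=dist.__getitem__)
    return far, dist[far]


def tree_diameter(adj: Adjacency) -> float:
    """Longest weighted path, by two farthest-vertex sweeps (linear time)."""
    if not adj:
        return 0.0
    end, _ = _farthest(adj, 0)
    return _farthest(adj, end)[1]


def subtree_cover(adj: Adjacency, subset: Set[int]) -> Set[int]:
    """Vertices of the smallest subtree containing every vertex in subset."""
    if not subset:
        return set()
    keep = set(range(len(adj)))
    deg = [len(nbrs) for nbrs in adj]
    stack = [v for v in keep if deg[v] <= 1 and v not in subset]
    while stack:  # repeatedly prune leaves that are not in the subset
        v = stack.pop()
        if v not in keep:
            continue
        keep.remove(v)
        for u, _ in adj[v]:
            if u in keep:
                deg[u] -= 1
                if deg[u] <= 1 and u not in subset:
                    stack.append(u)
    return keep


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (3.0, -1.0)]
    mst = euclidean_mst(pts)
    print(num_leaves(mst), tree_diameter(mst), sorted(subtree_cover(mst, {4, 5})))
```

For the example points the tree is a four-point path with two short side branches at one end, so this prints `3 4.0 [3, 4, 5]`. Note how the diameter (4.0) ignores the second side branch entirely, which is the insensitivity the abstract points out. Once the tree is built, leaf counting, the double-sweep diameter, and leaf pruning are all linear in the number of edges; the O(n²) Prim step dominates.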
- Award ID(s): 1661331
- PAR ID: 10334204
- Editor(s): Iakoucheva, Lilia M.
- Journal Name: PLOS Computational Biology
- Volume: 18
- Issue: 2
- ISSN: 1553-7358
- Page Range / eLocation ID: e1009829
- Sponsoring Org: National Science Foundation
More like this
High-throughput microfluidics-based assays can potentially increase the speed and quality of yeast replicative lifespan measurements. One major challenge is to efficiently convert large volumes of time-lapse images into quantitative measurements of cellular lifespans. Here, we address this challenge by prototyping an algorithm that can track cellular division events through family trees of cells. We generated a null distribution using single cells inside microfluidic traps. Based on this null distribution, we prototyped a maximum likelihood algorithm for tracking cells between images at different time points. We inferred cell family trees through a likelihood-based trace-back method. The branching patterns of the cell family trees are then used to infer the replicative lifespan of the yeast mother cells. The longest branch of a cell family tree represents the full trajectory of a yeast mother cell, and the replicative lifespan of this mother cell can be counted as the number of bifurcating branches of this family tree. In addition, we prototyped a different approach based on summing cell areas, which improved the replicative lifespan estimation significantly. These generic methods have the potential to improve the efficiency and expand the range of quantitative measurements in yeast replicative aging experiments.
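The counting rule in this abstract (follow the longest branch of the family tree, then count its bifurcations) can be sketched on a tree stored as a parent-to-children mapping. This encoding is hypothetical, not the authors' data structure: each node stands for a tracked cell, and a node with two or more children marks a division. The likelihood-based tracking and the cell-area approach are not reproduced, and `longest_branch` and `replicative_lifespan` are illustrative names.

```python
from typing import Dict, Hashable, List

Tree = Dict[Hashable, List[Hashable]]  # hypothetical: cell -> cells observed after it


def longest_branch(children: Tree, root: Hashable) -> List[Hashable]:
    """Root-to-leaf path with the most nodes (first one found on ties)."""
    best_path = [root]
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids and len(path) > len(best_path):
            best_path = path
        for kid in kids:
            stack.append((kid, path + [kid]))
    return best_path


def replicative_lifespan(children: Tree, root: Hashable) -> int:
    """Count bifurcating nodes (two or more children) on the longest branch."""
    return sum(1 for node in longest_branch(children, root)
               if len(children.get(node, [])) >= 2)


if __name__ == "__main__":
    # mother path r-a-b-c-e; d1 and d2 are daughters budded at a and b
    family = {"r": ["a"], "a": ["b", "d1"], "b": ["c", "d2"], "c": ["e"]}
    print(replicative_lifespan(family, "r"))
```

In the toy family tree the mother divides twice along its longest branch, so the sketch prints `2`.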
Abstract: Water availability influences all aspects of plant growth and development; however, most studies of plant responses to drought have focused on vegetative organs, notably roots and leaves. Far less is known about the molecular bases of drought acclimation responses in fruits, which are complex organs with distinct tissue types. To obtain a more comprehensive picture of the molecular mechanisms governing fruit development under drought, we profiled the transcriptomes of a spectrum of fruit tissues from tomato (Solanum lycopersicum), spanning early growth through ripening and collected from plants grown under varying intensities of water stress. In addition, we compared transcriptional changes in fruit with those in leaves to highlight distinct and conserved transcriptome signatures in vegetative and reproductive organs. We observed extensive and diverse genetic reprogramming in different fruit tissues and leaves, each associated with a unique response to drought acclimation. These included major transcriptional shifts in the placenta of growing fruit and in the seeds of ripe fruit, related to cell growth and epigenetic regulation, respectively. Changes in metabolic and hormonal pathways, such as those related to starch, carotenoid, jasmonic acid, and ethylene metabolism, were associated with distinct fruit tissues and developmental stages. Gene coexpression network analysis provided further insights into the tissue-specific regulation of distinct responses to water stress. Our data highlight the spatiotemporal specificity of drought responses in tomato fruit and indicate known and previously unrevealed molecular regulatory mechanisms involved in drought acclimation during both vegetative and reproductive stages of development.
Summary: In this paper, we develop a systematic theory for high-dimensional analysis of variance in multivariate linear regression, where the dimension and the number of coefficients can both grow with the sample size. We propose a new U-type statistic to test linear hypotheses and establish a high-dimensional Gaussian approximation result under fairly mild moment assumptions. Our general framework and theory can be used to handle the classical one-way multivariate analysis of variance and the nonparametric one-way multivariate analysis of variance in high dimensions. To implement the test procedure, we introduce a sample-splitting-based estimator of the second moment of the error covariance and discuss its properties. A simulation study shows that our proposed test outperforms some existing tests in various settings.
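The abstract does not spell out its U-type statistic. To illustrate what "U-type" means, the sketch computes a related, well-known statistic for the two-group special case of one-way MANOVA: a Chen and Qin-style unbiased estimate of the squared distance between two group mean vectors, which averages inner products only over pairs of distinct observations so that the squared-norm (diagonal) terms that bias plug-in estimates in high dimensions drop out. This is a swapped-in illustration, not the paper's statistic, and it omits the variance estimate and Gaussian calibration an actual test would need; all names are hypothetical.

```python
from typing import Sequence

Rows = Sequence[Sequence[float]]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _vec_sum(rows: Rows) -> list:
    return [sum(col) for col in zip(*rows)]


def _sum_offdiag_inner(rows: Rows) -> float:
    # sum over i != j of <x_i, x_j> = ||sum_i x_i||^2 - sum_i ||x_i||^2, in linear time
    s = _vec_sum(rows)
    return _dot(s, s) - sum(_dot(r, r) for r in rows)


def two_sample_u_statistic(x: Rows, y: Rows) -> float:
    """Unbiased estimate of ||mean(x) - mean(y)||^2 using distinct pairs only."""
    n1, n2 = len(x), len(y)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    within_x = _sum_offdiag_inner(x) / (n1 * (n1 - 1))
    within_y = _sum_offdiag_inner(y) / (n2 * (n2 - 1))
    cross = _dot(_vec_sum(x), _vec_sum(y)) / (n1 * n2)  # average of <x_i, y_j>
    return within_x + within_y - 2.0 * cross


if __name__ == "__main__":
    x = [[1.0, 0.0], [3.0, 0.0]]
    y = [[0.0, 0.0], [0.0, 0.0]]
    print(two_sample_u_statistic(x, y))
```

Here the plug-in squared distance between the sample means is 4.0, but the U-statistic prints `3.0`: it uses only the cross term between the two x observations (1 × 3), which has expectation equal to the squared population mean, so the estimator is unbiased rather than inflated by within-observation noise.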
Kelso, Janet (Ed.)
Abstract
Motivation: Genetic or epigenetic events can rewire molecular networks to induce extraordinary phenotypic divergences. Among the many network rewiring approaches, no model-free statistical method can differentiate gene-gene pattern changes that are not attributable to marginal changes. This makes it hard to distinguish fundamental rewiring from superficial changes.
Results: Here we introduce a model-free Sharma-Song test to determine whether patterns differ in the second order, meaning that the deviation of the joint distribution from the product of the marginal distributions is unequal across conditions. We prove that the test statistic has an asymptotic chi-squared null distribution. Simulation studies demonstrate its advantage over alternative methods in detecting second-order differential patterns. Applying the test to three independent mammalian developmental transcriptome datasets, we report a lower frequency of co-expression network rewiring between human and mouse for the same tissue group than between tissue groups within the same species. We also find second-order differential patterns between microRNA promoters and genes contrasting cerebellum and liver development in mice. These patterns are enriched in the spliceosome pathway regulating tissue specificity. Complementary to previous mammalian comparative studies, which were mostly driven by first-order effects, our findings contribute an understanding of system-wide second-order gene network rewiring within and across mammalian systems. Second-order differential patterns constitute evidence for fundamentally rewired biological circuitry due to evolution, environment, or disease.
Availability: The generic Sharma-Song test is available from the R package ‘DiffXTables’ at https://cran.r-project.org/package=DiffXTables. Other code and data are described in Methods.
Supplementary information: Supplementary data are available at Bioinformatics online.
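The abstract defines a second-order difference as a change, across conditions, in how far the joint distribution deviates from the product of its marginals. The sketch below computes only that deviation table for each condition's contingency table, plus a crude cellwise comparison; `second_order_deviation` and `max_abs_difference` are hypothetical helpers. It is not the Sharma-Song statistic and has no chi-squared calibration; for the actual test, use the DiffXTables package named in the abstract.

```python
from typing import List, Sequence

Table = Sequence[Sequence[float]]


def second_order_deviation(table: Table) -> List[List[float]]:
    """Joint proportions minus the product of row and column marginals."""
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    row_p = [sum(row) / n for row in table]
    col_p = [sum(col) / n for col in zip(*table)]
    return [[table[i][j] / n - row_p[i] * col_p[j] for j in range(len(col_p))]
            for i in range(len(row_p))]


def max_abs_difference(d1: Table, d2: Table) -> float:
    """Largest cellwise gap between two deviation tables of equal shape."""
    return max(abs(a - b) for r1, r2 in zip(d1, d2) for a, b in zip(r1, r2))


if __name__ == "__main__":
    cond_a = [[10, 0], [0, 10]]  # positive association
    cond_b = [[0, 10], [10, 0]]  # negative association, same marginals
    da, db = second_order_deviation(cond_a), second_order_deviation(cond_b)
    print(da, db, max_abs_difference(da, db))
```

The two example tables have identical row and column sums, so a comparison of marginals (a first-order view) sees no change, yet their deviation tables have opposite signs (`[[0.25, -0.25], [-0.25, 0.25]]` versus `[[-0.25, 0.25], [0.25, -0.25]]`, a maximum gap of `0.5`). That is the kind of pattern change the abstract calls second order.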

