- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Proceedings. The 11th ACM Int'l Conf on Bioinform, Comput Biol and Health Inform
- Page Range or eLocation-ID:
- Article No.: 57
- Sponsoring Org:
- National Science Foundation
More Like this
Inference of a combinatorial function from multiple independent variables (parents) to a dependent variable (child) in a discrete space can be useful in detecting nonlinear relationships in biological systems. Popular conditional independency measures, heavily used in combinatorial inference, are often insensitive to the direction of functional dependency. To address this issue, we define multivariate and conditional functional chi-squared statistics. We also present an algorithm called CFDF for bivariate discrete function inference via an exclusive-effect strategy, in order to identify a best parent set for a given child. It requires each parent to make sufficient contribution beyond any marginal effect. Simulation studies suggest a marked advantage of our framework over alternatives. Applying the method to transcriptome data in genetically perturbed biological systems, we reproduced combinatorial gene interactions known in the literature. Most importantly, we identified combinatorial patterns from joint RNA and protein data to rebut a dispute on the founding principle of molecular biology.
Kelso, Janet (Ed.)Abstract Motivation Genetic or epigenetic events can rewire molecular networks to induce extraordinary phenotypical divergences. Among the many network rewiring approaches, no model-free statistical methods can differentiate gene-gene pattern changes not attributed to marginal changes. This may obscure fundamental rewiring from superficial changes. Results Here we introduce a model-free Sharma-Song test to determine if patterns differ in the second order, meaning that the deviation of the joint distribution from the product of marginal distributions is unequal across conditions. We prove an asymptotic chi-squared null distribution for the test statistic. Simulation studies demonstrate its advantage over alternative methods in detecting second-order differential patterns. Applying the test on three independent mammalian developmental transcriptome datasets, we report a lower frequency of co-expression network rewiring between human and mouse for the same tissue group than the frequency of rewiring between tissue groups within the same species. We also find secondorder differential patterns between microRNA promoters and genes contrasting cerebellum and liver development in mice. These patterns are enriched in the spliceosome pathway regulating tissue specificity. Complementary to previous mammalian comparative studies mostly driven by first-order effects, our findings contribute an understanding of system-wide second-order gene network rewiring within and across mammalian systems. Second-order differentialmore »
The analysis of spatially resolved transcriptome enables the understanding of the spatial interactions between the cellular environment and transcriptional regulation. In particular, the characterization of the gene–gene co-expression at distinct spatial locations or cell types in the tissue enables delineation of spatial co-regulatory patterns as opposed to standard differential single gene analyses. To enhance the ability and potential of spatial transcriptomics technologies to drive biological discovery, we develop a statistical framework to detect gene co-expression patterns in a spatially structured tissue consisting of different clusters in the form of cell classes or tissue domains.
We develop SpaceX (spatially dependent gene co-expression network), a Bayesian methodology to identify both shared and cluster-specific co-expression network across genes. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model which is based on a dimension reduction technique for computational efficiency. We show via simulations, accuracy gains in co-expression network estimation and structure by accounting for (increasing) spatial correlation and appropriate noise distributions. In-depth analysis of two spatial transcriptomics datasets in mouse hypothalamus and human breast cancer using SpaceX, detected multiple hub genes which are related to cognitive abilities for the hypothalamus data and multiple cancer genes (e.g. collagen family) frommore »
Availability and implementation
The SpaceX R-package is available at github.com/bayesrx/SpaceX.
Supplementary data are available at Bioinformatics online.
Computer inference of biological mechanisms is increasingly approachable due to dynamically rich data sources such as single-cell genomics. Inferred molecular interactions can prioritize hypotheses for wet-lab experiments to expedite biological discovery. However, complex data often come with unwanted biological or technical variations, exposing biases over marginal distribution and sample size in current methods to favor spurious causal relationships.
Considering function direction and strength as evidence for causality, we present an adapted functional chi-squared test (AdpFunChisq) that rewards functional patterns over non-functional or independent patterns. On synthetic and three biology datasets, we demonstrate the advantages of AdpFunChisq over 10 methods on overcoming biases that give rise to wide fluctuations in the performance of alternative approaches. On single-cell multiomics data of multiple phenotype acute leukemia, we found that the T-cell surface glycoprotein CD3 delta chain may causally mediate specific genes in the viral carcinogenesis pathway. Using the causality-by-functionality principle, AdpFunChisq offers a viable option for robust causal inference in dynamical systems.
Availability and implementation
The AdpFunChisq test is implemented in the R package ‘FunChisq’ (2.5.2 or above) at https://cran.r-project.org/package=FunChisq. All other source code along with pre-processed data is available at Code Ocean https://doi.org/10.24433/CO.2907738.v1
Supplementary materials are available at Bioinformatics online.
SUMMARY A fast algorithm for the large-scale joint inversion of gravity and magnetic data is developed. The algorithm uses a non-linear Gramian constraint to impose correlation between the density and susceptibility of the reconstructed models. The global objective function is formulated in the space of the weighted parameters, but the Gramian constraint is implemented in the original space, and the non-linear constraint is imposed using two separate Lagrange parameters, one for each model domain. It is significant that this combined approach, using the two spaces provides more similarity between the reconstructed models. Moreover, it is shown theoretically that the gradient for the use of the unweighted space is not a scalar multiple of that used for the weighted space, and hence cannot be accounted for by adjusting the Lagrange parameters. It is assumed that the measured data are obtained on a uniform grid and that a consistent regular discretization of the volume domain is imposed. Then, the sensitivity matrices exhibit a block-Toeplitz-Toeplitz-block structure for each depth layer of the model domain, and both forward and transpose operations with the matrices can be implemented efficiently using two dimensional fast Fourier transforms. This makes it feasible to solve for large scale problemsmore »