Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
Free, publicly-accessible full text available June 4, 2024
Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and sparsity of the single cell datasets render void the application of regular Gaussian assumptions for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire data. This could cause potential loss of information when single cell datasets are generated from multiple treatment conditions/disease states.
To better characterize single cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method is based on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL with the kernel selected to suit the structure of single cell data.
scMSGL is shown to have superior performance over existing state of the art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.
Free, publicly-accessible full text available February 1, 2024
For many decades now, Bayesian Model Averaging (BMA) has been a popular framework to systematically account for model uncertainty that arises in situations when multiple competing models are available to describe the same or similar physical process. The implementation of this framework, however, comes with a multitude of practical challenges including posterior approximation via Markov chain Monte Carlo and numerical integration. We present a Variational Bayesian Inference approach to BMA as a viable alternative to the standard solutions which avoids many of the aforementioned pitfalls. The proposed method is “black box” in the sense that it can be readily applied to many models with little to no model-specific derivation. We illustrate the utility of our variational approach on a suite of examples and discuss all the necessary implementation details. Fully documented Python code with all the examples is provided as well.more » « less
Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing its inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, a characteristic feature of GRNs, which are capable of accounting for both activating and inhibitory relationships in the gene network. They are also incapable of handling high proportion of zero values present in the single cell datasets.
To this end, we propose a novel signed GL approach, scSGL, that learns GRNs based on the assumption of smoothness and non-smoothness of gene expressions over activating and inhibitory edges, respectively. scSGL is then extended with kernels to account for non-linearity of co-expression and for effective handling of highly occurring zero values. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient ADMM framework. Performance assessment using simulated datasets demonstrates the superior performance of kernelized scSGL over existing state of the art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets.
Availability and implementation
The scSGL code and analysis scripts are available on https://github.com/Single-Cell-Graph-Learning/scSGL.
Supplementary data are available at Bioinformatics online.
null (Ed.)The responses of plant photosynthesis to rapid fluctuations in environmental conditions are thought to be critical for efficient conversion of light energy. Such responses are not well represented under laboratory conditions, but have also been difficult to probe in complex field environments. We demonstrate an open science approach to this problem that combines multifaceted measurements of photosynthesis and environmental conditions, and an unsupervised statistical clustering approach. In a selected set of data on mint (Mentha sp.), we show that the “light potential” for increasing linear electron flow (LEF) and nonphotochemical quenching (NPQ) upon rapid light increases are strongly suppressed in leaves previously exposed to low ambient PAR or low leaf temperatures, factors that can act both independently and cooperatively. Further analyses allowed us to test specific mechanisms. With decreasing leaf temperature or PAR, limitations to photosynthesis during high light fluctuations shifted from rapidly-induced NPQ to photosynthetic control (PCON) of electron flow at the cytochrome b6f complex. At low temperatures, high light induced lumen acidification, but did not induce NPQ, leading to accumulation of reduced electron transfer intermediates, a situation likely to induce photodamage, and represents a potential target for improving the efficiency and robustness of photosynthesis. Finally, we discuss the implications of the approach for open science efforts to understand and improve crop productivity.more » « less
Multimodal data arise in various applications where information about the same phenomenon is acquired from multiple sensors and across different imaging modalities. Learning from multimodal data is of great interest in machine learning and statistics research as this offers the possibility of capturing complementary information among modalities. Multimodal modeling helps to explain the interdependence between heterogeneous data sources, discovers new insights that may not be available from a single modality, and improves decision‐making. Recently, coupled matrix–tensor factorization has been introduced for multimodal data fusion to jointly estimate latent factors and identify complex interdependence among the latent factors. However, most of the prior work on coupled matrix–tensor factors focuses on unsupervised learning and there is little work on supervised learning using the jointly estimated latent factors. This paper considers the multimodal tensor data classification problem. A coupled support tensor machine (C‐STM) built upon the latent factors jointly estimated from the advanced coupled matrix–tensor factorization is proposed. C‐STM combines individual and shared latent factors with multiple kernels and estimates a maximal‐margin classifier for coupled matrix–tensor data. The classification risk of C‐STM is shown to converge to the optimal Bayes risk, making it a statistically consistent rule. C‐STM is validated through simulation studies as well as a simultaneous analysis on electroencephalography with functional magnetic resonance imaging data. The empirical evidence shows that C‐STM can utilize information from multiple sources and provide a better classification performance than traditional single‐mode classifiers.
We develop a general non-parametric approach to the analysis of clustered data via random effects. Assuming only that the link function is known, the regression functions and the distributions of both cluster means and observation errors are treated non-parametrically. Our argument proceeds by viewing the observation error at the cluster mean level as though it were a measurement error in an errors-in-variables problem, and using a deconvolution argument to access the distribution of the cluster mean. A Fourier deconvolution approach could be used if the distribution of the error-in-variables were known. In practice it is unknown, of course, but it can be estimated from repeated measurements, and in this way deconvolution can be achieved in an approximate sense. This argument might be interpreted as implying that large numbers of replicates are necessary for each cluster mean distribution, but that is not so; we avoid this requirement by incorporating statistical smoothing over values of nearby explanatory variables. Empirical rules are developed for the choice of smoothing parameter. Numerical simulations, and an application to real data, demonstrate small sample performance for this package of methodology. We also develop theory establishing statistical consistency.