Abstract Background Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and sparsity of the single cell datasets render void the application of regular Gaussian assumptions for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire data. This could cause potential loss of information when single cell datasets are generated from multiple treatment conditions/disease states. Results To better characterize single cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method is based on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL with the kernel selected to suit the structure of single cell data. Conclusions scMSGL is shown to have superior performance over existing state of the art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.
more »
« less
scSGL: kernelized signed graph learning for single-cell gene regulatory network inference
Abstract Motivation Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing its inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, a characteristic feature of GRNs, which are capable of accounting for both activating and inhibitory relationships in the gene network. They are also incapable of handling high proportion of zero values present in the single cell datasets. Results To this end, we propose a novel signed GL approach, scSGL, that learns GRNs based on the assumption of smoothness and non-smoothness of gene expressions over activating and inhibitory edges, respectively. scSGL is then extended with kernels to account for non-linearity of co-expression and for effective handling of highly occurring zero values. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient ADMM framework. Performance assessment using simulated datasets demonstrates the superior performance of kernelized scSGL over existing state of the art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets. Availability and implementation The scSGL code and analysis scripts are available on https://github.com/Single-Cell-Graph-Learning/scSGL. Supplementary information Supplementary data are available at Bioinformatics online.
more »
« less
- PAR ID:
- 10336749
- Editor(s):
- Boeva, Valentina
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 38
- Issue:
- 11
- ISSN:
- 1367-4803
- Page Range / eLocation ID:
- 3011 to 3019
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Kuijjer, Marieke (Ed.)Abstract Motivation Biological processes are regulated by underlying genes and their interactions that form gene regulatory networks (GRNs). Dysregulation of these GRNs can cause complex diseases such as cancer, Alzheimer’s and diabetes. Hence, accurate GRN inference is critical for elucidating gene function, allowing for the faster identification and prioritization of candidate genes for functional investigation. Several statistical and machine learning-based methods have been developed to infer GRNs based on biological and synthetic datasets. Here, we developed a method named AGRN that infers GRNs by employing an ensemble of machine learning algorithms. Results From the idea that a single method may not perform well on all datasets, we calculate the gene importance scores using three machine learning methods—random forest, extra tree and support vector regressors. We calculate the importance scores from Shapley Additive Explanations, a recently published method to explain machine learning models. We have found that the importance scores from Shapley values perform better than the traditional importance scoring methods based on almost all the benchmark datasets. We have analyzed the performance of AGRN using the datasets from the DREAM4 and DREAM5 challenges for GRN inference. The proposed method, AGRN—an ensemble machine learning method with Shapley values, outperforms the existing methods both in the DREAM4 and DREAM5 datasets. With improved accuracy, we believe that AGRN inferred GRNs would enhance our mechanistic understanding of biological processes in health and disease. Availabilityand implementation https://github.com/DuaaAlawad/AGRN. Supplementary information Supplementary data are available at Bioinformatics online.more » « less
-
Graph Signal Processing (GSP) offers a structured way to model and analyze complex data networks. However, a consistent challenge in real world applications is that the underlying graph topology is often unknown. While most existing graph learning methods focus on undirected graphs, specific relationships in network data, such as diffusion, weather data and social networks, sometimes necessitate a directed graph model. In this work, we propose a novel method for learning directed graph structures from observed signals by leveraging both signal smoothness and directional flow. We introduce a Dirichlet energy formulation specifically tailored to directed graphs, favoring smoothness and downwards flow of signals over the graph. We then develop the Perseus Measure to quantify how much the learned graphs obey the directional flow constraint. In addition, we present a novel random graph algorithm that builds a signal matrix alongside the graph, and we derive certain properties about the resulting Hierarchy Graphs. Experiments on both synthetic and real-world datasets show that our directed graph learning approach effectively captures directional relationships.more » « less
-
Signed networks (networks with positive and negative edges) commonly arise in various domains from molecular biology to social media. The edge signs -- i.e., the graph signage -- represent the interaction pattern between the vertices and can provide insights into the underlying system formation process. Generative models considering signage formation are essential for testing hypotheses about the emergence of interactions and for creating synthetic datasets for algorithm benchmarking (especially in areas where obtaining real-world datasets is difficult).In this work, we pose a novel Maximum-Likelihood-based optimization problem for modeling signages given their topology and showcase it in the context of gene regulation. Regulatory interactions of genes play a key role in the process of organism development, and when broken can lead to serious organism abnormalities and diseases. Our contributions are threefold: First, we design a new class of signage models for a given topology, and, based on the parameter setting, we discuss its biological interpretations for gene regulatory networks (GRNs). Second, we design algorithms computing the Maximum Likelihood -- depending on the parameter setting, our algorithms range from closed-form expressions to MCMC sampling. Third, we evaluated the results of our algorithms on synthetic datasets and real-world large GRNs. Our work can lead to the prediction of unknown gene regulations, novel biological hypotheses, and realistic benchmark datasets in the realm of gene regulation.more » « less
-
Abstract Motivation Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to other neighboring cells. These cell–cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods were proposed to infer CCIs using single cell gene expression data with or without cell spatial location information. However, in reality, the two processes do not exist in isolation and are subject to spatial constraints. Despite this rationale, no methods currently exist to infer GRNs and CCIs using the same model. Results We propose CLARIFY, a tool that takes GRNs as input, uses them and spatially resolved gene expression data to infer CCIs, while simultaneously outputting refined cell-specific GRNs. CLARIFY uses a novel multi-level graph autoencoder, which mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one using seqFISH and the other using MERFISH, and also tested on simulated datasets from scMultiSim. We compared the quality of predicted GRNs and CCIs with state-of-the-art baseline methods that inferred either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baseline in terms of commonly used evaluation metrics. Our results point to the importance of co-inference of CCIs and GRNs and to the use of layered graph neural networks as an inference tool for biological networks. Availability and implementation The source code and data is available at https://github.com/MihirBafna/CLARIFY.more » « less
An official website of the United States government

