skip to main content

Title: Kernelized multiview signed graph learning for single-cell RNA sequencing data
Abstract Background

Characterizing the topology of gene regulatory networks (GRNs) is a fundamental problem in systems biology. The advent of single cell technologies has made it possible to construct GRNs at finer resolutions than bulk and microarray datasets. However, cellular heterogeneity and sparsity of the single cell datasets render void the application of regular Gaussian assumptions for constructing GRNs. Additionally, most GRN reconstruction approaches estimate a single network for the entire data. This could cause potential loss of information when single cell datasets are generated from multiple treatment conditions/disease states.


To better characterize single cell GRNs under different but related conditions, we propose the joint estimation of multiple networks using multiple signed graph learning (scMSGL). The proposed method is based on recently developed graph signal processing (GSP) based graph learning, where GRNs and gene expressions are modeled as signed graphs and graph signals, respectively. scMSGL learns multiple GRNs by optimizing the total variation of gene expressions with respect to GRNs while ensuring that the learned GRNs are similar to each other through regularization with respect to a learned signed consensus graph. We further kernelize scMSGL with the kernel selected to suit the structure of single cell data.


scMSGL is shown to have superior performance over existing state of the art methods in GRN recovery on simulated datasets. Furthermore, scMSGL successfully identifies well-established regulators in a mouse embryonic stem cell differentiation study and a cancer clinical study of medulloblastoma.

more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
BMC Bioinformatics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Elucidating the topology of gene regulatory networks (GRNs) from large single-cell RNA sequencing datasets, while effectively capturing its inherent cell-cycle heterogeneity and dropouts, is currently one of the most pressing problems in computational systems biology. Recently, graph learning (GL) approaches based on graph signal processing have been developed to infer graph topology from signals defined on graphs. However, existing GL methods are not suitable for learning signed graphs, a characteristic feature of GRNs, which are capable of accounting for both activating and inhibitory relationships in the gene network. They are also incapable of handling high proportion of zero values present in the single cell datasets.


    To this end, we propose a novel signed GL approach, scSGL, that learns GRNs based on the assumption of smoothness and non-smoothness of gene expressions over activating and inhibitory edges, respectively. scSGL is then extended with kernels to account for non-linearity of co-expression and for effective handling of highly occurring zero values. The proposed approach is formulated as a non-convex optimization problem and solved using an efficient ADMM framework. Performance assessment using simulated datasets demonstrates the superior performance of kernelized scSGL over existing state of the art methods in GRN recovery. The performance of scSGL is further investigated using human and mouse embryonic datasets.

    Availability and implementation

    The scSGL code and analysis scripts are available on

    Supplementary information

    Supplementary data are available at Bioinformatics online.

    more » « less
  2. Abstract Motivation

    Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to other neighboring cells. These cell–cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods were proposed to infer CCIs using single cell gene expression data with or without cell spatial location information. However, in reality, the two processes do not exist in isolation and are subject to spatial constraints. Despite this rationale, no methods currently exist to infer GRNs and CCIs using the same model.


    We propose CLARIFY, a tool that takes GRNs as input, uses them and spatially resolved gene expression data to infer CCIs, while simultaneously outputting refined cell-specific GRNs. CLARIFY uses a novel multi-level graph autoencoder, which mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one using seqFISH and the other using MERFISH, and also tested on simulated datasets from scMultiSim. We compared the quality of predicted GRNs and CCIs with state-of-the-art baseline methods that inferred either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baseline in terms of commonly used evaluation metrics. Our results point to the importance of co-inference of CCIs and GRNs and to the use of layered graph neural networks as an inference tool for biological networks.

    Availability and implementation

    The source code and data is available at

    more » « less
  3. Abstract Background

    A cell exhibits a variety of responses to internal and external cues. These responses are possible, in part, due to the presence of an elaborate gene regulatory network (GRN) in every single cell. In the past 20 years, many groups worked on reconstructing the topological structure of GRNs from large-scale gene expression data using a variety of inference algorithms. Insights gained about participating players in GRNs may ultimately lead to therapeutic benefits. Mutual information (MI) is a widely used metric within this inference/reconstruction pipeline as it can detect any correlation (linear and non-linear) between any number of variables (n-dimensions). However, the use of MI with continuous data (for example, normalized fluorescence intensity measurement of gene expression levels) is sensitive to data size, correlation strength and underlying distributions, and often requires laborious and, at times, ad hoc optimization.


    In this work, we first show that estimating MI of a bi- and tri-variate Gaussian distribution usingk-nearest neighbor (kNN) MI estimation results in significant error reduction as compared to commonly used methods based on fixed binning. Second, we demonstrate that implementing the MI-based kNN Kraskov–Stoögbauer–Grassberger (KSG) algorithm leads to a significant improvement in GRN reconstruction for popular inference algorithms, such as Context Likelihood of Relatedness (CLR). Finally, through extensive in-silico benchmarking we show that a new inference algorithm CMIA (Conditional Mutual Information Augmentation), inspired by CLR, in combination with the KSG-MI estimator, outperforms commonly used methods.


    Using three canonical datasets containing 15 synthetic networks, the newly developed method for GRN reconstruction—which combines CMIA, and the KSG-MI estimator—achieves an improvement of 20–35% in precision-recall measures over the current gold standard in the field. This new method will enable researchers to discover new gene interactions or better choose gene candidates for experimental validations.

    more » « less
  4. Inferring gene regulatory networks (GRNs) from single-cell RNA-seq (scRNA-seq) data is an important computational question to find regulatory mechanisms involved in fundamental cellular processes. Although many computational methods have been designed to predict GRNs from scRNA-seq data, they usually have high false positive rates and none infer GRNs by directly using the paired datasets of case-versus-control experiments. Here we present a novel deep-learning-based method, named scTIGER, for GRN detection by using the co-differential relationships of gene expression profiles in paired scRNA-seq datasets. scTIGER employs cell-type-based pseudotiming, an attention-based convolutional neural network method and permutation-based significance testing for inferring GRNs among gene modules. As state-of-the-art applications, we first applied scTIGER to scRNA-seq datasets of prostate cancer cells, and successfully identified the dynamic regulatory networks of AR, ERG, PTEN and ATF3 for same-cell type between prostatic cancerous and normal conditions, and two-cell types within the prostatic cancerous environment. We then applied scTIGER to scRNA-seq data from neurons with and without fear memory and detected specific regulatory networks for BDNF, CREB1 and MAPK4. Additionally, scTIGER demonstrates robustness against high levels of dropout noise in scRNA-seq data.

    more » « less
  5. Inferring gene regulatory networks (GRNs) from single-cell gene expression datasets is a challenging task. Existing methods are often designed heuristically for specific datasets and lack the flexibility to incorporate additional information or compare against other algorithms. Further, current GRN inference methods do not provide uncertainty estimates with respect to the interactions that they predict, making inferred networks challenging to interpret. To overcome these challenges, we introduce Probabilistic Matrix Factorization for Gene Regulatory Network inference (PMF-GRN). PMF-GRN uses single-cell gene expression data to learn latent factors representing transcription factor activity as well as regulatory relationships between transcription factors and their target genes. This approach incorporates available experimental evidence into prior distributions over latent factors and scales well to single-cell gene expression datasets. By utilizing variational inference, we facilitate hyperparameter search for principled model selection and direct comparison to other generative models. To assess the accuracy of our method, we evaluate PMF-GRN using the model organisms Saccharomyces cerevisiae and Bacillus subtilis, benchmarking against database-derived gold standard interactions. We discover that, on average, PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods. Moreover, our PMF-GRN approach offers well-calibrated uncertainty estimates, as it performs gene regulatory network (GRN) inference in a probabilistic setting. These estimates are valuable for validation purposes, particularly when validated interactions are limited or a gold standard is incomplete. 
    more » « less