Inferring gene regulatory networks (GRNs) from single-cell gene expression datasets is a challenging task. Existing methods are often designed heuristically for specific datasets and lack the flexibility to incorporate additional information or compare against other algorithms. Further, current GRN inference methods do not provide uncertainty estimates with respect to the interactions that they predict, making inferred networks challenging to interpret. To overcome these challenges, we introduce Probabilistic Matrix Factorization for Gene Regulatory Network inference (PMF-GRN). PMF-GRN uses single-cell gene expression data to learn latent factors representing transcription factor activity as well as regulatory relationships between transcription factors and their target genes. This approach incorporates available experimental evidence into prior distributions over latent factors and scales well to single-cell gene expression datasets. By utilizing variational inference, we facilitate hyperparameter search for principled model selection and direct comparison to other generative models. To assess the accuracy of our method, we evaluate PMF-GRN using the model organisms Saccharomyces cerevisiae and Bacillus subtilis, benchmarking against database-derived gold standard interactions. We discover that, on average, PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods. Moreover, our PMF-GRN approach offers well-calibrated uncertainty estimates, as it performs gene regulatory network (GRN) inference in a probabilistic setting. These estimates are valuable for validation purposes, particularly when validated interactions are limited or a gold standard is incomplete.
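The abstract above does not spell out the factorization itself. As a rough illustration only, the sketch below assumes that expected expression is the product of a cells-by-TFs activity matrix and a TFs-by-genes regulation matrix, where each regulation entry is an edge probability multiplied by a signed edge weight. The names `tf_activity`, `edge_prob`, `edge_weight`, and `reconstruct_expression` are hypothetical, and the code omits priors, the likelihood, and variational inference entirely.

```python
# Minimal sketch of a matrix-factorization view of a GRN (assumed form,
# not the PMF-GRN implementation). Pure standard-library Python.

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def reconstruct_expression(tf_activity, edge_prob, edge_weight):
    """Expected expression = TF activity x (edge probability * edge weight).

    tf_activity: cells x TFs
    edge_prob:   TFs x genes, values in [0, 1] (assumed posterior means)
    edge_weight: TFs x genes, signed strength of regulation (assumed)
    """
    regulation = [
        [p * w for p, w in zip(prob_row, weight_row)]
        for prob_row, weight_row in zip(edge_prob, edge_weight)
    ]
    return matmul(tf_activity, regulation)


# Toy example: 2 cells, 2 transcription factors, 3 target genes.
tf_activity = [[1.0, 0.0],
               [0.5, 2.0]]
edge_prob = [[1.0, 0.0, 0.5],
             [0.0, 1.0, 0.5]]
edge_weight = [[2.0, 3.0, -1.0],
               [1.0, -2.0, 4.0]]

expected = reconstruct_expression(tf_activity, edge_prob, edge_weight)
```

In this toy form, an edge probability near 0.5 corresponds to an uncertain interaction, which is the kind of quantity the abstract's uncertainty estimates would describe.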
Enhanced Graph Representation Convolution: Effective Inferring Gene Regulatory Network Using Graph Convolution Network with Self-Attention Graph Pooling Layer
Studying gene regulatory networks (GRNs) is paramount for unraveling the complexities of biological processes and their associated disorders, such as diabetes, cancer, and Alzheimer’s disease. Recent advancements in computational biology have aimed to enhance the inference of GRNs from gene expression data, a non-trivial task given the networks’ intricate nature. The challenge lies in accurately identifying the myriad interactions among transcription factors and target genes, which govern cellular functions. This research introduces a cutting-edge technique, EGRC (Effective GRN Inference applying Graph Convolution with Self-Attention Graph Pooling), which conceptualizes GRN reconstruction as a graph classification problem, where the task is to discern the links within subgraphs that encapsulate pairs of nodes. By leveraging Spearman’s correlation, we generate potential subgraphs that bring nonlinear associations between transcription factors and their targets to light. We use mutual information to enhance this, capturing a broader spectrum of gene interactions. Our methodology divides these subgraphs into ‘Positive’ and ‘Negative’ categories. ‘Positive’ subgraphs are those where a transcription factor and its target gene are connected, including interactions among their neighbors. ‘Negative’ subgraphs, conversely, denote pairs without a direct connection. EGRC utilizes dual graph convolution network (GCN) models that exploit node attributes from gene expression profiles and graph embedding techniques to classify these subgraphs. The performance of EGRC is substantiated by comprehensive evaluations using the DREAM5 datasets. Notably, EGRC attained an AUROC of 0.856 and an AUPR of 0.841 on the E. coli dataset. In contrast, on the in silico dataset, EGRC achieved an AUROC of 0.5058 and an AUPR of 0.958. Furthermore, on the S. cerevisiae dataset, EGRC recorded an AUROC of 0.823 and an AUPR of 0.822.
These results underscore the robustness of EGRC in accurately inferring GRNs across various organisms. The performance of EGRC represents a substantial advance in the field, promising to deepen our comprehension of intricate biological processes and their implications in both health and disease.
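Two ingredients named in the abstract, Spearman's correlation for proposing candidate transcription factor and target pairs and AUROC for scoring predictions against known interactions, can be sketched in a few lines. This is not EGRC itself: there is no graph convolution, pooling, or mutual information step here. The expression values, the gold-standard labels, and the helper names (`average_ranks`, `spearman`, `auroc`) are all made up for illustration.

```python
# Sketch: Spearman correlation for candidate TF-target pairs, then AUROC.
# Toy data and helper names are assumptions, not taken from the paper.

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def auroc(scores, labels):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy expression profiles across five samples (hypothetical values).
expression = {
    "tfA": [1, 2, 3, 4, 5],
    "g1": [2, 4, 5, 9, 20],   # monotone increasing with tfA, but nonlinear
    "g2": [5, 4, 3, 2, 1],    # monotone decreasing with tfA
    "g3": [3, 1, 4, 2, 5],    # weakly related
}
gold = {"g1": 1, "g2": 1, "g3": 0}  # hypothetical gold-standard labels

targets = ["g1", "g2", "g3"]
rho = {g: spearman(expression["tfA"], expression[g]) for g in targets}
scores = [abs(rho[g]) for g in targets]
labels = [gold[g] for g in targets]
score = auroc(scores, labels)
```

Because Spearman's correlation works on ranks, `g1` scores as strongly as a linear relationship would, which is why rank-based measures are useful for surfacing monotone nonlinear associations.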
- Award ID(s): 2019745
- PAR ID: 10597758
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: Machine Learning and Knowledge Extraction
- Volume: 6
- Issue: 3
- ISSN: 2504-4990
- Page Range / eLocation ID: 1818 to 1839
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates.
- Gene regulatory networks (GRNs) govern gene expression and cellular identity, but accurately inferring their structure from high-dimensional single-cell RNA sequencing (scRNA-seq) data remains a major challenge. Here, we present EnsembleRegNet, a deep learning framework that infers transcription factor (TF)-target gene relationships by integrating an ensemble encoder-decoder and multilayer perceptron (MLP) architecture. EnsembleRegNet utilizes Hodges-Lehmann estimator (HLE)-based binarization, case-deletion analysis, motif enrichment using RcisTarget, and regulon activity scoring with AUCell to enhance both robustness and biological interpretability. Extensive evaluations across simulated and real scRNA-seq datasets demonstrate that EnsembleRegNet outperforms existing GRN inference methods, including SCENIC and SIGNET, in both clustering performance and regulatory accuracy. By uncovering cell-type-specific regulatory modules and enhancing interpretability, EnsembleRegNet offers a scalable and biologically grounded framework for exploring transcriptional regulation. Its demonstrated performance establishes a new benchmark for GRN inference and highlights its promise for applications in disease modeling, biomarker discovery, and cellular reprogramming.
- Accurately inferring gene regulatory networks (GRNs) from single‐cell RNA sequencing (scRNA‐seq) data is critical for understanding cellular dynamics in both normal development and disease. However, existing computational methods often suffer from low precision and high false‐positive rates due to the intrinsic noise and complex regulatory architecture in scRNA‐seq data. We introduce scTIGER2.0, a deep‐learning‐based framework that integrates expression correlation, pseudotime ordering, temporal causal discovery, and bootstrap‐based significance testing to infer high‐confidence, directional gene–gene interactions. Benchmarking against five popular GRN inference methods using large‐scale datasets, scTIGER2.0 consistently achieved superior specificity, especially in linear developmental trajectories. In real applications, scTIGER2.0 identified an APOE‐centered GRN from Alzheimer's disease scRNA‐seq data and uncovered interconnected GRNs for FOS, FOXP1, JUN, KLF6, NCOA4, and RUNX1 from acute myeloid leukemia data, where 87.5% of the predicted targets show promoter‐binding peaks in the corresponding ChIP‐seq data. These results demonstrate that scTIGER2.0 is a robust, accurate and fully integrated platform for uncovering biologically meaningful GRNs from noisy scRNA‐seq data.
- Motivation: Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to, other neighboring cells. These cell–cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods were proposed to infer CCIs using single cell gene expression data with or without cell spatial location information. However, in reality, the two processes do not exist in isolation and are subject to spatial constraints. Despite this rationale, no methods currently exist to infer GRNs and CCIs using the same model. Results: We propose CLARIFY, a tool that takes GRNs as input, uses them and spatially resolved gene expression data to infer CCIs, while simultaneously outputting refined cell-specific GRNs. CLARIFY uses a novel multi-level graph autoencoder, which mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one using seqFISH and the other using MERFISH, and also tested on simulated datasets from scMultiSim. We compared the quality of predicted GRNs and CCIs with state-of-the-art baseline methods that inferred either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baseline in terms of commonly used evaluation metrics. Our results point to the importance of co-inference of CCIs and GRNs and to the use of layered graph neural networks as an inference tool for biological networks. Availability and implementation: The source code and data are available at https://github.com/MihirBafna/CLARIFY.