This content will become publicly available on December 31, 2026

Title: Counting subnetworks under gene duplication in genetic regulatory networks
Gene duplication is a fundamental evolutionary mechanism that contributes to biological complexity and diversity [6]. Traditionally, research has focused on the duplication of gene sequences [23]. However, evidence suggests that the duplication of regulatory elements may also play a significant role in the evolution of genomic functions [8, 21]. In this work, the evolution of regulatory relationships belonging to gene-specific substructures in a genetic regulatory network (GRN) is modeled. In the model, a network grows from an initial configuration by repeatedly choosing a random gene to duplicate. The likelihood that the regulatory relationships associated with the selected gene are retained through duplication is determined by a vector of probabilities; that is, each gene family has its own probability of retaining regulatory relationships. Occurrences of gene-family-specific substructures, referred to here as subnetwork motifs, are counted under the gene duplication model. Subnetwork motifs are motivated by network motifs, which are patterns of interconnections that recur more often in a specialized network than in a random network [15]. The two differ in that subnetwork motifs are instances of gene-family-specific substructures, whereas network motifs are isomorphic substructures. Subnetwork motifs are counted under Full Duplication and Partial Duplication, which differ in how regulatory relationships are inherited. Full Duplication occurs when all regulatory links are inherited at each duplication step, and Partial Duplication occurs when regulation inheritance varies at each duplication step. Full Duplication is therefore a special case of Partial Duplication. Moments for the number of occurrences of subnetwork motifs are determined in each model.
The results offer a method for discovering gene-family-specific substructures that are significant in a GRN under gene duplication.
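The growth process described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' model: the abstract does not specify the sampling details, so the code assumes a directed network without self-loops, a uniformly random choice of the gene to duplicate, and independent retention of each regulatory link with the probability assigned to the selected gene's family. The names `duplicate_step`, `simulate`, and `count_family_edges` are hypothetical, and counting directed links from one family to another is just one simple example of a gene-family-specific substructure.

```python
import random


def duplicate_step(genes, edges, retain_prob, rng):
    """Duplicate one uniformly chosen gene in place (assumed sampling rule).

    genes: list of family labels; the list index is the gene id.
    edges: set of (regulator, target) gene-id pairs, assumed free of self-loops.
    retain_prob: dict mapping family label -> link retention probability.
    """
    parent = rng.randrange(len(genes))
    child = len(genes)
    family = genes[parent]
    genes.append(family)  # the copy belongs to the same gene family
    p = retain_prob[family]
    inherited = []
    for u, v in edges:
        # Assumption: each link of the parent is copied independently with prob. p.
        if u == parent and v != parent and rng.random() < p:
            inherited.append((child, v))  # copy regulates the same target
        elif v == parent and u != parent and rng.random() < p:
            inherited.append((u, child))  # copy is regulated by the same gene
    edges.update(inherited)


def simulate(genes, edges, retain_prob, steps, seed=None):
    """Grow a copy of the initial network for a given number of duplications."""
    rng = random.Random(seed)
    genes, edges = list(genes), set(edges)
    for _ in range(steps):
        duplicate_step(genes, edges, retain_prob, rng)
    return genes, edges


def count_family_edges(genes, edges, source_family, target_family):
    """Count directed links from a gene in one family to a gene in another."""
    return sum(
        1 for u, v in edges
        if genes[u] == source_family and genes[v] == target_family
    )


if __name__ == "__main__":
    # Initial configuration: one gene of family "A" regulating one gene of family "B".
    initial_genes = ["A", "B"]
    initial_edges = {(0, 1)}
    # Partial Duplication: each family has its own (illustrative) retention probability.
    genes, edges = simulate(initial_genes, initial_edges, {"A": 0.7, "B": 0.4}, 50, seed=0)
    print(count_family_edges(genes, edges, "A", "B"))
```

Setting every retention probability to 1.0 gives Full Duplication in this sketch. Starting from the single link above, every family "A" gene then regulates every family "B" gene, so the count equals the product of the two family sizes. Repeating the simulation over many seeds gives empirical moments of the count that could be compared against analytical ones.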
Award ID(s):
2503759
PAR ID:
10656949
Author(s) / Creator(s):
; ;
Editor(s):
Simpson
Publisher / Repository:
Springer
Date Published:
Journal Name:
Bulletin of Mathematical Biology
ISSN:
1522-9602
Subject(s) / Keyword(s):
gene duplication; subnetworks; subnetwork motifs; moments
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Inferring gene regulatory networks (GRNs) from single-cell gene expression datasets is a challenging task. Existing methods are often designed heuristically for specific datasets and lack the flexibility to incorporate additional information or compare against other algorithms. Further, current GRN inference methods do not provide uncertainty estimates with respect to the interactions that they predict, making inferred networks challenging to interpret. To overcome these challenges, we introduce Probabilistic Matrix Factorization for Gene Regulatory Network inference (PMF-GRN). PMF-GRN uses single-cell gene expression data to learn latent factors representing transcription factor activity as well as regulatory relationships between transcription factors and their target genes. This approach incorporates available experimental evidence into prior distributions over latent factors and scales well to single-cell gene expression datasets. By utilizing variational inference, we facilitate hyperparameter search for principled model selection and direct comparison to other generative models. To assess the accuracy of our method, we evaluate PMF-GRN using the model organisms Saccharomyces cerevisiae and Bacillus subtilis, benchmarking against database-derived gold standard interactions. We discover that, on average, PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods. Moreover, our PMF-GRN approach offers well-calibrated uncertainty estimates, as it performs gene regulatory network (GRN) inference in a probabilistic setting. These estimates are valuable for validation purposes, particularly when validated interactions are limited or a gold standard is incomplete. 
  2. Single cell profiling techniques including multi-omics and spatial-omics technologies allow researchers to study cell-cell variation within a cell population. These variations extend to biological networks within cells, in particular, the gene regulatory networks (GRNs). GRNs rewire as the cells evolve, and different cells can have different governing GRNs. However, existing GRN inference methods usually infer a single GRN for a population of cells, without exploring the cell-cell variation in terms of their regulatory mechanisms. Recently, jointly profiled single cell transcriptomics and chromatin accessibility data have been used to infer GRNs. Although methods based on such multi-omics data were shown to improve over the accuracy of methods using only single cell RNA-seq (scRNA-seq) data, they do not take full advantage of the single cell resolution chromatin accessibility data. We propose CeSpGRN (CellSpecificGeneRegulatoryNetwork inference), which infers cell-specific GRNs from scRNA-seq, single cell multi-omics, or single cell spatial-omics data. CeSpGRN uses a Gaussian weighted kernel that allows the GRN of a given cell to be learned from the sequencing profile of itself and its neighboring cells in the developmental process. The kernel is constructed from the similarity of gene expressions or spatial locations between cells. When the chromatin accessibility data is available, CeSpGRN constructs cell-specific prior networks which are used to further improve the inference accuracy. We applied CeSpGRN to various types of real-world datasets and inferred various regulation changes that were shown to be important in cell development. We also quantitatively measured the performance of CeSpGRN on simulated datasets and compared with baseline methods. The results show that CeSpGRN has a superior performance in reconstructing the GRN for each cell, as well as in detecting the regulatory interactions that differ between cells. 
    CeSpGRN is available at https://github.com/PeterZZQ/CeSpGRN.
  3. The angiosperm seed represents a critical evolutionary breakthrough that has been shown to propel the reproductive success and radiation of flowering plants. Seeds promote the rapid diversification of angiosperms by establishing postzygotic reproductive barriers, such as hybrid seed inviability. While prezygotic barriers to reproduction tend to be transient, postzygotic barriers are often permanent and therefore can play a pivotal role in facilitating speciation. This property of the angiosperm seed is exemplified in the Mimulus genus. In order to further the understanding of the gene regulatory mechanisms important in the Mimulus seed, we performed gene regulatory network (GRN) inference analysis by using time-series RNA-seq data from developing hybrid seeds from a viable cross between Mimulus guttatus and Mimulus pardalis. GRN inference has the capacity to identify active regulatory mechanisms in a sample and highlight genes of potential biological importance. In our case, GRN inference also provided the opportunity to uncover active regulatory relationships and generate a reference set of putative gene regulations. We deployed two GRN inference algorithms—RTP-STAR and KBoost—on three different subsets of our transcriptomic dataset. While the two algorithms yielded GRNs with different regulations and topologies when working with the same data subset, there was still significant overlap in the specific gene regulations they inferred, and they both identified potential novel regulatory mechanisms that warrant further investigation. 
  4. Inferring gene regulatory networks (GRNs) from single-cell data is challenging due to heuristic limitations. Existing methods also lack estimates of uncertainty. Here we present Probabilistic Matrix Factorization for Gene Regulatory Network Inference (PMF-GRN). Using single-cell expression data, PMF-GRN infers latent factors capturing transcription factor activity and regulatory relationships. Using variational inference allows hyperparameter search for principled model selection and direct comparison to other generative models. We extensively test and benchmark our method using real single-cell datasets and synthetic data. We show that PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods, offering well-calibrated uncertainty estimates. 
  5. Kopp, Artyom (Ed.)
    Animal traits develop through the expression and action of numerous regulatory and realizator genes that comprise a gene regulatory network (GRN). For each GRN, its underlying patterns of gene expression are controlled by cis-regulatory elements (CREs) that bind activating and repressing transcription factors. These interactions drive cell-type and developmental stage-specific transcriptional activation or repression. Most GRNs remain incompletely mapped, and a major barrier to this daunting task is CRE identification. Here, we used an in silico method to identify predicted CREs (pCREs) that comprise the GRN which governs sex-specific pigmentation of Drosophila melanogaster. Through in vivo assays, we demonstrate that many pCREs activate expression in the correct cell-type and developmental stage. We employed genome editing to demonstrate that two CREs control the pupal abdomen expression of trithorax, whose function is required for the dimorphic phenotype. Surprisingly, trithorax had no detectable effect on this GRN’s key trans-regulators, but shapes the sex-specific expression of two realizator genes. Comparison of sequences orthologous to these CREs supports an evolutionary scenario where these trithorax CREs predated the origin of the dimorphic trait. Collectively, this study demonstrates how in silico approaches can shed novel insights on the GRN basis for a trait’s development and evolution. 