
Title: Inference of differential gene regulatory networks based on gene expression and genetic perturbation data
Abstract Motivation

Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity in two GRNs, and may sacrifice inference accuracy.

Results

In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM) to jointly infer GRNs under two conditions and then to identify the differences between the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a dataset of lung cancer and another dataset of gastric cancer with FSSEM inferred differential GRNs in cancer versus normal tissues, whose genes with largest network degrees have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions.
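To make the notion of a differential GRN concrete, the following minimal Python sketch takes two already-estimated weighted adjacency matrices, extracts the edges whose weights differ, and ranks genes by their degree in the resulting differential network. It does not implement FSSEM and does not use the fssemR package; the matrices, gene names, tolerance and function names are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Edge = Tuple[int, int, float]  # (target index, regulator index, weight change)


def differential_edges(b1: Matrix, b2: Matrix, tol: float = 1e-8) -> List[Edge]:
    """Return every off-diagonal edge whose weight differs between b1 and b2."""
    n = len(b1)
    edges: List[Edge] = []
    for i in range(n):
        for j in range(n):
            delta = b2[i][j] - b1[i][j]
            if i != j and abs(delta) > tol:
                edges.append((i, j, delta))
    return edges


def degree_ranking(edges: Sequence[Edge], genes: Sequence[str]) -> List[Tuple[str, int]]:
    """Count how many differential edges touch each gene, highest degree first."""
    degree = {g: 0 for g in genes}
    for i, j, _ in edges:
        degree[genes[i]] += 1
        degree[genes[j]] += 1
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))


genes = ["g1", "g2", "g3", "g4"]

# Hypothetical networks: entry [i][j] is the effect of gene j on gene i.
b_normal = [
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.0, 0.0],
]
b_tumor = [
    [0.0, 0.5, 0.0, 0.0],  # g2 -> g1 unchanged
    [0.0, 0.0, 0.2, 0.0],  # g3 -> g2 weakened
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.0],  # g1 -> g4 lost, g3 -> g4 gained
]

diff = differential_edges(b_normal, b_tumor)
ranking = degree_ranking(diff, genes)

for target, regulator, delta in diff:
    print(f"{genes[regulator]} -> {genes[target]}: change {delta:+.2f}")
print(ranking)
```

Subtracting two separately estimated matrices in this way corresponds to the baseline that the abstract argues against. A joint approach would instead estimate both matrices together so that shared structure is exploited before the difference is taken.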

Availability and implementation

The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN.

Supplementary information

Supplementary data are available at Bioinformatics online.

NSF-PAR ID:
10118492
Journal Name:
Bioinformatics
ISSN:
1367-4803
Publisher:
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Cowen, Lenore (Ed.)
    Abstract Summary

    Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions.

    Availability and implementation

    Python package freely available at http://uhlerlab.github.io/causaldag/dci.

    Supplementary information

    Supplementary data are available at Bioinformatics online.
  2. Abstract Summary

    Differential Expression Gene Explorer (DrEdGE) is a web-based tool that guides genomicists through easily creating interactive online data visualizations, which colleagues can query according to their own conditions to discover genes, samples or patterns of interest. We demonstrate DrEdGE’s features with three example websites generated from publicly available datasets—human neuronal tissue, mouse embryonic tissue and Caenorhabditis elegans whole embryos. DrEdGE increases the utility of large genomics datasets by removing technical obstacles to independent exploration.

    Availability and implementation

    Freely available at http://dredge.bio.unc.edu.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  3. Abstract Background

    Histone post-translational modifications (PTMs) play an important role in our system by regulating the structure of chromatin and therefore contribute to the regulation of gene and protein expression. Irregularities in histone PTMs can lead to a variety of diseases, including various forms of cancer. Histone modifications are analyzed using high-resolution mass spectrometry, which generates large amounts of data that require sophisticated bioinformatics tools for analysis and visualization. PTMViz is designed for downstream differential abundance analysis and visualization of proteins and/or histone modifications.

    Results

    PTMViz provides users with data tables and visualization plots of significantly differentiated proteins and histone PTMs between two sample groups. All the data is packaged into interactive data tables and graphs using the Shiny platform to help the user explore the results in a fast and efficient manner to assess if changes in the system are due to protein abundance changes or epigenetic changes. In the example data provided, we identified several proteins differentially regulated in the dopaminergic pathway between mice treated with methamphetamine compared to a saline control. We also identified histone post-translational modifications including histone H3K9me, H3K27me3, H4K16ac, and that were regulated due to drug exposure.

    Conclusions

    Histone modifications play an integral role in the regulation of gene expression. PTMViz provides an interactive platform for analyzing proteins and histone post-translational modifications from mass spectrometry data in order to quickly identify differentially expressed proteins and PTMs.

  4. Abstract Motivation

    The inference of gene regulatory networks (GRNs) from DNA microarray measurements forms a core element of systems biology-based phenotyping. In the recent past, numerous computational methodologies have been formalized to enable the deduction of reliable and testable predictions in today’s biology. However, little focus has been aimed at quantifying how well existing state-of-the-art GRNs correspond to measured gene-expression profiles.

    Results

    Here, we present a computational framework that combines the formulation of probabilistic graphical modeling, standard statistical estimation, and integration of high-throughput biological data to explore the global behavior of biological systems and the global consistency between experimentally verified GRNs and corresponding large microarray compendium data. The model is represented as a probabilistic bipartite graph, which can handle highly complex network systems and accommodates partial measurements of diverse biological entities, e.g. messenger RNAs, proteins, metabolites and various stimulators participating in regulatory networks. This method was tested on microarray expression data from the M3D database, corresponding to sub-networks on one of the best researched model organisms, Escherichia coli. Results show a surprisingly high correlation between the observed states and the inferred system’s behavior under various experimental conditions.

    Availability and implementation

    Processed data and software implementation using Matlab are freely available at https://github.com/kotiang54/PgmGRNs. Full dataset available from the M3D database.
  5. Abstract Background

    Binary classification rules based on a small sample of high-dimensional data (for instance, gene expression data) are ubiquitous in modern bioinformatics. Constructing such classifiers is challenging due to (a) the complex nature of underlying biological traits, such as gene interactions, and (b) the need for highly interpretable glass-box models. We use the theory of high dimensional model representation (HDMR) to build interpretable low dimensional approximations of the log-likelihood ratio accounting for the effects of each individual gene as well as gene-gene interactions. We propose two algorithms approximating the second order HDMR expansion, and a hypothesis test based on the HDMR formulation to identify significantly dysregulated pairwise interactions. The theory is seen as flexible and requiring only a mild set of assumptions.

    Results

    We apply our approach to gene expression data from both synthetic and real (breast and lung cancer) datasets, and also compare it against several popular state-of-the-art methods. The analyses suggest the proposed algorithms can be used to obtain interpretable prediction rules with high prediction accuracies and to successfully extract significantly dysregulated gene-gene interactions from the data. They also compare favorably against their competitors across multiple synthetic data scenarios.

    Conclusion

    The proposed HDMR-based approach appears to produce a reliable classifier that additionally allows one to describe how individual genes or gene-gene interactions affect classification decisions. Both real and synthetic data analyses suggest that our methods can be used to identify gene networks with dysregulated pairwise interactions, and are therefore appropriate for differential networks analysis.
