skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Causal network inference from gene transcriptional time-series response to glucocorticoids
Gene regulatory network inference is essential to uncover complex relationships among gene pathways and inform downstream experiments, ultimately enabling regulatory network re-engineering. Network inference from transcriptional time-series data requires accurate, interpretable, and efficient determination of causal relationships among thousands of genes. Here, we develop Bootstrap Elastic net regression from Time Series (BETS), a statistical framework based on Granger causality for the recovery of a directed gene network from transcriptional time-series data. BETS uses elastic net regression and stability selection from bootstrapped samples to infer causal relationships among genes. BETS is highly parallelized, enabling efficient analysis of large transcriptional data sets. We show competitive accuracy on a community benchmark, the DREAM4 100-gene network inference challenge, where BETS is one of the fastest among methods of similar performance and additionally infers whether causal effects are activating or inhibitory. We apply BETS to transcriptional time-series data of differentially-expressed genes from A549 cells exposed to glucocorticoids over a period of 12 hours. We identify a network of 2768 genes and 31,945 directed edges (FDR ≤ 0.2). We validate inferred causal network edges using two external data sources: Overexpression experiments on the same glucocorticoid system, and genetic variants associated with inferred edges in primary lung tissue in the Genotype-Tissue Expression (GTEx) v6 project. BETS is available as an open source software package at https://github.com/lujonathanh/BETS .  more » « less
Award ID(s):
1750729
PAR ID:
10319988
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Editor(s):
Leslie, Christina S.
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
1
ISSN:
1553-7358
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cowen, Lenore (Ed.)
    Abstract Summary Designing interventions to control gene regulation necessitates modeling a gene regulatory network by a causal graph. Currently, large-scale gene expression datasets from different conditions, cell types, disease states, and developmental time points are being collected. However, application of classical causal inference algorithms to infer gene regulatory networks based on such data is still challenging, requiring high sample sizes and computational resources. Here, we describe an algorithm that efficiently learns the differences in gene regulatory mechanisms between different conditions. Our difference causal inference (DCI) algorithm infers changes (i.e. edges that appeared, disappeared, or changed weight) between two causal graphs given gene expression data from the two conditions. This algorithm is efficient in its use of samples and computation since it infers the differences between causal graphs directly without estimating each possibly large causal graph separately. We provide a user-friendly Python implementation of DCI and also enable the user to learn the most robust difference causal graph across different tuning parameters via stability selection. Finally, we show how to apply DCI to single-cell RNA-seq data from different conditions and cell states, and we also validate our algorithm by predicting the effects of interventions. Availability and implementation Python package freely available at http://uhlerlab.github.io/causaldag/dci. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  2. The angiosperm seed represents a critical evolutionary breakthrough that has been shown to propel the reproductive success and radiation of flowering plants. Seeds promote the rapid diversification of angiosperms by establishing postzygotic reproductive barriers, such as hybrid seed inviability. While prezygotic barriers to reproduction tend to be transient, postzygotic barriers are often permanent and therefore can play a pivotal role in facilitating speciation. This property of the angiosperm seed is exemplified in the Mimulus genus. In order to further the understanding of the gene regulatory mechanisms important in the Mimulus seed, we performed gene regulatory network (GRN) inference analysis by using time-series RNA-seq data from developing hybrid seeds from a viable cross between Mimulus guttatus and Mimulus pardalis. GRN inference has the capacity to identify active regulatory mechanisms in a sample and highlight genes of potential biological importance. In our case, GRN inference also provided the opportunity to uncover active regulatory relationships and generate a reference set of putative gene regulations. We deployed two GRN inference algorithms—RTP-STAR and KBoost—on three different subsets of our transcriptomic dataset. While the two algorithms yielded GRNs with different regulations and topologies when working with the same data subset, there was still significant overlap in the specific gene regulations they inferred, and they both identified potential novel regulatory mechanisms that warrant further investigation. 
    more » « less
  3. Abstract Background Cell and circadian cycles control a large fraction of cell and organismal physiology by regulating large periodic transcriptional programs that encompass anywhere from 15 to 80% of the genome despite performing distinct functions. In each case, these large periodic transcriptional programs are controlled by gene regulatory networks (GRNs), and it has been shown through genetics and chromosome mapping approaches in model systems that at the core of these GRNs are small sets of genes that drive the transcript dynamics of the GRNs. However, it is unlikely that we have identified all of these core genes, even in model organisms. Moreover, large periodic transcriptional programs controlling a variety of processes certainly exist in important non-model organisms where genetic approaches to identifying networks are expensive, time-consuming, or intractable. Ideally, the core network components could be identified using data-driven approaches on the transcriptome dynamics data already available. Results This study shows that a unified set of quantified dynamic features of high-throughput time series gene expression data are more prominent in the core transcriptional regulators of cell and circadian cycles than in their outputs, in multiple organism, even in the presence of external periodic stimuli. Additionally, we observe that the power to discriminate between core and non-core genes is largely insensitive to the particular choice of quantification of these features. Conclusions There are practical applications of the approach presented in this study for network inference, since the result is a ranking of genes that is enriched for core regulatory elements driving a periodic phenotype. In this way, the method provides a prioritization of follow-up genetic experiments. Furthermore, these findings reveal something unexpected—that there are shared dynamic features of the transcript abundance of core components of unrelated GRNs that control disparate periodic phenotypes. 
    more » « less
  4. Inferring gene regulatory networks (GRNs) from single-cell gene expression datasets is a challenging task. Existing methods are often designed heuristically for specific datasets and lack the flexibility to incorporate additional information or compare against other algorithms. Further, current GRN inference methods do not provide uncertainty estimates with respect to the interactions that they predict, making inferred networks challenging to interpret. To overcome these challenges, we introduce Probabilistic Matrix Factorization for Gene Regulatory Network inference (PMF-GRN). PMF-GRN uses single-cell gene expression data to learn latent factors representing transcription factor activity as well as regulatory relationships between transcription factors and their target genes. This approach incorporates available experimental evidence into prior distributions over latent factors and scales well to single-cell gene expression datasets. By utilizing variational inference, we facilitate hyperparameter search for principled model selection and direct comparison to other generative models. To assess the accuracy of our method, we evaluate PMF-GRN using the model organisms Saccharomyces cerevisiae and Bacillus subtilis, benchmarking against database-derived gold standard interactions. We discover that, on average, PMF-GRN infers GRNs more accurately than current state-of-the-art single-cell GRN inference methods. Moreover, our PMF-GRN approach offers well-calibrated uncertainty estimates, as it performs gene regulatory network (GRN) inference in a probabilistic setting. These estimates are valuable for validation purposes, particularly when validated interactions are limited or a gold standard is incomplete. 
    more » « less
  5. null (Ed.)
    We present PALLAS, a practical method for gene regulatory network (GRN) inference from time series data, which employs penalized maximum likelihood and particle swarms for optimization. PALLAS is based on the Partially-Observable Boolean Dynamical System (POBDS) model and thus does not require ad-hoc binarization of the data. The penalty in the likelihood is a LASSO regularization term, which encourages the resulting network to be sparse. PALLAS is able to scale to networks of realistic size under no prior knowledge, by virtue of a novel continuous-discrete Fish School Search particle swarm algorithm for efficient simultaneous maximization of the penalized likelihood over the discrete space of networks and the continuous space of observational parameters. The performance of PALLAS is demonstrated by a comprehensive set of experiments using synthetic data generated from real and artificial networks, as well as real time series microarray and RNA-seq data, where it is compared to several other well-known methods for gene regulatory network inference. The results show that PALLAS can infer GRNs more accurately than other methods, while being capable of working directly on gene expression data, without need of ad-hoc binarization. PALLAS is a fully-fledged program, written in python, and available on GitHub (https://github.com/yukuntan92/PALLAS). 
    more » « less