
Title: Identifying robust functional modules using three-body correlations in Escherichia coli

Understanding the underlying structure of a gene regulatory network is crucial for understanding the biological functions of genes or groups of genes. A common strategy for investigating this structure is to find the community structure of these networks. However, methods for finding these communities are often sensitive to noise in the gene expression data and to the inherent stochasticity of the community detection algorithms. Here we introduce an approach for identifying functional groups and their hierarchical organization in gene co-expression networks from expression data. A network describing the relatedness of the genes' expression profiles is first inferred using an information theoretic approach. Community structure within the inferred network is then found using modularity maximization. This community structure is further refined using three-body structural correlations to robustly identify important functional gene communities. We apply this approach to the expression data of E. coli genes and identify 25 robust groups, many of which show key associations with important biological functions, as demonstrated by gene ontology term enrichment analysis. Thus, our approach makes specific and novel predictions about the function of these genes.
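
Modularity maximization searches for the partition of a network that maximizes Newman's modularity Q. Q is the fraction of edge weight that falls within communities minus the fraction expected under a degree-preserving null model. The Python sketch below is not the authors' code. It only scores a given partition of an undirected, weighted network without self-loops, and the two-triangle toy graph is hypothetical. Actually maximizing Q would require a search heuristic such as the Louvain method, which is not shown here, and the same is true of the three-body refinement step.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, weighted graph.

    edges: iterable of (u, v, w) tuples, each undirected edge listed once,
           no self-loops, w > 0.
    communities: dict mapping every node to a community label.

    Q = sum over communities c of  L_c / m - (d_c / (2 m))**2,
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = 0.0
    internal = defaultdict(float)    # L_c: edge weight inside community c
    degree_sum = defaultdict(float)  # d_c: total weighted degree of community c
    for u, v, w in edges:
        m += w
        cu, cv = communities[u], communities[v]
        # Each endpoint adds w to the degree sum of its own community.
        degree_sum[cu] += w
        degree_sum[cv] += w
        if cu == cv:
            internal[cu] += w
    if m == 0:
        raise ValueError("graph has no edges")
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy network: two triangles joined by the single edge (2, 3).
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))  # about 0.357 (= 5/14)
```

Splitting the toy graph into its two triangles gives Q = 5/14. Placing all six nodes in a single community gives Q = 0, which is why a maximizer prefers the split.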

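Gene ontology term enrichment for a community is commonly scored with a one-sided hypergeometric test. Suppose N background genes include K that carry a term. The test asks how surprising it is that at least k of the n genes in a community carry that term. The sketch below assumes this convention, because the abstract does not say which test or background gene set was used. In practice, p-values computed across many terms would also be corrected for multiple testing, for example with the Benjamini-Hochberg procedure.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k) for term enrichment.

    N: number of genes in the background (e.g. all genes in the network)
    K: background genes annotated with the GO term
    n: genes in the community being tested
    k: community genes annotated with the term
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(K, n)):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of drawing i = k, ..., min(K, n) annotated genes
    # when n genes are sampled without replacement from the background.
    # math.comb returns 0 when asked for more items than exist, so
    # impossible values of i contribute nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


print(enrichment_p_value(10, 5, 5, 5))  # 1/252, about 0.00397
```

In this toy call, all five community genes carry the term. Only one of the comb(10, 5) = 252 equally likely samples achieves that, so the p-value is 1/252.
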
Author(s) / Creator(s):
; ;
Publisher / Repository:
IOP Publishing
Date Published:
Journal Name:
Journal of Physics: Complexity
Page Range / eLocation ID:
Article No. 015013
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fu, Feng (Ed.)

    With the recent availability of tissue-specific gene expression data, such as that provided by the GTEx Consortium, there is interest in comparing gene co-expression patterns across tissues. One promising approach to this problem is to use a multilayer network analysis framework and perform multilayer community detection. Communities in gene co-expression networks reveal groups of genes that are similarly expressed across individuals and that are potentially involved in related biological processes, respond to specific environmental stimuli, or share common regulatory variations. We construct a multilayer network in which each of the four layers is an exocrine gland tissue-specific gene co-expression network. We develop methods for multilayer community detection with correlation matrix input and an appropriate null model. Our correlation matrix input method identifies five groups of genes that are similarly co-expressed in multiple tissues (a community that spans multiple layers, which we call a generalist community) and two groups of genes that are co-expressed in just one tissue (a community that lies primarily within just one layer, which we call a specialist community). We further find gene co-expression communities in which the genes physically cluster across the genome significantly more than expected by chance (on chromosomes 1 and 11). This clustering hints at underlying regulatory elements that determine similar expression patterns across individuals and cell types. We suggest that KRTAP3-1, KRTAP3-3, and KRTAP3-5 share regulatory elements in skin and pancreas. Furthermore, we find that CELA3A and CELA3B share associated expression quantitative trait loci in the pancreas. The results indicate that our multilayer community detection method for correlation matrix input extracts biologically interesting communities of genes.

  2. Abstract

    Transcriptome studies that provide temporal information about transcript abundance facilitate the identification of gene regulatory networks (GRNs). Inferring GRNs from time series data using computational modeling remains a central challenge in systems biology. Commonly employed clustering algorithms identify modules of like-responding genes but do not provide information on how these modules are interconnected. These methods also require users to specify parameters such as cluster number and size, which adds complexity to the analysis. To address these challenges, we used a recently developed algorithm, partitioned local depth (PaLD), to generate cohesive networks for 4 time series transcriptome datasets (3 hormone datasets and 1 abiotic stress dataset) from the model plant Arabidopsis thaliana. PaLD provided a cohesive network representation of the data, revealing networks with distinct structures and varying numbers of connections between transcripts. We used the networks to make predictions about GRNs by examining local neighborhoods of transcripts with highly similar temporal responses. We also partitioned the networks into groups of like-responding transcripts and identified enriched functional and regulatory features in them. Comparing these groups with clusters generated by commonly used approaches showed that both identified modules of transcripts with similar temporal and biological features, but that PaLD also identified unique groups. This suggests that a PaLD-based approach (supplemented with a community detection algorithm) can complement existing methods. These results revealed that PaLD could sort like-responding transcripts into biologically meaningful neighborhoods and groups while requiring minimal user input and producing a cohesive network structure, offering an additional tool for the systems biology community to predict GRNs.

  3. Abstract

    Identifying genes that interact to confer a biological function on an organism is one of the main goals of functional genomics. High‐throughput technologies for assessing and quantifying genome‐wide gene expression patterns have enabled systems‐level analyses that infer pathways or networks of genes involved in different functions under many different conditions. Here, we leveraged the publicly available, information‐rich RNA‐Seq datasets of the model plant Arabidopsis thaliana to construct a gene co‐expression network, which was partitioned into clusters or modules that harbor genes correlated by expression. Gene ontology and pathway enrichment analyses were performed to assess the functional terms and pathways enriched within the different gene modules. Interrogating the co‐expression network for genes in different modules that associate with a gene of interest can reveal diverse functional roles of that gene. By mapping genes differentially expressed under a certain condition in Arabidopsis onto the co‐expression network, we demonstrate the ability of the network to uncover novel genes that are likely transcriptionally active but prone to be missed by standard statistical approaches because they fall outside the confidence zone of detection. To our knowledge, this is the first A. thaliana co‐expression network constructed using all the mRNA‐Seq datasets (>20,000) available in the NCBI SRA database. The network can serve as a useful resource for the Arabidopsis research community to interrogate specific genes of interest, retrieve their interactomes, identify gene modules that are transcriptionally altered under certain conditions or at certain stages, and gain an understanding of gene functions.

  4. Summary

    Predicting gene regulatory networks (GRNs) from expression profiles is a common approach for identifying important biological regulators. Despite the increased use of inference methods, existing computational approaches often do not integrate RNA‐sequencing data analysis, are not automated, or are restricted to users with bioinformatics backgrounds. To address these limitations, we developed tuxnet, a user‐friendly platform that can process raw RNA‐sequencing data from any organism with an existing reference genome using a modified tuxedo pipeline (hisat2 + cufflinks package) and infer GRNs from these processed data. tuxnet is implemented as a graphical user interface and can mine gene regulations either by applying a dynamic Bayesian network (DBN) inference algorithm, genist, or by using a regression tree‐based pipeline, rtp‐star. We obtained time‐course expression data from a PERIANTHIA (PAN) inducible line and inferred a GRN using genist to illustrate the use of tuxnet while gaining insight into the regulations downstream of the Arabidopsis root stem cell regulator PAN. Using rtp‐star, we inferred the network of ATHB13, a gene downstream of PAN, for which we obtained wild‐type and mutant expression profiles. Additionally, we generated two networks, one from temporal developmental leaf data and one from spatial root cell‐type data, to highlight how tuxnet can be used to form new testable hypotheses from previously explored data. Our case studies showcase the versatility of tuxnet when inferring networks from different types of gene expression data and its accessibility as a pipeline for non‐bioinformaticians to analyze transcriptome data, predict causal regulations, assess network topology, and identify key regulators.

  5. Finding the network biomarkers of cancers and analyzing the cancer-driving genes involved in these biomarkers are essential for understanding the dynamics of cancer. Clusters of genes in co-expression networks are commonly regarded as functional units. This work is based on the hypothesis that the dense clusters, or communities, in the gene co-expression networks of cancer patients may represent functional units involved in cancer initiation and progression. In this study, RNA-seq gene expression data for three cancers from The Cancer Genome Atlas (TCGA), Breast Invasive Carcinoma (BRCA), Colorectal Adenocarcinoma (COAD), and Glioblastoma Multiforme (GBM), are used to construct gene co-expression networks based on Pearson correlation. Six well-known community detection algorithms are applied to these networks to identify communities with five or more genes. A permutation test is then used to find the communities that are also conserved in the other cancers, which are called conserved communities. Next, survival analysis is performed on the clinical data of the three cancers using the conserved community genes as prognostic covariates. Communities that can separate the cancer patients into high- and low-risk groups are considered cancer biomarkers. In the present study, 16 such network biomarkers are discovered.
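
Item 5 builds co-expression networks from Pearson correlation, but its abstract does not say how correlations were turned into edges. The sketch below therefore assumes a simple hard threshold on |r| that keeps both positive and negative correlations. The function names, the default cut-off of 0.8, and the toy profiles are all hypothetical. The community detection, permutation test, and survival analysis steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expr, threshold=0.8):
    """Edges (g1, g2, r) for gene pairs with |r| >= threshold.

    expr maps each gene to its expression values over the same samples.
    The hard threshold on |r| is a hypothetical choice for illustration.
    """
    genes = sorted(expr)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expr[g1], expr[g2])
            if abs(r) >= threshold:
                edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles of four genes over four samples.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with g1
    "g3": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with g1
    "g4": [1.0, 3.0, 2.0, 4.0],  # r = 0.8 with g1
}
for edge in coexpression_edges(expr, threshold=0.9):
    print(edge)
```

With a cut-off of 0.9, only g1, g2, and g3 are linked. Gene g4, whose correlations with the others have magnitude 0.8, remains isolated, which shows how strongly the chosen threshold shapes the resulting network.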