Title: Recent advances in gene function prediction using context-specific coexpression networks in plants
Predicting gene functions from genome sequence alone has been difficult, and the functions of a large fraction of plant genes remain unknown. However, leveraging the vast amount of currently available gene expression data has the potential to facilitate our understanding of plant gene functions, especially in determining complex traits. Gene coexpression networks, created by integrating multiple expression datasets, connect genes with similar patterns of expression across multiple conditions. Dense gene communities in such networks, commonly referred to as modules, often indicate that the member genes are functionally related. As such, these modules serve as tools for generating new testable hypotheses, including the prediction of gene function and importance. Recently, we have seen a paradigm shift from the traditional “global” to more defined, context-specific coexpression networks. Such coexpression networks imply genetic correlations in specific biological contexts such as during development or in response to a stress. In this short review, we highlight a few recent studies that attempt to fill the large gaps in our knowledge about cellular functions of plant genes using context-specific coexpression networks.
Award ID(s):
1826836 1716844
PAR ID:
10100002
Author(s) / Creator(s):
;
Date Published:
Journal Name:
F1000Research
Volume:
8
ISSN:
2046-1402
Page Range / eLocation ID:
153
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Epistasis is caused by genetic interactions among mutations that affect fitness. To characterize properties and potential mechanisms of epistasis, we engineered eight double mutants that combined mutations from the rho and rpoB genes of Escherichia coli. The two genes encode essential functions for transcription, and the mutations in each gene were chosen because they were beneficial for adaptation to thermal stress (42.2 °C). The double mutants exhibited patterns of fitness epistasis that included diminishing returns epistasis at 42.2 °C, stronger diminishing returns between mutations with larger beneficial effects, and both negative and positive (sign) epistasis across environments (20.0 °C and 37.0 °C). By assessing gene expression between single and double mutants, we detected hundreds of genes with gene expression epistasis. Previous work postulated that highly connected hub genes in coexpression networks have low epistasis, but we found the opposite: hub genes had high epistasis values in both coexpression and protein–protein interaction networks. We hypothesized that elevated epistasis in hub genes reflected that they were enriched for targets of Rho termination, but that was not the case. Altogether, gene expression and coexpression analyses revealed that thermal adaptation occurred in modules, through modulation of ribonucleotide biosynthetic processes and ribosome assembly, the attenuation of expression in genes related to heat shock and stress responses, and with an overall trend toward restoring gene expression toward the unstressed state.
  2. Abstract: The regulation of gene expression is central to many biological processes. Gene regulatory networks (GRNs) link transcription factors (TFs) to their target genes and represent maps of potential transcriptional regulation. Here, we analyzed a large number of publicly available maize (Zea mays) transcriptome data sets including >6000 RNA sequencing samples to generate 45 coexpression-based GRNs that represent potential regulatory relationships between TFs and other genes in different populations of samples (cross-tissue, cross-genotype, and tissue-and-genotype samples). While these networks are all enriched for biologically relevant interactions, different networks capture distinct TF-target associations and biological processes. By examining the power of our coexpression-based GRNs to accurately predict covarying TF-target relationships in natural variation data sets, we found that presence/absence changes rather than quantitative changes in TF gene expression are more likely associated with changes in target gene expression. Integrating information from our TF-target predictions and previous expression quantitative trait loci (eQTL) mapping results provided support for 68 TFs underlying 74 previously identified trans-eQTL hotspots spanning a variety of metabolic pathways. This study highlights the utility of developing multiple GRNs within a species to detect putative regulators of important plant pathways and provides potential targets for breeding or biotechnological applications.
  3. Abstract: Background: Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; a frequent subgraph is maximal if it does not have a super-graph that is frequent. In the bioinformatics discipline, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification. Results: We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal frequent subgraphs, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs. Conclusion: Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that subnetworks that are frequent are highly enriched with known biological ontologies.
  4. This beginner’s guide is intended for plant biologists new to network analysis. Here, we introduce key concepts and resources for researchers interested in incorporating network analysis into research, either as a stand-alone component for generating hypotheses or as a framework for examining and visualizing experimental results. Network analysis provides a powerful tool to predict gene functions. Advances in and reduced costs for systems biology techniques, such as genomics, transcriptomics, and proteomics, have generated abundant -omics data for plants; however, the functional annotation of plant genes lags. Therefore, predictions from network analysis can be a starting point to annotate genes and ultimately elucidate genotype-phenotype relationships. In this paper, we introduce networks and compare network-building resources available for plant biologists, including databases and software for network analysis. We then compare four databases available for plant biologists in more detail: AraNet, GeneMANIA, ATTED-II, and STRING. AraNet and GeneMANIA are functional association networks, ATTED-II is a gene coexpression database, and STRING is a protein-protein interaction database. AraNet and ATTED-II are plant-specific databases that can analyze multiple plant species, whereas GeneMANIA builds networks for Arabidopsis thaliana and non-plant species, and STRING for multiple species. Finally, we compare the performance of the four databases in predicting known and probable gene functions of the A. thaliana Nuclear Factor-Y (NF-Y) genes. We conclude that plant biologists have an invaluable resource in these databases and discuss how users can decide which type of database to use depending on their research question.
  5. Abstract: Renal cell carcinoma (RCC) subtypes are characterized by distinct molecular profiles. Using RNA expression profiles from 1,009 RCC samples, we constructed a condition-annotated gene coexpression network (GCN). The RCC GCN contains binary gene coexpression relationships (edges) specific to conditions including RCC subtype and tumor stage. As an application of this resource, we discovered RCC GCN edges and modules that were associated with genetic lesions in known RCC driver genes, including VHL, a common initiating clear cell RCC (ccRCC) genetic lesion, and PBRM1 and BAP1, which are early genetic lesions in the Braided Cancer River Model (BCRM). Since ccRCC tumors with PBRM1 mutations respond to targeted therapy differently than tumors with BAP1 mutations, we focused on ccRCC-specific edges associated with tumors that exhibit alternate mutation profiles: VHL-PBRM1 or VHL-BAP1. We found specific blends of molecular functions associated with these two mutation paths. Despite these mutation-associated edges having unique genes, they were enriched for the same immunological functions, suggesting a convergent functional role for alternate gene sets consistent with the BCRM. The condition-annotated RCC GCN described herein is a novel data mining resource for the assignment of polygenic biomarkers and their relationships to RCC tumors with specific molecular and mutational profiles.