skip to main content


Title: MRHCA: a nonparametric statistics based method for hub and co‐expression module identification in large gene co‐expression network
Background

Gene co‐expression and differential co‐expression analysis has been increasingly used to study co‐functional and co‐regulatory biological mechanisms from large scale transcriptomics data sets.

Methods

In this study, we develop a nonparametric approach to identify hub genes and modules in a large co‐expression network with low computational and memory cost, namely MRHCA.

Results

We have applied the method to simulated transcriptomics data sets and demonstrated MRHCA can accurately identify hub genes and estimate size of co‐expression modules. With applying MRHCA and differential co‐expression analysis toE. coliand TCGA cancer data, we have identified significant condition specific activated genes inE. coliand distinct gene expression regulatory mechanisms between the cancer types with high copy number variation and small somatic mutations.

Conclusion

Our analysis has demonstrated MRHCA can (i) deal with large association networks, (ii) rigorously assess statistical significance for hubs and module sizes, (iii) identify co‐expression modules with low associations, (iv) detect small and significant modules, and (v) allow genes to be present in more than one modules, compared with existing methods.

 
more » « less
NSF-PAR ID:
10478003
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Quantitative Biology
Volume:
6
Issue:
1
ISSN:
2095-4689
Format(s):
Medium: X Size: p. 40-55
Size(s):
["p. 40-55"]
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    The analysis of spatially resolved transcriptome enables the understanding of the spatial interactions between the cellular environment and transcriptional regulation. In particular, the characterization of the gene–gene co-expression at distinct spatial locations or cell types in the tissue enables delineation of spatial co-regulatory patterns as opposed to standard differential single gene analyses. To enhance the ability and potential of spatial transcriptomics technologies to drive biological discovery, we develop a statistical framework to detect gene co-expression patterns in a spatially structured tissue consisting of different clusters in the form of cell classes or tissue domains.

    Results

    We develop SpaceX (spatially dependent gene co-expression network), a Bayesian methodology to identify both shared and cluster-specific co-expression network across genes. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model which is based on a dimension reduction technique for computational efficiency. We show via simulations, accuracy gains in co-expression network estimation and structure by accounting for (increasing) spatial correlation and appropriate noise distributions. In-depth analysis of two spatial transcriptomics datasets in mouse hypothalamus and human breast cancer using SpaceX, detected multiple hub genes which are related to cognitive abilities for the hypothalamus data and multiple cancer genes (e.g. collagen family) from the tumor region for the breast cancer data.

    Availability and implementation

    The SpaceX R-package is available at github.com/bayesrx/SpaceX.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  2. Abstract Motivation

    Tumor tissue samples often contain an unknown fraction of stromal cells. This problem is widely known as tumor purity heterogeneity (TPH) was recently recognized as a severe issue in omics studies. Specifically, if TPH is ignored when inferring co-expression networks, edges are likely to be estimated among genes with mean shift between non-tumor- and tumor cells rather than among gene pairs interacting with each other in tumor cells. To address this issue, we propose Tumor Specific Net (TSNet), a new method which constructs tumor-cell specific gene/protein co-expression networks based on gene/protein expression profiles of tumor tissues. TSNet treats the observed expression profile as a mixture of expressions from different cell types and explicitly models tumor purity percentage in each tumor sample.

    Results

    Using extensive synthetic data experiments, we demonstrate that TSNet outperforms a standard graphical model which does not account for TPH. We then apply TSNet to estimate tumor specific gene co-expression networks based on TCGA ovarian cancer RNAseq data. We identify novel co-expression modules and hub structure specific to tumor cells.

    Availability and implementation

    R codes can be found at https://github.com/petraf01/TSNet.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  3. Webster, Matthew (Ed.)
    Abstract

    The evolution of eusociality requires that individuals forgo some or all their own reproduction to assist the reproduction of others in their group, such as a primary egg-laying queen. A major open question is how genes and genetic pathways sculpt the evolution of eusociality, especially in rudimentary forms of sociality—those with smaller cooperative nests when compared with species such as honeybees that possess large societies. We lack comprehensive comparative studies examining shared patterns and processes across multiple social lineages. Here we examine the mechanisms of molecular convergence across two lineages of bees and wasps exhibiting such rudimentary societies. These societies consist of few individuals and their life histories range from facultative to obligately social. Using six species across four independent origins of sociality, we conduct a comparative meta-analysis of publicly available transcriptomes. Standard methods detected little similarity in patterns of differential gene expression in brain transcriptomes among reproductive and non-reproductive individuals across species. By contrast, both supervised machine learning and consensus co-expression network approaches uncovered sets of genes with conserved expression patterns among reproductive and non-reproductive phenotypes across species. These sets overlap substantially, and may comprise a shared genetic “toolkit” for sociality across the distantly related taxa of bees and wasps and independently evolved lineages of sociality. We also found many lineage-specific genes and co-expression modules associated with social phenotypes and possible signatures of shared life-history traits. These results reveal how taxon-specific molecular mechanisms complement a core toolkit of molecular processes in sculpting traits related to the evolution of eusociality.

     
    more » « less
  4. Abstract Motivation

    Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity in two GRNs, and may sacrifice inference accuracy.

    Results

    In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM), to jointly infer GRNs under two conditions, and then to identify difference of the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a dataset of lung cancer and another dataset of gastric cancer with FSSEM inferred differential GRNs in cancer versus normal tissues, whose genes with largest network degrees have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions.

    Availability and implementation

    The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract

    Understanding the underlying structure of a gene regulatory network is crucial to understand the biological functions of genes or groups of genes. A common strategy to investigate it is to find community structure of these networks. However, methods of finding these communities are often sensitive to noise in the gene expression data and the inherent stochasticity of the community detection algorithms. Here we introduce an approach for identifying functional groups and their hierarchical organization in gene co-expression networks from expression data. A network describing the relatedness in the expression profiles of genes is first inferred using an information theoretic approach. Community structure within the inferred network is found by usingmodularity maximization. This community structure is further refined using three-body structural correlations to robustly identify important functional gene communities. We apply this approach to the expression data ofE. coligenes and identify 25 robust groups, many of which show key associations with important biological functions as demonstrated by gene ontology term enrichment analysis. Thus, our approach makes specific and novel predictions about the function of these genes.

     
    more » « less