
Title: Cancer Biomarker Discovery from Gene Co-expression Networks Using Community Detection Methods
Finding the network biomarkers of cancers and analyzing the cancer driver genes involved in these biomarkers are essential for understanding the dynamics of cancer. Clusters of genes in co-expression networks are commonly regarded as functional units. This work is based on the hypothesis that the dense clusters, or communities, in the gene co-expression networks of cancer patients may represent functional units in cancer initiation and progression. In this study, RNA-seq gene expression data for three cancers from The Cancer Genome Atlas (TCGA), namely Breast Invasive Carcinoma (BRCA), Colorectal Adenocarcinoma (COAD) and Glioblastoma Multiforme (GBM), are used to construct gene co-expression networks using Pearson correlation. Six well-known community detection algorithms are applied to these networks to identify communities with five or more genes. A permutation test is then performed to find the communities that are conserved in the other cancers; these are termed conserved communities. Survival analysis is performed on the clinical data of the three cancers using the conserved community genes as prognostic covariates. The communities that can separate cancer patients into high- and low-risk groups are considered cancer biomarkers. In the present study, 16 such network biomarkers are discovered.
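The first two steps of the pipeline described above (a Pearson-correlation co-expression network, then communities of five or more genes) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the correlation cutoff of 0.9, the toy expression values, the gene names, and the use of a simple deterministic label-propagation routine as a stand-in for the six community detection algorithms, which the abstract does not name.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def coexpression_network(expr, threshold=0.9):
    """Link two genes when their Pearson r is at least `threshold`.

    `expr` maps gene name -> expression values across patients.
    The 0.9 cutoff is an assumed value, not taken from the abstract.
    """
    adj = {gene: set() for gene in expr}
    for g1, g2 in combinations(expr, 2):
        if pearson(expr[g1], expr[g2]) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def label_propagation(adj, max_rounds=100):
    """Deterministic label propagation (stand-in community detector).

    Each gene repeatedly adopts the most common label among its
    neighbours; ties are broken by the smallest label.
    """
    labels = {gene: gene for gene in adj}
    for _ in range(max_rounds):
        changed = False
        for gene in sorted(adj):
            if not adj[gene]:
                continue  # isolated genes keep their own label
            counts = {}
            for nb in adj[gene]:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[gene]:
                labels[gene] = new_label
                changed = True
        if not changed:
            break
    groups = {}
    for gene, label in labels.items():
        groups.setdefault(label, set()).add(gene)
    return list(groups.values())


def communities_of_min_size(expr, threshold=0.9, min_size=5):
    """Build the network, detect communities, keep those with >= min_size genes."""
    adj = coexpression_network(expr, threshold)
    return [c for c in label_propagation(adj) if len(c) >= min_size]


# Toy data with hypothetical gene names: two co-expressed groups plus one loner.
base_a = [1, 2, 3, 4, 5, 6]
base_b = [3, 1, 3, 1, 3, 1]
toy_expr = {f"A{i}": [i * v + i for v in base_a] for i in range(1, 6)}
toy_expr.update({f"B{i}": [i * v + i for v in base_b] for i in range(1, 6)})
toy_expr["C1"] = [5, 5, 1, 1, 5, 5]

toy_communities = communities_of_min_size(toy_expr)
```

On the toy data, the five `A` genes and the five `B` genes each form one community, and the uncorrelated gene `C1` is dropped by the size filter. A real analysis would more likely rely on an established network library and on the specific algorithms used by the authors.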
Award ID(s):
1901628 1651917
Publication Date:
Journal Name:
2019 IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM)
Page Range or eLocation-ID:
2097 to 2104
Sponsoring Org:
National Science Foundation
More Like this
  1. The ability to predict the efficacy of cancer treatments is a longstanding goal of precision medicine that requires improved understanding of molecular interactions with drugs and the discovery of biomarkers of drug response. Identifying genes whose expression influences drug sensitivity can help address both of these needs, elucidating the molecular pathways involved in drug efficacy and providing potential ways to predict new patients’ response to available therapies. In this study, we integrated cancer type, drug treatment, and survival data with RNA-seq gene expression data from The Cancer Genome Atlas to identify genes and gene sets whose expression levels in patient tumor biopsies are associated with drug-specific patient survival using a log-rank test comparing survival of patients with low vs. high expression for each gene. This analysis was successful in identifying thousands of such gene–drug relationships across 20 drugs in 14 cancers, several of which have been previously implicated in the respective drug’s efficacy. We then clustered significant genes based on their expression patterns across patients and defined gene sets that are more robust predictors of patient outcome, many of which were significantly enriched for target genes of one or more transcription factors, indicating several upstream regulatory mechanisms that may be involved in drug efficacy. We identified a large number of genes and gene sets that were potentially useful as transcript-level biomarkers for predicting drug-specific patient survival outcome. Our gene sets were robust predictors of drug-specific survival and our results included both novel and previously reported findings, suggesting that the drug-specific survival marker genes reported herein warrant further investigation for insights into drug mechanisms and for validation as biomarkers to aid cancer therapy decisions.
  2. Ribonuclease (RNase) H2 is a key enzyme for the removal of RNA found in DNA-RNA hybrids, playing a fundamental role in biological processes such as DNA replication, telomere maintenance, and DNA damage repair. RNase H2 is a trimer composed of three subunits, RNASEH2A being the catalytic subunit. RNASEH2A expression levels have been shown to be upregulated in transformed and cancer cells. In this study, we used a bioinformatics approach to identify RNASEH2A co-expressed genes in different human tissues to underscore biological processes associated with RNASEH2A expression. Our analysis shows functional networks for RNASEH2A involvement such as DNA replication and DNA damage response and a novel putative functional network of cell cycle regulation. Further bioinformatics investigation showed increased gene expression in different types of actively cycling cells and tissues, particularly in several cancers, supporting a biological role for RNASEH2A but not for the other two subunits of RNase H2 in cell proliferation. Mass spectrometry analysis of RNASEH2A-bound proteins identified players functioning in cell cycle regulation. Additional bioinformatic analysis showed that RNASEH2A correlates with cancer progression and cell cycle related genes in Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA) Pan Cancer datasets and supported our mass spectrometry findings.
  3. Two graph-theoretic concepts, cliques and bipartite graphs, are explored to identify the network biomarkers for cancer at the gene network level. The rationale is that a group of genes work together by forming a cluster or clique-like structure to initiate a cancer. After initiation, the disease signal goes to the next group of genes related to the second stage of a cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among the genes between two disease stages. To test this hypothesis, gene expression values for three cancers, namely breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM), are used for analysis. First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated as the essential genes for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists to design wet lab experiments for further elucidating the complex mechanism of cancer.
  4. Claesen, Jan (Ed.)
    Colorectal cancer (CRC) is the second leading cause of cancer mortality worldwide. The dysbiotic gut microbiota and its metabolite secretions play a significant role in CRC development and progression. In this study, we identified microbial and metabolic biomarkers applicable to CRC using a meta-analysis of metagenomic datasets from diverse geographical regions. We used LEfSe, random forest (RF), and co-occurrence network methods to identify microbial biomarkers. Geographic dataset-specific markers were identified and evaluated using area under the ROC curve (AUC) scores and random effect size. Co-occurrence network analysis showed a reduction in the overall microbial associations and the presence of oral pathogenic microbial clusters in CRC networks. Analysis of predicted metabolites from CRC datasets showed the enrichment of amino acids, cadaverine, and creatine in CRC, which were positively correlated with CRC-associated microbes (Peptostreptococcus stomatis, Gemella morbillorum, Bacteroides fragilis, Parvimonas spp., Fusobacterium nucleatum, Solobacterium moorei, and Clostridium symbiosum), and negatively correlated with control-associated microbes. Conversely, butyrate, nicotinamide, choline, tryptophan, and 2-hydroxybutanoic acid showed positive correlations with control-associated microbes (P < 0.05). Overall, our study identified a set of global CRC biomarkers that are reproducible across geographic regions. We also reported significant differential metabolites and microbe-metabolite interactions associated with CRC. This study provided significant insights for further investigations leading to the development of noninvasive CRC diagnostic tools and therapeutic interventions. Importance: Several studies showed associations between gut dysbiosis and CRC. Yet, the results are not conclusive due to cohort-specific associations that are influenced by genomic, dietary, and environmental stimuli and associated reproducibility issues with various analysis approaches. Emerging evidence suggests the role of microbial metabolites in modulating host inflammation and DNA damage in CRC. However, the experimental validations have been hindered by cost, resources, and cumbersome technical expertise required for metabolomic investigations. In this study, we performed a meta-analysis of CRC microbiota data from diverse geographical regions using multiple methods to achieve reproducible results. We used a computational approach to predict the metabolomic profiles using existing CRC metagenomic datasets. We identified a reliable set of CRC-specific biomarkers from this analysis, including microbial and metabolite markers. In addition, we revealed significant microbe-metabolite associations through correlation analysis and microbial gene families associated with dysregulated metabolic pathways in CRC, which are essential in understanding the vastly sporadic nature of CRC development and progression.
  5. Background: There is growing evidence indicating that a number of functional connectivity networks are disrupted at each stage of the full clinical Alzheimer’s disease (AD) spectrum. Such differences are also detectable in cognitively normal (CN) individuals carrying mutations of AD risk genes, suggesting a substantial relationship between genetics and AD-altered functional brain networks. However, the direct genetic effect on functional connectivity networks has not been measured. Methods: Leveraging existing AD functional connectivity studies collected in NeuroSynth, we performed a meta-analysis to identify two sets of brain regions: ones with altered functional connectivity in the resting state network and ones without. Then, with the brain-wide gene expression data in the Allen Human Brain Atlas, we applied a new biclustering method to identify a set of genes with differential co-expression patterns between these two sets of brain regions. Results: Differential co-expression analysis using the biclustering method led to a subset of 38 genes which showed distinctive co-expression patterns between AD-related and non-AD-related brain regions in the default mode network. More specifically, we observed 4 sub-clusters with a noticeable co-expression difference, where the difference in correlations is above 0.5 on average. Conclusions: This work applies a new biclustering method to search for a subset of genes with altered co-expression patterns in AD-related default mode network regions. Compared with traditional differential expression analysis, differential co-expression analysis yielded many more significant hits with extra insights into the wiring mechanism between genes. Particularly, the differential co-expression pattern was observed between two sets of genes, suggesting potential upstream genetic regulators in AD development.
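
Item 1 in the list above compares the survival of patients with low vs. high expression using a log-rank test, and the main abstract likewise separates patients into high- and low-risk groups. As a rough illustration of that comparison, the following is a minimal sketch of a standard two-group log-rank test. The function name, the toy follow-up times, and the group labels are hypothetical, and the cited studies may well have used a dedicated survival analysis package rather than anything like this.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*  : follow-up time for each patient in the group
    events_* : 1 if the event (e.g. death) was observed, 0 if censored
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    # Distinct times at which at least one event was observed in either group.
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        # Events occurring exactly at time t.
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy example (made-up data): the "high-risk" group has uniformly shorter survival.
high_risk_times = [1, 2, 3, 4, 5]
low_risk_times = [6, 7, 8, 9, 10]
all_observed = [1, 1, 1, 1, 1]
toy_stat, toy_p = logrank_test(
    high_risk_times, all_observed, low_risk_times, all_observed
)
```

In this toy case the two groups do not overlap at all, so the test yields a small p-value; two groups with identical follow-up data give a statistic of zero.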