This content will become publicly available on June 1, 2026

Title: Colorectal Cancer Biomarker Identification via Joint DNA-Methylation and Transcriptomics Analysis Workflow
Background: Colorectal cancer (CRC) refers to colon and rectal cancer combined, as the two are treated as a single tumor. In CRC, 72% of tumors are colon cancer, while the other 28% are rectal cancer. CRC is a multifactorial disease caused by both genetic and epigenetic changes in the colon mucosal cells, affecting oncogenes, DNA repair genes, and tumor suppressor genes. Currently, two DNA methylation-based biomarkers for CRC have received FDA approval: SEPT9, used in blood-based screening tests, and a combination of NDRG4 and BMP3 for stool-based tests. Although DNA methylation biomarkers have been explored in CRC, the identification of robust and clinically valuable biomarkers remains a challenge, particularly for early-stage detection and precancerous lesions. Patients often receive diagnoses at the locally advanced stage, which limits the potential utility of current biomarkers in clinical settings. Methods: The datasets used in this study were retrieved from the GEO database, specifically GSE75548 and GSE75546 for rectal cancer and GSE50760 and GSE101764 for colon cancer, for a total of 130 paired samples. These datasets represent expression profiling by array, methylation profiling by genome tiling array, and expression profiling by high-throughput sequencing, and include rectal and colon cancer samples paired with adjacent normal tissue samples. Differential analysis was used to identify differentially methylated CpG sites (DMCs) and differentially expressed genes (DEGs). Results: From the integration of DMCs with DEGs in colorectal cancer, we identified 150 candidates for methylation-regulated genes (MRGs), with two genes common across all cohorts (GNG7 and PDX1) highlighted as candidate biomarkers in CRC. The functional enrichment analysis and protein–protein interactions (PPIs) identified relevant pathways involved in CRC, including the Wnt signaling pathway and extracellular matrix (ECM) organization, among other enriched pathways. Conclusions: Our findings show the strength of our in silico computational approach in jointly identifying methylation-regulated biomarkers for colon cancer and highlight several genes and pathways as biomarker candidates for further investigations.
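The integration step described in the Methods and Results can be illustrated with a small sketch. This is not the authors' pipeline: the abstract does not state how DMCs were matched to DEGs, so the code assumes a gene counts as a methylation-regulated gene (MRG) candidate when a CpG site mapped to it changes methylation in the opposite direction to the gene's expression change. The function names, the input layout, and the toy values (`GENE_A`, `cg_001`, and so on) are all hypothetical.

```python
def methylation_regulated_genes(dmcs, degs):
    """Return MRG candidates for one cohort as {gene: [cpg_id, ...]}.

    dmcs: iterable of (cpg_id, gene, delta_beta) tuples, where delta_beta is
          the tumor-minus-normal methylation difference (assumed layout).
    degs: dict mapping gene -> log2 fold change (tumor vs. normal).

    Assumption (not stated in the abstract): a gene qualifies when
    methylation and expression move in opposite directions.
    """
    mrgs = {}
    for cpg_id, gene, delta_beta in dmcs:
        log_fc = degs.get(gene)
        if log_fc is None:
            continue  # gene is not differentially expressed in this cohort
        if delta_beta * log_fc < 0:  # opposite signs: inverse relationship
            mrgs.setdefault(gene, []).append(cpg_id)
    return mrgs


def common_across_cohorts(per_cohort_mrgs):
    """Return the genes present in every cohort's MRG candidate set."""
    gene_sets = [set(mrgs) for mrgs in per_cohort_mrgs]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy data only; these values are made up for illustration.
cohort_1 = methylation_regulated_genes(
    [("cg_001", "GENE_A", 0.30), ("cg_002", "GENE_B", -0.25), ("cg_003", "GENE_C", 0.20)],
    {"GENE_A": -1.8, "GENE_B": 2.1, "GENE_C": 1.5},
)
cohort_2 = methylation_regulated_genes(
    [("cg_001", "GENE_A", 0.22), ("cg_004", "GENE_D", -0.40)],
    {"GENE_A": -0.9, "GENE_D": -1.1},
)
shared = common_across_cohorts([cohort_1, cohort_2])
```

In this toy run only `GENE_A` passes the filter in both cohorts, which mirrors how a small set of genes common to all cohorts could be singled out from a larger candidate list.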
Award ID(s):
2341725
PAR ID:
10638963
Author(s) / Creator(s):
;
Publisher / Repository:
Genes
Date Published:
Journal Name:
Genes
Volume:
16
Issue:
6
ISSN:
2073-4425
Page Range / eLocation ID:
620
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Colonoscopy is accurate but inefficient for colorectal cancer (CRC) prevention due to the low (~7 to 8%) prevalence of the target lesions, advanced adenomas. We leveraged rectal mucosa to identify patients who harbor CRC field carcinogenesis by evaluating chromatin 3D architecture. Supranucleosomal disordered chromatin chains (~5 to 20 nm, ~1 kbp) fold into chromatin packing domains (~100 to 200 nm, ~100 to 1000 kbp). In turn, the fractal-like conformation of DNA within chromatin domains and the folding of the genome into packing domains have been shown to influence multiple facets of gene transcription, including the transcriptional plasticity of cancer cells. We deployed an optical spectroscopic nanosensing technique, chromatin-sensitive partial wave spectroscopic microscopy (csPWS), to evaluate the packing density scaling D of the chromatin chain conformation within packing domains from rectal mucosa in 256 patients with varying degrees of progression to colorectal cancer. We found that the average packing scaling D of chromatin domains was elevated in tumor cells, histologically normal-appearing cells 4 cm proximal to the tumor, and histologically normal-appearing rectal mucosa compared to cells from control patients (p < 0.001). Nuclear D had a robust correlation with the model of 5-year risk of CRC, with r² = 0.94. Furthermore, rectal D was evaluated as a screening biomarker for patients with advanced adenomas, presenting an AUC of 0.85 and 85% sensitivity and specificity. Artificial intelligence-enhanced csPWS improved diagnostic performance with AUC = 0.90. Considering the low sensitivity of existing CRC tests, including liquid biopsies, to early-stage cancers, our work highlights the potential of chromatin biomarkers of field carcinogenesis in detecting early, significant precancerous colon lesions.
  2. Lung cancer is the second most common cancer in the world. The aim of this study is to identify biomarkers for lung cancer that can aid in its diagnosis and treatment. Gene expression profiles from the GEO database were analyzed with GEO2R to identify differentially expressed genes (DEGs) and further analyzed using Cytoscape. The data were divided into two categories: non-treatment and treatment groups. A total of 407 DEGs (254 upregulated and 153 downregulated) and 259 DEGs (124 upregulated and 135 downregulated) were isolated for the non-treatment and treatment studies, respectively. The significant Gene Ontologies and pathways enriched with DEGs were identified using the Cytoscape apps BiNGO and ReactomeFIPlugIn, respectively. Hub genes based on network parameters (Degree, Closeness, and Betweenness) were isolated using CytoHubba. In conclusion, DEGs identified in this study may play an important role in early diagnosis or as biomarkers of lung cancer.
  3. Hepatocellular carcinoma (HCC) is one of the most fatal cancers in the world. There is an urgent need to understand the molecular background of HCC to facilitate the identification of biomarkers and discover effective therapeutic targets. Published transcriptomic studies have reported a large number of genes that are individually significant for HCC. However, reliable biomarkers remain to be determined. In this study, built on max-linear competing risk factor models, we developed a machine learning analytical framework to analyze transcriptomic data and identify the smallest set of differentially expressed genes (DEGs). By analyzing 9 public whole-transcriptome datasets (containing 1184 HCC samples and 672 nontumor controls), we identified 5 critical DEGs (i.e., CCDC107, CXCL12, GIGYF1, GMNN, and IFFO1) between HCC and control samples. The classifiers built on these 5 DEGs reached nearly perfect performance in identification of HCC. The performance of the 5 DEGs was further validated in a US Caucasian cohort that we collected (containing 17 HCC with paired nontumor tissue). The conceptual advance of our work lies in modeling gene-gene interactions and correcting for batch effects in the analytic framework. The classifiers built on the 5 DEGs demonstrated clear signature patterns for HCC. The results are interpretable, robust, and reproducible across diverse cohorts/populations with various disease etiologies, indicating that the 5 DEGs are intrinsic variables that can describe the overall features of HCC at the genomic level. The analytical framework applied in this study may pave a new way for improving transcriptome profiling analysis of human cancers.
  4. Colorectal cancer (CRC) is the third leading cause of cancer-related deaths in the United States. To advance the understanding of CRC tumor progression, models that mimic the tumor microenvironment (TME) and have translatable study outcomes are urgently needed. CRC patient-derived xenografts (PDXs) are promising tools for their ability to recapitulate tumor heterogeneity and key patient tumor characteristics, such as molecular characteristics. However, as in vivo models, CRC PDXs are costly and low-throughput, which leads to a need for equivalent in vitro models. To address this need, we previously established an in vitro model using a tissue engineering toolset with CRC PDX cells. However, it is unclear whether tissue engineering has the capacity to maintain patient- and/or cancer stage-specific tumor heterogeneity. To address this gap, we employed three PDX tumor lines, originating from stage II, III-B, and IV CRC tumors, in the formation of 3D engineered CRC PDX (3D-eCRC-PDX) tissues and performed an in-depth comparison between the 3D-eCRC-PDX tissues and the original CRC-PDX tumors. To form the tissues, CRC-PDX tumors were expanded in vivo and dissociated. The isolated cells were encapsulated within poly(ethylene glycol)-fibrinogen hydrogels and remained viable and proliferative post encapsulation over the course of 29 days in culture. To gain molecular insight into the maintenance of PDX line stage heterogeneity, we performed a transcriptomic analysis using RNA-seq to determine the extent of similarities and differences between the CRC-PDX tumors and the 3D-eCRC-PDX tissues. We observed the greatest correspondence in overlapping differentially expressed human genes, gene ontology, and Hallmark gene set enrichment between the 3D-eCRC-PDX tissues and CRC-PDX tumors in the stage II PDX line, while the least correspondence was observed in the stage IV PDX line. The Hallmark gene set enrichment from murine-mapped RNA-seq transcripts was PDX line-specific, which suggested that the stromal component of the 3D-eCRC-PDX tissues was maintained in a PDX line-dependent manner. Consistent with our transcriptomic analysis, we observed that tumor cell subpopulations, including human proliferative (B2M+Ki67+) and CK20+ cells, remained constant for up to 15 days in culture even though the number of cells in the 3D-eCRC-PDX tissues from all three CRC stages increased over time. Yet, tumor cell subpopulation differences in the stage IV 3D-eCRC-PDX tissues were observed starting at 22 days in culture. Overall, our results demonstrate a strong correlation between our in vitro 3D-eCRC-PDX models and the originating in vivo CRC-PDX tumors, providing evidence that these engineered tissues may be capable of mimicking patient- and/or cancer stage-specific heterogeneity.
  5. Lung adenocarcinoma (LUAD) remains a leading cause of cancer-related mortality, characterized by substantial genetic heterogeneity that challenges a comprehensive understanding of its progression. This study employs next-generation sequencing data analysis to transform our comprehension of LUAD pathogenesis. Integrating epigenetic and transcriptomic data of LUAD patients, this approach assessed critical regulatory events, identified therapeutic targets, and offered insights into the molecular foundations of the cancer. We employed the DNA methylation data to identify differentially methylated CpG sites and explored the transcriptome profiles of their adjacent genes. An intersectional analysis of gene expression profiles uncovered 419 differentially expressed genes (DEGs) influenced by smoke-induced differential DNA methylation, among which hub genes, including mitochondrial ribosomal proteins (MRPs) and ribosomal proteins (RPs) such as MRPS15, MRPS5, MRPL33, RPL24, RPL7L1, MRPL15, TUFM, MRPL22, and RSL1D1, were identified using a network-based approach. These hub genes were overexpressed and enriched in RNA processing, ribosome biogenesis, and mitochondrial translation, which are critical in LUAD progression. Enhancer Linking Methylation/Expression Relationship (ELMER) analysis revealed transcription factor (TF) binding motifs, such as JUN, NKX23, FOSB, RUNX3, and FOSL1, which regulated these hub genes through methylation-dependent enhancer dynamics. Predominant hypomethylation of MRPs and RPs disrupted mitochondrial function and contributed to oxidative phosphorylation (OXPHOS) and metabolic reprogramming, favoring cancer cell survival. The survival analysis validated the clinical relevance of these hub genes, with high-expression cohorts exhibiting poor overall survival (OS) outcomes, which highlights their relevance in LUAD pathogenesis and their potential for developing novel targeted therapeutic strategies.