Title: Detecting survival-associated biomarkers from heterogeneous populations
Abstract
Detection of prognostic factors associated with patients’ survival outcomes helps researchers gain insight into a disease and guides treatment decisions. The rapid advancement of high-throughput technologies has yielded many genomic biomarkers as candidate prognostic factors, but most are of limited use in clinical application. As the cost of these technologies falls, many genomic studies are conducted to explore a common scientific question in different cohorts, with the aim of identifying more reproducible and credible biomarkers. However, jointly analyzing multiple studies raises new challenges because their populations and designs are heterogeneous. For example, patients from different cohorts show different demographic characteristics and risk profiles. Existing high-dimensional variable selection methods for survival analysis are restricted to single-study analysis. We propose a novel Cox-model-based two-stage variable selection method, called “Cox-TOTEM”, to detect survival-associated biomarkers common to multiple genomic studies. Simulations showed that our method greatly improved the sensitivity of variable selection compared with separately applying existing methods to each study, especially when the signals are weak or the studies are heterogeneous. Applying our method to TCGA transcriptomic data identified essential survival-associated genes related to the common disease mechanism of five Pan-Gynecologic cancers.
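The abstract does not spell out how Cox-TOTEM works, and the sketch below does not reproduce it. It only illustrates one standard way a Cox model can respect cohort heterogeneity: stratifying by study, so that each study keeps its own baseline hazard while the coefficient vector beta is shared. The stratified partial log-likelihood sums, over every subject i with an observed event, the term x_i·beta − log(sum of exp(x_j·beta) over subjects j in the same study with t_j ≥ t_i). The function names and the (time, event, covariates) tuple layout are assumptions made for this illustration, and tied event times are handled the Breslow way (every tied subject stays in each risk set).

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record layout: (follow-up time, event flag 1=event/0=censored, covariates)
Subject = Tuple[float, int, Sequence[float]]


def linear_predictor(x: Sequence[float], beta: Sequence[float]) -> float:
    """Compute the dot product x . beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def stratified_cox_loglik(studies: List[List[Subject]], beta: Sequence[float]) -> float:
    """Breslow partial log-likelihood with one stratum per study.

    Risk sets are built within each study only, so each study keeps its own
    unspecified baseline hazard while beta is shared across all studies.
    """
    total = 0.0
    for subjects in studies:
        for t_i, event_i, x_i in subjects:
            if not event_i:
                continue  # censored subjects enter only through risk sets
            # Risk set: subjects in the same study still under observation at t_i
            denom = sum(math.exp(linear_predictor(x_j, beta))
                        for t_j, _, x_j in subjects if t_j >= t_i)
            total += linear_predictor(x_i, beta) - math.log(denom)
    return total


# Toy data (made up): two small studies with one covariate each
study_a = [(1.0, 1, [0.5]), (2.0, 1, [-0.2]), (3.0, 0, [1.0])]
study_b = [(2.0, 1, [0.3]), (4.0, 1, [0.0])]
print(stratified_cox_loglik([study_a, study_b], [0.0]))  # equals -log(12)
```

At beta = 0 every event contributes −log of its risk-set size: 3 and 2 in the first study, 2 and 1 in the second, giving −log 12. Pooling both studies into a single stratum would instead mix the cohorts' risk sets (giving −log 80 here), which is exactly what stratification avoids.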
Journal Name: Scientific Reports
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    The prognosis of hepatocellular carcinoma (HCC) after R0 resection is unsatisfactory because of the high rate of recurrence. In this study, we investigated recurrence‐related RNAs and their underlying mechanisms. Long noncoding RNA (lncRNA), microRNA (miRNA), and messenger RNA (mRNA) expression data and clinical information for 247 patients with HCC who underwent R0 resection were obtained from The Cancer Genome Atlas. Comparing the 1‐year recurrence group (n = 56) with the nonrecurrence group (n = 60), we detected 34 differentially expressed lncRNAs (DElncRNAs), five DEmiRNAs, and 216 DEmRNAs. Of these, three DElncRNAs, hsa‐mir‐150‐5p, and 11 DEmRNAs were selected to construct a competing endogenous RNA (ceRNA) network. Next, two nomogram models were constructed, one based on lncRNAs and one based on mRNAs, each further selected by Cox and least absolute shrinkage and selection operator (LASSO) regression analysis. The two nomogram models showed high prediction accuracy for disease‐free survival, with concordance indexes of 0.725 and 0.639. Functional enrichment analysis of the DEmRNAs further showed that the mRNAs in the ceRNA network and nomogram models were associated with immune pathways. In summary, we constructed an hsa‐mir‐150‐5p‐centric ceRNA network and two effective nomogram prognostic models, and the related RNAs may serve as potential biomarkers for predicting recurrence in patients with HCC.
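The concordance index reported for the nomograms measures how well a model's risk ranking agrees with the observed order of events. The abstract does not say which estimator was used; the sketch below implements Harrell's C, a common choice, in plain Python. It assumes that a higher risk score means earlier expected recurrence, counts tied risk scores as one half, and for simplicity treats pairs with tied times as not comparable. The function name and example data are illustrative only.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and a
    strictly shorter time than subject j. It is concordant when the model
    gives i the higher risk score; tied risk scores count as 1/2.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up example: 4 of the 5 comparable pairs are ranked correctly
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which concordance indexes such as 0.725 and 0.639 are read.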

  2. Abstract

    Although the overall five-year survival of patients with pancreatic ductal adenocarcinoma (PDAC) is dismal, survival differs between cases with clinically and pathologically indistinguishable characteristics, suggesting that uncharacterized properties drive tumor progression. Recent mRNA sequencing studies reported gene-expression signatures that define PDAC molecular subtypes correlating with differences in survival. We previously identified keratin 17 (K17) as a negative prognostic biomarker in other cancer types. Here, we set out to determine whether K17 is as accurate as molecular subtyping of PDAC in identifying patients with the shortest survival. K17 mRNA was analyzed in two independent PDAC cohorts for discovery (n = 124) and validation (n = 145). K17 immunohistochemistry (IHC) localization and scoring were performed in a third independent cohort (n = 74). Kaplan-Meier and Cox proportional hazards regression analyses were used to assess cancer-specific survival differences between cases with low and high K17 mRNA expression. We established that K17 expression in PDACs defines the most aggressive form of the disease. Using Cox proportional hazards regression, we found that increased K17 expression at the IHC level is also associated with decreased survival of PDAC patients. Additionally, within PDACs of advanced stage and with negative surgical margins, K17 at both the mRNA and IHC level is sufficient to identify the subgroup with the shortest survival. These results identify K17 as a novel negative prognostic biomarker that could inform patient management decisions.
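The Kaplan-Meier analysis mentioned above estimates, separately for each group (for example, low versus high K17 expression), the probability of surviving past each time point. Below is a minimal plain-Python sketch of the product-limit estimator; the function name and toy data are illustrative assumptions, and subjects censored at an event time are kept in that time's risk set, which is the usual convention.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (event time, survival probability) steps of the Kaplan-Meier curve.

    At each distinct event time t: S(t) = S(t-) * (1 - d_t / n_t), where d_t is
    the number of events at t and n_t the number still at risk just before t.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps = []
    idx = 0
    while idx < len(data):
        t = data[idx][0]
        deaths = 0
        removed = 0
        # Group every subject (event or censored) that shares this time
        while idx < len(data) and data[idx][0] == t:
            deaths += data[idx][1]
            removed += 1
            idx += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, survival))
        n_at_risk -= removed  # all subjects at time t leave the risk set afterwards
    return steps


# Made-up group: survival times with 1 = death, 0 = censored
km_steps = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
print(km_steps)  # steps at t = 3, 5, 8, 12 with S = 5/6, 2/3, 4/9, 0
```

Running it once per expression group gives the curves that a Kaplan-Meier plot compares.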

  3. Background

    Although pancreatic ductal adenocarcinoma (PDAC) has one of the lowest 5‐year survival rates of all cancers, survival differs between patients with clinically identical characteristics. The authors previously demonstrated that keratin 17 (K17) expression in PDAC, measured by RNA sequencing or immunohistochemistry (IHC), is an independent negative prognostic biomarker. Only 20% of cases are candidates for surgical resection, whereas most patients are diagnosed by needle aspiration biopsy (NAB). The aims of this study were to determine whether K17 scores correlate between matched NABs and surgical resection tissue sections, and whether K17 IHC in NAB cell block specimens could serve as a negative prognostic biomarker in PDAC.

    Methods

    K17 IHC was performed on matched NAB cell block and surgical resection samples from a cohort of 70 patients to analyze the correlation of K17 expression levels. K17 IHC was also performed on cell blocks from discovery and validation cohorts. Kaplan‐Meier and Cox proportional hazards regression analyses were used to assess survival differences between cases with different levels of K17 IHC expression.
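The abstract does not name the correlation measure used for matched NAB and resection K17 scores. Spearman's rank correlation is a common choice for ordinal IHC scores, so the sketch below assumes it: both score lists are converted to ranks (tied values share their average rank) and Pearson's correlation is computed on the ranks. Function names and scores are made up for illustration.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Made-up K17 scores for five matched cases (NAB cell block vs. resection)
nab_scores = [10, 40, 80, 95, 60]
resection_scores = [15, 35, 90, 85, 70]
print(spearman_rho(nab_scores, resection_scores))  # 0.9
```

Ranking makes the measure depend only on the ordering of cases, which suits semi-quantitative staining scores.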

    Results

    K17 IHC expression correlated between matched NABs and resection tissues. NAB samples were classified as high for K17 when ≥80% of tumor cells showed strong (2+) staining. High‐K17 cases, including stage‐matched cases, had shorter survival.
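The NAB cutoff above translates directly into a classification rule, after which the survival of the two groups can be compared. The abstract reports Kaplan-Meier and Cox analyses without naming a significance test; the sketch assumes a two-group log-rank test, a standard companion to Kaplan-Meier curves, with its p-value taken from the chi-square distribution with one degree of freedom. The classifier takes a single percentage of tumor cells with strong (2+) staining as input, which simplifies real IHC scoring; function names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def k17_nab_status(pct_strong_2plus: float) -> str:
    """'high' when >= 80% of tumor cells show strong (2+) K17 staining, else 'low'."""
    return "high" if pct_strong_2plus >= 80.0 else "low"


def logrank_test(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[str], group: str = "high") -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    observed = expected = variance = 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        # Subjects still at risk just before t, flagged by (event at t, in `group`)
        at_risk = [(1 if (tt == t and e) else 0, g == group)
                   for tt, e, g in zip(times, events, groups) if tt >= t]
        n = len(at_risk)
        n1 = sum(1 for _, in_group in at_risk if in_group)
        d = sum(dead for dead, _ in at_risk)
        d1 = sum(dead for dead, in_group in at_risk if in_group)
        observed += d1
        expected += d * n1 / n  # expected events in `group` under the null hypothesis
        if n > 1:
            # Hypergeometric variance of d1 at this event time
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of chi-square(1): P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up example: staining percentages, survival times (months), death flags
pcts = [95, 85, 90, 40, 20, 60]
groups = [k17_nab_status(p) for p in pcts]
times = [2, 3, 5, 7, 9, 11]
events = [1, 1, 1, 1, 0, 1]
print(groups, logrank_test(times, events, groups))
```

In this toy cohort the three high-K17 cases die first, so the observed deaths in that group (3) exceed the number expected under equal hazards.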

    Conclusions

    K17 has been identified as a robust and independent prognostic biomarker that stratifies clinical outcomes for cases diagnosed by NAB. Testing for K17 also has the potential to inform clinical decisions aimed at optimizing chemotherapeutic interventions.

  4. BACKGROUND: Lung transplantation is the gold-standard treatment for carefully selected patients with end-stage lung disease. We sought to create a unique risk stratification model that uses only preoperative recipient data to predict 1-year postoperative mortality during our pre-transplant assessment. METHODS: Data on lung transplant recipients at Houston Methodist Hospital (HMH) from 1/2009 to 12/2014 were extracted from the United Network for Organ Sharing (UNOS) database. Patients were randomly divided into development and validation cohorts, and Cox proportional hazards models were fitted. Variables associated with 1-year post-transplant mortality were assigned weights based on their beta coefficients, and risk scores were derived. Patients were stratified into low-, medium-, and high-risk categories. The model was validated using the validation dataset and data from other US transplant centers in the UNOS database. RESULTS: We randomized 633 lung recipients from HMH into the development (n = 317) and validation (n = 316) cohorts. One-year survival after transplant differed significantly among risk groups: 95% (low risk), 84% (medium risk), and 72% (high risk) (p < 0.001), with a C-statistic of 0.74. Survival in the validation cohort also differed significantly among risk groups (85%, 77%, and 65%, respectively; p < 0.001). Validation with the UNOS dataset included 9,920 patients and found 1-year survival of 91%, 86%, and 82%, respectively (p < 0.001). CONCLUSIONS: Using only recipient data collected at the pre-listing evaluation, our simple scoring system has good discriminatory power and can be a practical tool for assessing and selecting potential lung transplant recipients.
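The abstract says weights were derived from the Cox beta coefficients but does not give the scheme. One common points-based approach, assumed here, divides each coefficient by the smallest absolute coefficient and rounds to an integer, sums the points of the factors a patient has, and maps the total to a category with two cutoffs. The factor names, coefficients, and cutoffs below are invented for illustration and are not the HMH model's values.

```python
from typing import Dict, Tuple


def points_from_betas(betas: Dict[str, float]) -> Dict[str, int]:
    """Integer points per risk factor: beta divided by the smallest |beta|, rounded."""
    base = min(abs(b) for b in betas.values())
    return {name: round(b / base) for name, b in betas.items()}


def risk_category(score: int, cutoffs: Tuple[int, int]) -> str:
    """Map a total score to low/medium/high using two (hypothetical) upper bounds."""
    low_max, medium_max = cutoffs
    if score <= low_max:
        return "low"
    if score <= medium_max:
        return "medium"
    return "high"


# Hypothetical Cox coefficients for binary pre-transplant recipient factors
betas = {"factor_a": 0.35, "factor_b": 0.70, "factor_c": 1.10}
points = points_from_betas(betas)
patient = {"factor_a": 1, "factor_b": 0, "factor_c": 1}  # 1 = factor present
score = sum(points[name] * present for name, present in patient.items())
print(points, score, risk_category(score, cutoffs=(1, 3)))
# points a=1, b=2, c=3; score 4 -> 'high'
```

Rounding to integers trades a little precision for a score that can be tallied by hand during a pre-listing evaluation.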
  5. Advanced genomic and molecular profiling technologies have accelerated the elucidation of the regulatory mechanisms behind cancer development and progression, as well as the development of targeted therapies. Along this line, intensive studies generating immense amounts of biological information have boosted the discovery of molecular biomarkers. Cancer has been one of the leading causes of death worldwide in recent years. Elucidating the genomic and epigenetic factors in breast cancer (BRCA) can provide a roadmap for uncovering disease mechanisms. Accordingly, unraveling the possible systematic connections between omics data types and their contribution to BRCA tumor progression is crucial. In this study, we developed a novel machine learning (ML)-based integrative approach for multi-omics data analysis. This approach combines information from gene expression (mRNA), microRNA (miRNA), and methylation data. Because of the complexity of cancer, this integrated data is expected to improve the prediction, diagnosis, and treatment of disease through patterns that are available only from the three-way interactions among these three omics datasets. In addition, the proposed method bridges the interpretation gap between the disease mechanisms that drive onset and progression. Our fundamental contribution is the 3 Multi-omics integrative tool (3Mint), which aims to perform grouping and to score the groups using biological knowledge. Another major goal is improved gene selection via detection of novel groups of cross-omics biomarkers. The performance of 3Mint is assessed using several metrics. Our computational evaluations showed that 3Mint classifies BRCA molecular subtypes with similar performance (95% accuracy) to the miRcorrNet tool, which uses miRNA and mRNA gene expression profiles, while using fewer genes. Incorporating methylation data in 3Mint yields a much more focused analysis.
The 3Mint tool and all other supplementary files are available at . 
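The abstract does not describe how 3Mint forms or scores its groups. As a rough, hypothetical illustration of what a cross-omics group can look like, the sketch collects the mRNAs whose expression is strongly negatively correlated with one miRNA across samples (miRNAs often repress their targets, so anti-correlation is a common screening heuristic) and scores the group by its mean absolute correlation. The threshold, names, and scoring rule are assumptions, not 3Mint's.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mirna_group(mirna: Sequence[float], mrnas: Dict[str, Sequence[float]],
                threshold: float = -0.7) -> Tuple[List[str], float]:
    """Hypothetical group: genes anti-correlated with one miRNA, plus a group score."""
    corr = {gene: pearson(mirna, expr) for gene, expr in mrnas.items()}
    members = sorted(gene for gene, r in corr.items() if r <= threshold)
    # Score: mean |r| of the members (0.0 for an empty group)
    score = sum(abs(corr[g]) for g in members) / len(members) if members else 0.0
    return members, score


# Made-up expression values across five samples
mirna = [1, 2, 3, 4, 5]
mrnas = {"geneA": [5, 4, 3, 2, 1], "geneB": [1, 2, 3, 4, 5], "geneC": [2, 1, 2, 1, 2]}
print(mirna_group(mirna, mrnas))  # (['geneA'], 1.0)
```

In a real multi-omics setting, a third layer such as methylation could be used to filter or re-weight such groups; how 3Mint does this is not stated in the abstract.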