skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A comprehensive review of cancer survival prediction using multi-omics integration and clinical variables
Abstract Cancer is an umbrella term that includes a wide spectrum of disease severity, from those that are malignant, metastatic, and aggressive to benign lesions with very low potential for progression or death. The ability to prognosticate patient outcomes would facilitate management of various malignancies: patients whose cancer is likely to advance quickly would receive necessary treatment that is commensurate with the predicted biology of the disease. Former prognostic models based on clinical variables (age, gender, cancer stage, tumor grade, etc.), though helpful, cannot account for genetic differences, molecular etiology, tumor heterogeneity, and important host biological mechanisms. Therefore, recent prognostic models have shifted toward the integration of complementary information available in both molecular data and clinical variables to better predict patient outcomes: vital status (overall survival), metastasis (metastasis-free survival), and recurrence (progression-free survival). In this article, we review 20 survival prediction approaches that integrate multi-omics and clinical data to predict patient outcomes. We discuss their strategies for modeling survival time (continuous and discrete), the incorporation of molecular measurements and clinical variables into risk models (clinical and multi-omics data), how to cope with censored patient records, the effectiveness of data integration techniques, prediction methodologies, model validation, and assessment metrics. The goal is to inform life scientists of available resources, and to provide a complete review of important building blocks in survival prediction. At the same time, we thoroughly describe the pros and cons of each methodology, and discuss in depth the outstanding challenges that need to be addressed in future method development.  more » « less
Award ID(s):
2343019 2203236
PAR ID:
10582423
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Briefings in Bioinformatics
Volume:
26
Issue:
2
ISSN:
1467-5463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. There are insufficient accurate biomarkers and effective therapeutic targets in current cancer treatment. Multi-omics regulatory networks in patient bulk tumors and single cells can shed light on molecular disease mechanisms. Integration of multi-omics data with large-scale patient electronic medical records (EMRs) can lead to the discovery of biomarkers and therapeutic targets. In this review, multi-omics data harmonization methods were introduced, and common approaches to molecular network inference were summarized. Our Prediction Logic Boolean Implication Networks (PLBINs) have advantages over other methods in constructing genome-scale multi-omics networks in bulk tumors and single cells in terms of computational efficiency, scalability, and accuracy. Based on the constructed multi-modal regulatory networks, graph theory network centrality metrics can be used in the prioritization of candidates for discovering biomarkers and therapeutic targets. Our approach to integrating multi-omics profiles in a patient cohort with large-scale patient EMRs such as the SEER-Medicare cancer registry combined with extensive external validation can identify potential biomarkers applicable in large patient populations. These methodologies form a conceptually innovative framework to analyze various available information from research laboratories and healthcare systems, accelerating the discovery of biomarkers and therapeutic targets to ultimately improve cancer patient survival outcomes. 
    more » « less
  2. Objective: The rapid advancement of high-throughput technologies in the biomedical field has resulted in the accumulation of diverse omics data types, such as mRNA expression, DNA methylation, and microRNA expression, for studying various diseases. Integrating these multi-omics datasets enables a comprehensive understanding of the molecular basis of cancer and facilitates accurate prediction of disease progression. Methods: However, conventional approaches face challenges due to the dimensionality curse problem. This paper introduces a novel framework called Knowledge Distillation and Supervised Variational AutoEncoders utilizing View Correlation Discovery Network (KD-SVAE-VCDN) to address the integration of high-dimensional multi-omics data with limited common samples. Through our experimental evaluation, we demonstrate that the proposed KD-SVAE-VCDN architecture accurately predicts the progression of breast and kidney carcinoma by effectively classifying patients as long- or short-term survivors. Furthermore, our approach outperforms other state-of-the-art multi-omics integration models. Results: Our findings highlight the efficacy of the KD-SVAE-VCDN architecture in predicting the disease progression of breast and kidney carcinoma. By enabling the classification of patients based on survival outcomes, our model contributes to personalized and targeted treatments. The favorable performance of our approach in comparison to several existing models suggests its potential to contribute to the advancement of cancer understanding and management. Conclusion: The development of a robust predictive model capable of accurately forecasting disease progression at the time of diagnosis holds immense promise for advancing personalized medicine. By leveraging multi-omics data integration, our proposed KD-SVAE-VCDN framework offers an effective solution to this challenge, paving the way for more precise and tailored treatment strategies for patients with different types of cancer. 
    more » « less
  3. Abstract Non-small-cell lung cancer (NSCLC) represents approximately 80–85% of lung cancer diagnoses and is the leading cause of cancer-related death worldwide. Recent studies indicate that image-based radiomics features from positron emission tomography/computed tomography (PET/CT) images have predictive power for NSCLC outcomes. To this end, easily calculated functional features such as the maximum and the mean of standard uptake value (SUV) and total lesion glycolysis (TLG) are most commonly used for NSCLC prognostication, but their prognostic value remains controversial. Meanwhile, convolutional neural networks (CNN) are rapidly emerging as a new method for cancer image analysis, with significantly enhanced predictive power compared to hand-crafted radiomics features. Here we show that CNNs trained to perform the tumor segmentation task, with no other information than physician contours, identify a rich set of survival-related image features with remarkable prognostic value. In a retrospective study on pre-treatment PET-CT images of 96 NSCLC patients before stereotactic-body radiotherapy (SBRT), we found that the CNN segmentation algorithm (U-Net) trained for tumor segmentation in PET and CT images, contained features having strong correlation with 2- and 5-year overall and disease-specific survivals. The U-Net algorithm has not seen any other clinical information (e.g. survival, age, smoking history, etc.) than the images and the corresponding tumor contours provided by physicians. In addition, we observed the same trend by validating the U-Net features against an extramural data set provided by Stanford Cancer Institute. Furthermore, through visualization of the U-Net, we also found convincing evidence that the regions of metastasis and recurrence appear to match with the regions where the U-Net features identified patterns that predicted higher likelihoods of death. We anticipate our findings will be a starting point for more sophisticated non-intrusive patient specific cancer prognosis determination. For example, the deep learned PET/CT features can not only predict survival but also visualize high-risk regions within or adjacent to the primary tumor and hence potentially impact therapeutic outcomes by optimal selection of therapeutic strategy or first-line therapy adjustment. 
    more » « less
  4. Tumor-initiating cells with reprogramming plasticity or stem-progenitor cell properties (stemness) are thought to be essential for cancer development and metastatic regeneration in many cancers; however, elucidation of the underlying molecular network and pathways remains demanding. Combining machine learning and experimental investigation, here we report CD81, a tetraspanin transmembrane protein known to be enriched in extracellular vesicles (EVs), as a newly identified driver of breast cancer stemness and metastasis. Using protein structure modeling and interface prediction-guided mutagenesis, we demonstrate that membrane CD81 interacts with CD44 through their extracellular regions in promoting tumor cell cluster formation and lung metastasis of triple negative breast cancer (TNBC) in human and mouse models. In-depth global and phosphoproteomic analyses of tumor cells deficient with CD81 or CD44 unveils endocytosis-related pathway alterations, leading to further identification of a quality-keeping role of CD44 and CD81 in EV secretion as well as in EV-associated stemness-promoting function. CD81 is coexpressed along with CD44 in human circulating tumor cells (CTCs) and enriched in clustered CTCs that promote cancer stemness and metastasis, supporting the clinical significance of CD81 in association with patient outcomes. Our study highlights machine learning as a powerful tool in facilitating the molecular understanding of new molecular targets in regulating stemness and metastasis of TNBC. 
    more » « less
  5. Objective: Evaluate the effectiveness of machine learning tools that incorporate spatial information such as disease location and lymph node metastatic patterns-of-spread, for prediction of survival and toxicity in HPV+ oropharyngeal cancer (OPC). Materials & methods: 675 HPV+ OPC patients that were treated at MD Anderson Cancer Center between 2005 and 2013 with curative intent IMRT were retrospectively collected under IRB approval. Risk stratifications incorporating patient radiometric data and lymph node metastasis patterns via an anatomically-adjacent representation with hierarchical clustering were identified. These clusterings were combined into a 3-level patient stratification and included along with other known clinical features in a Cox model for predicting survival outcomes, and logistic regression for toxicity, using independent subsets for training and validation. Results: Four groups were identified and combined into a 3-level stratification. The inclusion of patient stratifications in predictive models for 5-yr Overall survival (OS), 5-year recurrence free survival, (RFS) and Radiation-associated dysphagia (RAD) consistently improved model performance measured using the area under the curve (AUC). Test set AUC improvements over models with clinical covariates, was 9 % for predicting OS, and 18 % for predicting RFS, and 7 % for predicting RAD. For models with both clinical and AJCC covariates, AUC improvement was 7 %, 9 %, and 2 % for OS, RFS, and RAD, respectively. Conclusion: Including data-driven patient stratifications considerably improve prognosis for survival and toxicity outcomes over the performance achieved by clinical staging and clinical covariates alone. These stratifications generalize well to across cohorts, and sufficient information for reproducing these clusters is included. 
    more » « less