Objective: Evaluate the effectiveness of machine learning tools that incorporate spatial information such as disease location and lymph node metastatic patterns-of-spread, for prediction of survival and toxicity in HPV+ oropharyngeal cancer (OPC). Materials & methods: 675 HPV+ OPC patients that were treated at MD Anderson Cancer Center between 2005 and 2013 with curative intent IMRT were retrospectively collected under IRB approval. Risk stratifications incorporating patient radiometric data and lymph node metastasis patterns via an anatomically-adjacent representation with hierarchical clustering were identified. These clusterings were combined into a 3-level patient stratification and included along with other known clinical features in a Cox model for predicting survival outcomes, and logistic regression for toxicity, using independent subsets for training and validation. Results: Four groups were identified and combined into a 3-level stratification. The inclusion of patient stratifications in predictive models for 5-yr Overall survival (OS), 5-year recurrence free survival, (RFS) and Radiation-associated dysphagia (RAD) consistently improved model performance measured using the area under the curve (AUC). Test set AUC improvements over models with clinical covariates, was 9 % for predicting OS, and 18 % for predicting RFS, and 7 % for predicting RAD. For models with both clinical and AJCC covariates, AUC improvement was 7 %, 9 %, and 2 % for OS, RFS, and RAD, respectively. Conclusion: Including data-driven patient stratifications considerably improve prognosis for survival and toxicity outcomes over the performance achieved by clinical staging and clinical covariates alone. These stratifications generalize well to across cohorts, and sufficient information for reproducing these clusters is included.
more »
« less
Longitudinal Patterns in Clinical and Imaging Measurements Predict Residual Survival in Glioblastoma Patients
The growing amount of longitudinal data for a large population of patients has necessitated the application of algorithms that can discover patterns that inform patient management. This study demonstrates how temporal patterns generated from a combination of clinical and imaging measurements improve residual survival prediction in glioblastoma patients. Temporal patterns were identified with sequential pattern mining using data from 304 patients. Along with patient covariates, the patterns were incorporated as features in logistic regression models to predict 2-, 6-, or 9-month residual survival at each visit. The modeling approach that included temporal patterns achieved test performances of 0.820, 0.785, and 0.783 area under the receiver operating characteristic curve for predicting 2-, 6-, and 9-month residual survival, respectively. This approach significantly outperformed models that used tumor volume alone (p < 0.001) or tumor volume combined with patient covariates (p < 0.001) in training. Temporal patterns involving an increase in tumor volume above 122 mm3/day, a decrease in KPS across multiple visits, moderate neurologic symptoms, and worsening overall neurologic function suggested lower residual survival. These patterns are readily interpretable and found to be consistent with known prognostic indicators, suggesting they can provide early indicators to clinicians of changes in patient state that inform management decisions.
more »
« less
- Award ID(s):
- 1722516
- PAR ID:
- 10064295
- Date Published:
- Journal Name:
- Scientific reports
- ISSN:
- 2045-2322
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract Cancer is an umbrella term that includes a wide spectrum of disease severity, from those that are malignant, metastatic, and aggressive to benign lesions with very low potential for progression or death. The ability to prognosticate patient outcomes would facilitate management of various malignancies: patients whose cancer is likely to advance quickly would receive necessary treatment that is commensurate with the predicted biology of the disease. Former prognostic models based on clinical variables (age, gender, cancer stage, tumor grade, etc.), though helpful, cannot account for genetic differences, molecular etiology, tumor heterogeneity, and important host biological mechanisms. Therefore, recent prognostic models have shifted toward the integration of complementary information available in both molecular data and clinical variables to better predict patient outcomes: vital status (overall survival), metastasis (metastasis-free survival), and recurrence (progression-free survival). In this article, we review 20 survival prediction approaches that integrate multi-omics and clinical data to predict patient outcomes. We discuss their strategies for modeling survival time (continuous and discrete), the incorporation of molecular measurements and clinical variables into risk models (clinical and multi-omics data), how to cope with censored patient records, the effectiveness of data integration techniques, prediction methodologies, model validation, and assessment metrics. The goal is to inform life scientists of available resources, and to provide a complete review of important building blocks in survival prediction. At the same time, we thoroughly describe the pros and cons of each methodology, and discuss in depth the outstanding challenges that need to be addressed in future method development.more » « less
-
Abstract Sudden cardiac death from arrhythmia is a major cause of mortality worldwide. In this study, we developed a novel deep learning (DL) approach that blends neural networks and survival analysis to predict patient-specific survival curves from contrast-enhanced cardiac magnetic resonance images and clinical covariates for patients with ischemic heart disease. The DL-predicted survival curves offer accurate predictions at times up to 10 years and allow for estimation of uncertainty in predictions. The performance of this learning architecture was evaluated on multi-center internal validation data and tested on an independent test set, achieving concordance indexes of 0.83 and 0.74 and 10-year integrated Brier scores of 0.12 and 0.14. We demonstrate that our DL approach, with only raw cardiac images as input, outperforms standard survival models constructed using clinical covariates. This technology has the potential to transform clinical decision-making by offering accurate and generalizable predictions of patient-specific survival probabilities of arrhythmic death over time.more » « less
-
null (Ed.)Abstract Motivation While each cancer is the result of an isolated evolutionary process, there are repeated patterns in tumorigenesis defined by recurrent driver mutations and their temporal ordering. Such repeated evolutionary trajectories hold the potential to improve stratification of cancer patients into subtypes with distinct survival and therapy response profiles. However, current cancer phylogeny methods infer large solution spaces of plausible evolutionary histories from the same sequencing data, obfuscating repeated evolutionary patterns. Results To simultaneously resolve ambiguities in sequencing data and identify cancer subtypes, we propose to leverage common patterns of evolution found in patient cohorts. We first formulate the Multiple Choice Consensus Tree problem, which seeks to select a tumor tree for each patient and assign patients into clusters in such a way that maximizes consistency within each cluster of patient trees. We prove that this problem is NP-hard and develop a heuristic algorithm, Revealing Evolutionary Consensus Across Patients (RECAP), to solve this problem in practice. Finally, on simulated data, we show RECAP outperforms existing methods that do not account for patient subtypes. We then use RECAP to resolve ambiguities in patient trees and find repeated evolutionary trajectories in lung and breast cancer cohorts. Availability and implementation https://github.com/elkebir-group/RECAP. Supplementary information Supplementary data are available at Bioinformatics online.more » « less
-
BackgroundMetastatic cancer remains one of the leading causes of cancer-related mortality worldwide. Yet, the prediction of survivability in this population remains limited by heterogeneous clinical presentations and high-dimensional molecular features. Advances in machine learning (ML) provide an opportunity to integrate diverse patient- and tumor-level factors into explainable predictive ML models. Leveraging large real-world datasets and modern ML techniques can enable improved risk stratification and precision oncology. ObjectiveThis study aimed to develop and interpret ML models for predicting overall survival in patients with metastatic cancer using the Memorial Sloan Kettering-Metastatic (MSK-MET) dataset and to identify key prognostic biomarkers through explainable artificial intelligence techniques. MethodsWe performed a retrospective analysis of the MSK-MET cohort, comprising 25,775 patients across 27 tumor types. After data cleaning and balancing, 20,338 patients were included. Overall survival was defined as deceased versus living at last follow-up. Five classifiers (extreme gradient boosting [XGBoost], logistic regression, random forest, decision tree, and naive Bayes) were trained using an 80/20 stratified split and optimized via grid search with 5-fold cross-validation. Model performance was assessed using accuracy, area under the curve (AUC), precision, recall, and F1-score. Model explainability was achieved using Shapley additive explanations (SHAP). Survival analyses included Kaplan-Meier estimates, Cox proportional hazards models, and an XGBoost-Cox model for time-to-event prediction. The positive predictive value and negative predictive value were calculated at the Youden index–optimal threshold. ResultsXGBoost achieved the highest performance (accuracy=0.74; AUC=0.82), outperforming other classifiers. In survival analyses, the XGBoost-Cox model with a concordance index (C-index) of 0.70 exceeded the traditional Cox model (C-index=0.66). SHAP analysis and Cox models consistently identified metastatic site count, tumor mutational burden, fraction of genome altered, and the presence of distant liver and bone metastases as among the strongest prognostic factors, a pattern that held at both the pan-cancer level and recurrently across cancer-specific models. At the cancer-specific level, performance varied; prostate cancer achieved the highest predictive accuracy (AUC=0.88), while pancreatic cancer was notably more challenging (AUC=0.68). Kaplan-Meier analyses demonstrated marked survival separation between patients with and without metastases (80-month survival: approximately 0.80 vs 0.30). At the Youden-optimal threshold, positive predictive value and negative predictive value were approximately 70% and 80%, respectively, supporting clinical use for risk stratification. ConclusionsExplainable ML models, particularly XGBoost combined with SHAP, can strongly predict survivability in metastatic cancers while highlighting clinically meaningful features. These findings support the use of ML-based tools for patient counseling, treatment planning, and integration into precision oncology workflows. Future work should include external validation on independent cohorts, integration with electronic health records via Fast Healthcare Interoperability Resources–based dashboards, and prospective clinician-in-the-loop evaluation to assess real-world use.more » « less
An official website of the United States government

