PURPOSE: Identify oropharyngeal cancer (OPC) patients at high risk of developing long-term severe radiation-associated symptoms using dose-volume histograms for organs-at-risk, via unsupervised clustering. MATERIAL AND METHODS: All patients were treated using radiation therapy for OPC. Dose-volume histograms of organs-at-risk were extracted from patients’ treatment plans. Symptom ratings were collected via the MD Anderson Symptom Inventory (MDASI), administered weekly during treatment and at 6 months post-treatment. Dry mouth, trouble swallowing, mucus, and vocal dysfunction were selected for analysis in this study. Patient stratifications were obtained by applying Bayesian mixture models with three components to patients’ dose histograms for relevant organs. The clusters with the highest total mean doses were translated into dose thresholds using rule mining. Patient stratifications were compared against tumor staging information using multivariate likelihood ratio tests. Model performance for prediction of moderate/severe symptoms at 6 months was compared against normal tissue complication probability (NTCP) models using cross-validation. RESULTS: A total of 349 patients were included for long-term symptom prediction. High-risk clusters were significantly correlated with outcomes for severe late dry mouth (p < .0001, OR = 2.94), swallow (p = .002, OR = 5.13), mucus (p = .001, OR = 3.18), and voice (p = .009, OR = 8.99). Simplified clusters were also correlated with late severe symptoms for dry mouth (p < .001, OR = 2.77), swallow (p = .01, OR = 3.63), mucus (p = .01, OR = 2.37), and voice (p < .001, OR = 19.75). Proposed cluster stratifications show better performance than NTCP models for severe dry mouth (AUC .598 vs .559, MCC .143 vs .062), swallow (AUC .631 vs .561, MCC .20 vs -.030), mucus (AUC .596 vs .492, MCC .164 vs -.041), and voice (AUC .681 vs .555, MCC .181 vs -.019). Simplified dose thresholds also show better performance than baseline models for predicting late severe ratings for all symptoms. CONCLUSION: Our results show that leveraging the 3-D dose histograms from radiation therapy plans improves stratification of patients according to their risk of experiencing long-term severe radiation-associated symptoms, beyond existing NTCP models. Our rule-based method can approximate our stratifications with minimal loss of accuracy and can proactively identify risk factors for radiation-associated toxicity.
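The odds ratios (OR) and Matthews correlation coefficients (MCC) reported above are both summaries of a 2x2 table that crosses cluster membership with symptom outcome. The following is a minimal sketch of how such values could be computed; the function names and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio(tp, fp, fn, tn):
    """Odds of the outcome inside the high-risk group divided by the odds outside it."""
    return (tp * tn) / (fp * fn)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a 2x2 table (0.0 if a margin is empty)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, for illustration only:
# tp = high-risk cluster with severe symptom, fp = high-risk cluster without,
# fn = other clusters with severe symptom,    tn = other clusters without.
tp, fp, fn, tn = 30, 60, 20, 140
print(odds_ratio(tp, fp, fn, tn), mcc(tp, fp, fn, tn))
```

MCC uses all four cells of the table, so it stays informative when severe symptoms are much rarer than mild ones, which is one plausible reason to report it alongside AUC.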
Spatially-aware clustering improves AJCC-8 risk stratification performance in oropharyngeal carcinomas
Objective: Evaluate the effectiveness of machine learning tools that incorporate spatial information, such as disease location and lymph node metastatic patterns of spread, for prediction of survival and toxicity in HPV+ oropharyngeal cancer (OPC). Materials & methods: Data from 675 HPV+ OPC patients who were treated at MD Anderson Cancer Center between 2005 and 2013 with curative-intent IMRT were retrospectively collected under IRB approval. Risk stratifications incorporating patient radiometric data and lymph node metastasis patterns via an anatomically-adjacent representation with hierarchical clustering were identified. These clusterings were combined into a 3-level patient stratification and included along with other known clinical features in a Cox model for predicting survival outcomes, and in logistic regression for toxicity, using independent subsets for training and validation. Results: Four groups were identified and combined into a 3-level stratification. The inclusion of patient stratifications in predictive models for 5-year overall survival (OS), 5-year recurrence-free survival (RFS), and radiation-associated dysphagia (RAD) consistently improved model performance measured using the area under the curve (AUC). Test-set AUC improvement over models with clinical covariates was 9% for predicting OS, 18% for predicting RFS, and 7% for predicting RAD. For models with both clinical and AJCC covariates, AUC improvement was 7%, 9%, and 2% for OS, RFS, and RAD, respectively. Conclusion: Including data-driven patient stratifications considerably improves prognostic performance for survival and toxicity outcomes over that achieved by clinical staging and clinical covariates alone. These stratifications generalize well across cohorts, and sufficient information for reproducing these clusters is included.
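As an illustration of the comparison described above, the sketch below computes a test-set AUC for a baseline model and for a model extended with a stratification feature, then reports the difference. The `auc` helper, the labels, and both score lists are hypothetical; the abstract does not state whether its reported improvements are absolute or relative, so only the raw difference is shown.

```python
def auc(scores, labels):
    """Probability that a random positive case outscores a random negative case.

    Ties count as half a win (the rank-based definition of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out outcomes and predicted risks, for illustration only.
labels = [0, 0, 1, 1, 0, 1]
baseline_scores = [0.2, 0.6, 0.5, 0.7, 0.4, 0.3]   # clinical covariates only
extended_scores = [0.2, 0.55, 0.6, 0.8, 0.3, 0.5]  # plus stratification feature

baseline_auc = auc(baseline_scores, labels)
extended_auc = auc(extended_scores, labels)
print(baseline_auc, extended_auc, extended_auc - baseline_auc)
```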
- Award ID(s): 2320261
- PAR ID: 10536541
- Publisher / Repository: Elsevier ScienceDirect: Oral Oncology
- Date Published:
- Journal Name: Oral Oncology
- Volume: 144
- Issue: C
- ISSN: 1368-8375
- Page Range / eLocation ID: 106460
- Subject(s) / Keyword(s): Head & Neck cancer; Statistical and research methods; Machine learning; Radiation oncology; Health Informatics; Imaging; Risk stratification; Oropharyngeal cancer; Clustering; Radiomics
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Background: Metastatic cancer remains one of the leading causes of cancer-related mortality worldwide. Yet, the prediction of survivability in this population remains limited by heterogeneous clinical presentations and high-dimensional molecular features. Advances in machine learning (ML) provide an opportunity to integrate diverse patient- and tumor-level factors into explainable predictive ML models. Leveraging large real-world datasets and modern ML techniques can enable improved risk stratification and precision oncology. Objective: This study aimed to develop and interpret ML models for predicting overall survival in patients with metastatic cancer using the Memorial Sloan Kettering-Metastatic (MSK-MET) dataset and to identify key prognostic biomarkers through explainable artificial intelligence techniques. Methods: We performed a retrospective analysis of the MSK-MET cohort, comprising 25,775 patients across 27 tumor types. After data cleaning and balancing, 20,338 patients were included. Overall survival was defined as deceased versus living at last follow-up. Five classifiers (extreme gradient boosting [XGBoost], logistic regression, random forest, decision tree, and naive Bayes) were trained using an 80/20 stratified split and optimized via grid search with 5-fold cross-validation. Model performance was assessed using accuracy, area under the curve (AUC), precision, recall, and F1-score. Model explainability was achieved using Shapley additive explanations (SHAP). Survival analyses included Kaplan-Meier estimates, Cox proportional hazards models, and an XGBoost-Cox model for time-to-event prediction. The positive predictive value and negative predictive value were calculated at the Youden index–optimal threshold. Results: XGBoost achieved the highest performance (accuracy=0.74; AUC=0.82), outperforming other classifiers. In survival analyses, the XGBoost-Cox model with a concordance index (C-index) of 0.70 exceeded the traditional Cox model (C-index=0.66). SHAP analysis and Cox models consistently identified metastatic site count, tumor mutational burden, fraction of genome altered, and the presence of distant liver and bone metastases as among the strongest prognostic factors, a pattern that held at both the pan-cancer level and recurrently across cancer-specific models. At the cancer-specific level, performance varied; prostate cancer achieved the highest predictive accuracy (AUC=0.88), while pancreatic cancer was notably more challenging (AUC=0.68). Kaplan-Meier analyses demonstrated marked survival separation between patients with and without metastases (80-month survival: approximately 0.80 vs 0.30). At the Youden-optimal threshold, positive predictive value and negative predictive value were approximately 70% and 80%, respectively, supporting clinical use for risk stratification. Conclusions: Explainable ML models, particularly XGBoost combined with SHAP, can strongly predict survivability in metastatic cancers while highlighting clinically meaningful features. These findings support the use of ML-based tools for patient counseling, treatment planning, and integration into precision oncology workflows. Future work should include external validation on independent cohorts, integration with electronic health records via Fast Healthcare Interoperability Resources–based dashboards, and prospective clinician-in-the-loop evaluation to assess real-world use.
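The Youden index mentioned above picks the classification threshold that maximizes sensitivity + specificity - 1, after which positive and negative predictive values can be read off the resulting 2x2 table. The following is a minimal sketch under that definition; the function names, scores, and labels are hypothetical and unrelated to the MSK-MET data, and both outcome classes are assumed to be present.

```python
def confusion(scores, labels, threshold):
    """Counts (tp, fp, fn, tn) when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn


def youden_threshold(scores, labels):
    """Threshold maximizing J = sensitivity + specificity - 1 (first one wins on ties)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, labels, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def ppv_npv(scores, labels, threshold):
    """Positive and negative predictive value at a given threshold."""
    tp, fp, fn, tn = confusion(scores, labels, threshold)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical predicted probabilities and outcomes (1 = deceased), illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]
t, j = youden_threshold(scores, labels)
print(t, j, ppv_npv(scores, labels, t))
```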
-
The growing amount of longitudinal data for a large population of patients has necessitated the application of algorithms that can discover patterns that inform patient management. This study demonstrates how temporal patterns generated from a combination of clinical and imaging measurements improve residual survival prediction in glioblastoma patients. Temporal patterns were identified with sequential pattern mining using data from 304 patients. Along with patient covariates, the patterns were incorporated as features in logistic regression models to predict 2-, 6-, or 9-month residual survival at each visit. The modeling approach that included temporal patterns achieved test performances of 0.820, 0.785, and 0.783 area under the receiver operating characteristic curve for predicting 2-, 6-, and 9-month residual survival, respectively. This approach significantly outperformed models that used tumor volume alone (p < 0.001) or tumor volume combined with patient covariates (p < 0.001) in training. Temporal patterns involving an increase in tumor volume above 122 mm³/day, a decrease in KPS across multiple visits, moderate neurologic symptoms, and worsening overall neurologic function suggested lower residual survival. These patterns are readily interpretable and found to be consistent with known prognostic indicators, suggesting they can provide early indicators to clinicians of changes in patient state that inform management decisions.
-
Abstract PO-037: Biomarkers for early diagnosis of human papillomavirus-related oropharyngeal cancer
Introduction: To counteract the rapidly increasing incidence of human papillomavirus (HPV)-related oropharyngeal cancer (OPC), development of effective biomarkers and other novel screening strategies is necessary to effectively detect early disease in well-defined high-risk populations. The most promising biomarkers are circulating tumor HPV DNA (ctHPV), antibodies (Abs) to HPV16 early (E) antigens, and persistent infection with oral oncogenic HPV. Here, we report the prevalence of ctHPV in a population of middle-aged men, a group with high OPC incidence, and evaluate concordance between the three HPV biomarkers. Materials and Methods: We included participants enrolled in the HPV-related Oropharyngeal and Uncommon Cancers Screening Trial of Men (HOUSTON) study between April 2017 and December 2019. The HOUSTON study was designed to evaluate biomarkers and novel screening strategies for HPV-related cancers among middle-aged men (50-64 years), the group with the highest incidence of HPV-related OPC. We tested plasma for ctHPV using a digital droplet polymerase chain reaction (ddPCR) assay (NavDx, Naveris, Waltham, MA). We previously tested the plasma from these participants for HPV16 E Abs using a novel RAPID ELISA and for prevalent oral HPV16 (oHPV16) infection in oral rinse using the cobas HPV Test (Roche Diagnostics, Indianapolis, IN). We used Fisher’s exact test to determine statistical significance for the association between biomarkers (alpha < 0.05). Results: Of 345 samples tested, 343 were adequate for ctHPV analysis. Of these, 314 were negative (92.4%) and two were positive (0.6%; both for HPV16). Twenty-four had an indeterminate (Ind) result (7.0%), meaning ctHPV levels fell outside the established parameters for a negative or positive result. All three markers were available for 340 samples with the following results: ctHPV+/Ab+/oHPV16+: 1 (0.3%); ctHPV+/Ab-/oHPV16-: 1 (0.3%); ctHPV Ind/Ab-/oHPV16+: 3 (0.9%); ctHPV Ind/Ab-/oHPV16-: 21 (6.2%); ctHPV-/Ab+/oHPV16+: 1 (0.3%); ctHPV-/Ab+/oHPV16-: 3 (0.9%); ctHPV-/Ab-/oHPV16+: 16 (4.7%); ctHPV-/Ab-/oHPV16-: 294 (86.5%); all other combinations had no observations. ctHPV was associated with HPV16 E Abs and oHPV16 status individually and combined (individually, p = 0.032 for both and combined, p = 0.025). A ctHPV-/Ab+/oHPV16- man was diagnosed with an anal low-grade squamous intraepithelial lesion and was persistently high-risk HPV-positive at the right tonsil/base of tongue. One man was positive for all three markers and was subsequently diagnosed with stage II (T1N1) HPV16-positive/Epstein-Barr-negative nasopharyngeal cancer four months following study enrollment. Conclusions: ctHPV was rare in a general population of middle-aged men. Our results suggest that these markers in combination may be able to correctly identify early HPV-related cancers. Larger studies are needed to confirm this finding. The authors accept sole responsibility for the statements in this abstract. Citation Format: Kristina R. Dahlstrom, Karen S. Anderson, Erich M. Sturgis. Biomarkers for early diagnosis of human papillomavirus-related oropharyngeal cancer [abstract]. In: Proceedings of the AACR-AHNS Head and Neck Cancer Conference: Innovating through Basic, Clinical, and Translational Research; 2023 Jul 7-8; Montreal, QC, Canada. Philadelphia (PA): AACR; Clin Cancer Res 2023;29(18_Suppl):Abstract nr PO-037.
-
Background: Oropharyngeal cancer (OPC) exhibits varying responses to chemoradiation therapy, making treatment outcome prediction challenging. Traditional imaging‐based methods often fail to capture the spatial heterogeneity within tumors, which influences treatment resistance and disease progression. Advances in modeling techniques allow for more nuanced analysis of this heterogeneity, identifying distinct tumor regions, or habitats, that drive patient outcomes. Purpose: To interrogate the association between treatment‐induced changes in spatial heterogeneity and chemoradiation resistance of oropharyngeal cancer (OPC) based on a novel tumor habitat analysis. Methods: A mathematical model was used to estimate tumor time dynamics of patients with OPC based on the applied analysis of partial differential equations. The position and momentum of each voxel were propagated according to Fokker‐Planck dynamics, that is, a common model in statistical mechanics. The boundary conditions of the Fokker‐Planck equation were solved based on pre‐ and intra‐treatment (i.e., after 2 weeks of therapy) ¹⁸F‐FDG‐PET SUV images of patients (n = 56) undergoing definitive (chemo)radiation for OPC as part of a previously conducted prospective clinical trial. Tumor‐specific time dynamics, measured based on the solution of the Fokker‐Planck equation, were generated for each patient. Tumor habitats (i.e., non‐overlapping subregions of the primary tumor) were identified by measuring vector similarity in voxel‐level time dynamics through a fuzzy c‐means clustering algorithm. The robustness of our habitat construction method was quantified using a mean silhouette metric to measure intra‐habitat variability. Fifty‐four habitat‐specific radiomic texture features were extracted from pre‐treatment SUV images and normalized by habitat volume. Univariate Kaplan‐Meier analyses were implemented as a feature selection method, where statistically significant features (p < 0.05, log‐rank) were used to construct a multivariate Cox proportional‐hazards model. Parameters from the resulting Cox model were then used to construct a risk score for each patient, based on habitat‐specific radiomic expression. The patient cohort was stratified by median risk score value and association with recurrence‐free survival (RFS) was evaluated via log‐rank tests. Results: Dynamic tumor habitat analysis partitioned the gross disease of each patient into three spatial subregions. Voxels within each habitat suggested differential response rates in different compartments of the tumor. The minimum mean silhouette value was 0.57 and maximum mean silhouette value was 0.8, where values above 0.7 indicated strong intra‐habitat consistency and values between 0.5 and 0.7 indicated reasonable intra‐habitat consistency. Nine radiomic texture features (three GLRLM, two GLCOM, and three GLSZM) and SUVmax were found to be prognostically significant and were used to build the multivariate Cox model. The resulting risk score was associated with RFS (p = 0.032). By contrast, potential confounding factors (primary tumor volume and mean SUV) were not significantly associated with RFS (p = 0.286 and p = 0.231, respectively). Conclusion: We interrogated spatial heterogeneity of oropharyngeal tumors through the application of a novel algorithm to identify spatial habitats on SUV images. Our habitat construction technique was shown to be robust and habitat‐specific feature spaces revealed distinct underlying radiomic expression patterns. Radiomic features were extracted from dynamic habitats and used to build a risk score which demonstrated prognostic value.
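The risk-score and median-split steps described above can be sketched as follows. This assumes the risk score is the Cox linear predictor (a coefficient-weighted sum of feature values), which the abstract does not state explicitly; the feature names, coefficients, and patient values are hypothetical.

```python
from statistics import median


def risk_score(features, coefficients):
    """Linear predictor: sum of coefficient * feature value over the model's features."""
    return sum(beta * features[name] for name, beta in coefficients.items())


def stratify_by_median(scores):
    """Label each patient 'high' if their score exceeds the cohort median, else 'low'."""
    cut = median(scores.values())
    return {pid: ("high" if s > cut else "low") for pid, s in scores.items()}


# Hypothetical fitted coefficients and per-patient feature values, illustration only.
coefficients = {"feature_a": 0.5, "feature_b": -1.0}
patients = {
    "p1": {"feature_a": 2.0, "feature_b": 0.5},
    "p2": {"feature_a": 4.0, "feature_b": 1.0},
    "p3": {"feature_a": 1.0, "feature_b": 2.0},
    "p4": {"feature_a": 3.0, "feature_b": 0.0},
}

scores = {pid: risk_score(f, coefficients) for pid, f in patients.items()}
groups = stratify_by_median(scores)
print(scores, groups)
```

The two resulting groups would then be compared with a log-rank test on recurrence-free survival, which is outside the scope of this sketch.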

