We propose a prognostic machine learning (ML) framework to support behavioural outcome prediction for cancer survivors. Specifically, our contributions are four-fold: (1) devise a data-driven, clinical domain-guided pipeline to select the best set of predictors among cancer treatments, chronic health conditions, and socio-environmental factors for behavioural outcome prediction; (2) use a state-of-the-art two-tier ensemble-based technique to select the best set of predictors for the downstream construction of ML regressors; (3) develop a StackNet Regressor Architecture (SRA) algorithm, i.e., an intelligent meta-modeling algorithm that dynamically and automatically builds an optimized multilayer ensemble-based regressor architecture (RA) from a given set of ML regressors to predict long-term behavioural outcomes; and (4) conduct a preliminary experimental case study on our existing study data (i.e., 207 cancer survivors who were diagnosed with Osteogenic Sarcoma, Soft Tissue Sarcoma, or Acute Lymphoblastic Leukemia before the age of 18) collected by our investigators at a public hospital in Hong Kong. In this pilot study, we demonstrate that our approach outperforms traditional statistical and computational methods, including linear and non-linear ML regressors.
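The SRA algorithm itself is not specified in this abstract. As a rough illustration of the underlying idea, stacked generalization, the following minimal Python sketch trains two level-0 regressors (ordinary least squares and k-nearest neighbours, both stand-ins chosen for illustration), collects their out-of-fold predictions, and fits a level-1 meta-learner that picks the convex blend weight minimizing out-of-fold squared error. All function names are hypothetical, a single predictor is assumed, and a multilayer architecture such as SRA would presumably repeat this construction layer by layer rather than stop at one meta-learner.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Level-0 learner: ordinary least squares on a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=3):
    """Level-0 learner: k-nearest-neighbour regression (a simple non-linear model)."""
    points = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


BASE_LEARNERS = [fit_linear, fit_knn]  # hypothetical choice of level-0 models


def out_of_fold_predictions(xs, ys, n_folds=3):
    """For every sample, collect predictions from base models that never saw it."""
    n = len(xs)
    oof = [None] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        models = [fit([xs[i] for i in train], [ys[i] for i in train]) for fit in BASE_LEARNERS]
        for i in range(fold, n, n_folds):  # samples held out in this fold
            oof[i] = [model(xs[i]) for model in models]
    return oof


def fit_stack(xs, ys, n_folds=3, grid_size=11):
    """Level-1 meta-learner: convex blend weight w chosen on out-of-fold predictions."""
    oof = out_of_fold_predictions(xs, ys, n_folds)

    def oof_sse(w):
        return sum((w * p[0] + (1 - w) * p[1] - y) ** 2 for p, y in zip(oof, ys))

    w = min((i / (grid_size - 1) for i in range(grid_size)), key=oof_sse)
    models = [fit(xs, ys) for fit in BASE_LEARNERS]  # refit level-0 models on all data

    def predict(x):
        return w * models[0](x) + (1 - w) * models[1](x)

    return predict, w
```

On exactly linear toy data such as y = 2x + 1, this meta-learner assigns all of its weight to the linear model; on noisier data the chosen weight could fall anywhere between the two base learners.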
A Hybrid Deep Learning–Based Feature Selection Approach for Supporting Early Detection of Long-Term Behavioral Outcomes in Survivors of Cancer: Cross-Sectional Study
Background: The number of survivors of cancer is growing, and they often experience negative long-term behavioral outcomes due to cancer treatments. There is a need for better computational methods to handle and predict these outcomes so that physicians and health care providers can implement preventive treatments. Objective: This study aimed to create a new feature selection algorithm to improve the performance of machine learning classifiers in predicting negative long-term behavioral outcomes in survivors of cancer. Methods: We devised a hybrid deep learning–based feature selection approach to support early detection of negative long-term behavioral outcomes in survivors of cancer. Within a data-driven, clinical domain–guided framework to select the best set of features among cancer treatments, chronic health conditions, and socioenvironmental factors, we developed a 2-stage feature selection algorithm, consisting of a multimetric majority-voting filter and a deep dropout neural network, to dynamically and automatically select the best set of features for each behavioral outcome. We also conducted an experimental case study on existing study data with 102 survivors of acute lymphoblastic leukemia (aged 15-39 years at evaluation and >5 years after cancer diagnosis) who were treated at a public hospital in Hong Kong. Finally, we designed and implemented radial charts to illustrate the significance of the selected features for each behavioral outcome to support clinical professionals’ future treatment decisions and diagnoses. Results: In this pilot study, we demonstrated that our approach outperforms traditional statistical and computational methods, including linear and nonlinear feature selectors, for the addressed top-priority behavioral outcomes. Overall, our approach achieves higher F1, precision, and recall scores than existing feature selection methods. 
The models in this study select several significant clinical and socioenvironmental variables as risk factors associated with the development of behavioral problems in young survivors of acute lymphoblastic leukemia. Conclusions: Our novel feature selection algorithm has the potential to improve machine learning classifiers’ capability to predict adverse long-term behavioral outcomes in survivors of cancer.
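The first stage of the 2-stage algorithm, a multimetric majority-voting filter, can be sketched in plain Python under assumptions of our own: three illustrative relevance metrics (absolute Pearson correlation, standardized mean difference, and absolute difference in class medians), a binary outcome label, and a hypothetical `top_k` cutoff. Each metric votes for its top-k features, and a feature survives only if a majority of the metrics vote for it. The abstract does not name the actual metrics, and the second stage (the deep dropout neural network) is not shown; the feature names and values below are invented.

```python
from statistics import mean, median, pstdev


def pearson_abs(col, labels):
    """Absolute Pearson correlation between a feature column and 0/1 labels."""
    mx, my = mean(col), mean(labels)
    sx, sy = pstdev(col), pstdev(labels)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(col, labels))
    return abs(cov / (sx * sy))


def split_by_label(col, labels):
    pos = [x for x, y in zip(col, labels) if y == 1]
    neg = [x for x, y in zip(col, labels) if y == 0]
    return pos, neg


def std_mean_diff(col, labels):
    """Absolute difference of class means, scaled by the feature's overall std."""
    pos, neg = split_by_label(col, labels)
    sd = pstdev(col)
    return abs(mean(pos) - mean(neg)) / sd if sd else 0.0


def median_diff(col, labels):
    """Absolute difference of class medians."""
    pos, neg = split_by_label(col, labels)
    return abs(median(pos) - median(neg))


METRICS = [pearson_abs, std_mean_diff, median_diff]  # illustrative metric set


def majority_vote_filter(X, labels, names, top_k):
    """Keep features that appear in the top-k of a strict majority of metrics."""
    votes = {name: 0 for name in names}
    for metric in METRICS:
        scores = {name: metric([row[j] for row in X], labels) for j, name in enumerate(names)}
        for name in sorted(names, key=lambda n: scores[n], reverse=True)[:top_k]:
            votes[name] += 1
    return [name for name in names if votes[name] > len(METRICS) / 2]


# Hypothetical features: a strong signal, pure noise, and a weaker signal
names = ["treatment_dose", "random_noise", "chronic_condition_count"]
X = [[1, 5, 2], [2, 3, 3], [1, 4, 2], [8, 4, 3], [9, 3, 4], [8, 5, 3]]
labels = [0, 0, 0, 1, 1, 1]
print(majority_vote_filter(X, labels, names, top_k=2))
```

With these toy values, the noise feature scores zero under all three metrics and is voted out, while the two informative features pass to the next stage.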
- Award ID(s):
- 2349370
- PAR ID:
- 10641406
- Publisher / Repository:
- JMIR Publications Inc., Toronto, Canada
- Date Published:
- Journal Name:
- JMIR Bioinformatics and Biotechnology
- Volume:
- 6
- ISSN:
- 2563-3570
- Page Range / eLocation ID:
- e65001
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Studies over the past decade have generated a wealth of molecular data that can be leveraged to better understand cancer risk, progression, and outcomes. However, understanding progression risk and differentiating long- and short-term survivors cannot be achieved by analyzing data from a single modality because of the heterogeneity of the disease. A scientifically developed and tested deep-learning approach that leverages aggregate information collected from multiple repositories with multiple modalities (e.g., mRNA, DNA methylation, miRNA) could lead to more accurate and robust prediction of disease progression. Here, we propose an autoencoder-based multimodal data fusion system in which a fusion encoder flexibly integrates collective information available through multiple studies with partially coupled data. Our results on a fully controlled simulation-based study show that inferring the missing data through the proposed data fusion pipeline yields a predictor that outperforms baseline predictors with missing modalities. Results further show that short- and long-term survivors of glioblastoma multiforme, acute myeloid leukemia, and pancreatic adenocarcinoma can be successfully differentiated with AUCs of 0.94, 0.75, and 0.96, respectively.
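The AUC values reported above can be read as the probability that a randomly chosen positive case (for example, a long-term survivor) receives a higher predicted score than a randomly chosen negative case, with ties counting as one half; this is the Mann-Whitney interpretation of the ROC AUC. The following minimal Python sketch computes AUC this way. It illustrates only the evaluation metric, not the autoencoder-based fusion pipeline, and the scores and labels in the example are invented.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive (1) and negative (0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Invented scores: 3 of the 4 positive/negative pairs are ordered correctly
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; rank-based formulations give the same value more efficiently on large cohorts.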
Abstract: The recurrence of cancer following chemotherapy is a major cause of death across solid and hematologic cancers. In B-cell acute lymphoblastic leukemia (B-ALL), relapse after initial chemotherapy leads to poor patient outcomes. Here we test the hypothesis that chemotherapy-treated and control B-ALL cells can be distinguished by their cellular physical phenotypes. To quantify the physical phenotypes of chemotherapy-treated leukemia cells, we use cells derived from B-ALL patients that are treated for 7 days with a standard multidrug chemotherapy regimen of vincristine, dexamethasone, and L-asparaginase (VDL). We physically phenotype VDL-treated versus control cells by tracking the sequential deformations of single cells as they flow through a series of micron-scale constrictions in a microfluidic device; we call this method Quantitative Cyclical Deformability Cytometry. Using automated image analysis, we extract time-dependent features of deforming cells, including cell size and transit time (TT), with single-cell resolution. Our findings show that VDL-treated B-ALL cells have shorter TTs and higher transit velocities than control cells, indicating that VDL-treated cells are more deformable. We then use machine learning approaches to test how effectively physical phenotypes can predict the presence of VDL-treated cells in mixed populations of VDL-treated and control cells. We find that TT measurements across a series of sequential constrictions can improve the classification accuracy of VDL-treated cells in mixed populations across a variety of classifiers. Our findings suggest that cell physical phenotyping has predictive power as a complementary prognostic tool to detect cells that survive chemotherapy. Ultimately, such complementary physical phenotyping approaches could guide treatment strategies and therapeutic interventions. 
Insight box: Cancer cells that survive chemotherapy are major contributors to patient relapse, but predicting recurrence remains a challenge. Here we investigate the physical properties of leukemia cells that survive treatment with chemotherapy drugs by deforming individual cells through a series of micron-scale constrictions in a microfluidic channel. Our findings reveal that leukemia cells that survive chemotherapy are more deformable than control cells. We further show that machine learning algorithms applied to physical phenotyping data can predict the presence of chemotherapy-surviving cells in a mixed population. Such an integrated approach combining physical phenotyping and machine learning could be valuable for guiding patient treatment.
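As a rough illustration of how TT-based features could feed a classifier, the following Python sketch computes per-constriction transit times from entry and exit timestamps, converts them to transit velocities using an assumed common constriction length, and assigns a cell to the class whose mean TT vector (centroid) is nearest. The timestamp format, the constriction length, the nearest-centroid classifier, and all numbers are assumptions for illustration; the study's image-analysis pipeline and its classifiers are not reproduced here.

```python
from math import dist
from statistics import mean


def transit_times(entries, exits):
    """Transit time (TT) through each sequential constriction: exit minus entry time."""
    return [t_out - t_in for t_in, t_out in zip(entries, exits)]


def transit_velocities(tts, constriction_length):
    """Transit velocity per constriction, assuming all constrictions share one length."""
    return [constriction_length / tt for tt in tts]


def centroid(vectors):
    """Element-wise mean of equal-length TT vectors."""
    return [mean(column) for column in zip(*vectors)]


def nearest_centroid(tt_vector, centroids):
    """Return the label whose centroid is closest (Euclidean distance) to the TT vector."""
    return min(centroids, key=lambda label: dist(tt_vector, centroids[label]))


# Invented TT vectors (arbitrary time units) across three sequential constrictions
centroids = {
    "control": centroid([[12, 11, 11], [13, 12, 12]]),
    "VDL-treated": centroid([[6, 6, 5], [7, 6, 6]]),
}
tts = transit_times([0, 20, 40], [7, 26, 46])  # a hypothetical new cell
print(tts, nearest_centroid(tts, centroids))
```

Using the whole TT vector rather than a single averaged TT mirrors the idea that measurements across sequential constrictions carry extra information, although any real classifier would need to be trained and validated on measured data.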
Background: Datasets on rare diseases, such as pediatric acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), have small sample sizes that hinder machine learning (ML). The objective was to develop an interpretable ML framework to extract actionable insights from small tabular rare disease datasets. Methods: The comprehensive framework employed optimized data imputation and sampling, supervised and unsupervised learning, and literature-based discovery (LBD). The framework was deployed to assess treatment-related infection in pediatric AML and ALL. Results: An interpretable decision tree classified the risk of infection as either “high risk” or “low risk” in pediatric ALL (n = 580) and AML (n = 132) with an accuracy of ∼79%. Interpretable regression models predicted the discrete number of developed infections with a mean absolute error (MAE) of 2.26 for bacterial infections and 1.29 for viral infections. The features that best explained the development of infection were the chemotherapy regimen, cancer cells in the central nervous system at initial diagnosis, chemotherapy course, leukemia type, Down syndrome, race, and National Cancer Institute risk classification. Finally, SemNet 2.0, an open-source LBD tool that links relationships from more than 33 million PubMed articles, identified additional features for predicting infection, such as glucose, iron, neutropenia-reducing growth factors, and systemic lupus erythematosus (SLE). Conclusions: The developed ML framework enabled state-of-the-art, interpretable predictions using rare disease tabular datasets. ML model performance baselines were successfully produced to predict infection in pediatric AML and ALL.
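The MAE values above average the absolute gap between predicted and observed infection counts, so an MAE of 2.26 for bacterial infections means the predictions were off by about 2.26 infections on average. A minimal Python sketch of the metric follows; the counts in the example are invented for illustration and are not from the study.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Invented infection counts for four hypothetical patients
print(mean_absolute_error([0, 2, 5, 1], [1, 2, 3, 1]))  # 0.75
```

Because MAE is expressed in the same unit as the target (number of infections), it is easier to interpret clinically than squared-error metrics, though it says nothing about whether errors are systematically high or low.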
Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder of heterogeneous nature, influenced by genetics and exhibiting diverse clinical presentations. In this study, we dissect ASD into its behavioral components, mirroring the diagnostic process used in clinical settings. Morphological features are extracted from magnetic resonance imaging (MRI) scans in the publicly available ABIDE II dataset, and the most discriminative features that differentiate ASD within various behavioral domains are identified. Each subject is then categorized as having severe, moderate, or mild ASD, or typical neurodevelopment (TD), based on the behavioral domains of the Social Responsiveness Scale (SRS). Multiple artificial intelligence (AI) models are used for feature selection and for classifying each ASD severity level and behavioral group. A multivariate feature selection algorithm, investigating four different classifiers with linear and non-linear hypotheses, is applied iteratively while shuffling the training and validation subjects to find the set of cortical regions with a statistically significant association with ASD. A set of six classifiers is optimized and trained on the selected features using 5-fold cross-validation for severity classification within each behavioral group. Our AI-based model achieved an average accuracy of 96%, computed as the mean accuracy across the top-performing AI models for feature selection and severity classification across the different behavioral groups. The proposed AI model can accurately differentiate between the functionalities of specific brain regions, such as the left and right caudal middle frontal regions. We propose an AI-based model that dissects ASD into behavioral components. 
For each behavioral component, the AI-based model can identify the brain regions that are associated with ASD and use those regions for diagnosis. The proposed system can increase the speed and accuracy of the diagnostic process and result in improved outcomes for individuals with ASD, highlighting the potential of AI in this area.
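The 5-fold cross-validation with shuffled subjects described above can be sketched in plain Python as follows. The fold assignment, the 1-nearest-neighbour stand-in classifier, the single toy feature, and the "mild"/"severe" labels are assumptions for illustration, not the study's six optimized classifiers or its ABIDE II features; the sketch reports the mean of the per-fold validation accuracies.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists after shuffling the subjects once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fit_1nn(xs, ys):
    """Stand-in classifier: 1-nearest neighbour on a single feature."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cross_val_accuracy(xs, ys, fit, k=5, seed=0):
    """Mean validation accuracy over k shuffled folds."""
    accuracies = []
    for train, val in k_fold_splits(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        correct = sum(model(xs[i]) == ys[i] for i in val)
        accuracies.append(correct / len(val))
    return sum(accuracies) / len(accuracies)


# Toy, well-separated data with hypothetical severity labels
xs = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
ys = ["mild"] * 5 + ["severe"] * 5
print(cross_val_accuracy(xs, ys, fit_1nn))
```

Every subject appears in exactly one validation fold, so each accuracy estimate is computed on data the model did not see during fitting; changing `seed` reshuffles the subjects, analogous to the repeated shuffling of training and validation subjects described above.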

