
Title: Interpretable machine learning prediction of all-cause mortality
Abstract Background

Unlike the linear models traditionally used to study all-cause mortality, complex machine learning models can capture non-linear interrelations and provide opportunities to identify unexplored risk factors. Explainable artificial intelligence (XAI) can improve prediction accuracy over linear models and reveal deeper insight into outcomes such as mortality. This paper comprehensively analyzes all-cause mortality by explaining complex machine learning models.

Methods

We propose the IMPACT framework, which uses XAI techniques to explain a state-of-the-art tree-ensemble mortality prediction model. We apply IMPACT to understand all-cause mortality for 1-, 3-, 5-, and 10-year follow-up times within the NHANES dataset, which contains 47,261 samples and 151 features.
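The abstract does not name the specific XAI technique, but tree-ensemble explainers are typically built on Shapley values, which attribute a prediction to each input feature. Below is a minimal, self-contained sketch of that underlying idea; the toy risk model, feature names, and values are illustrative and not taken from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for model f at input x.
    Features absent from a coalition are set to their baseline value."""
    n = len(x)
    def eval_subset(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (eval_subset(set(S) | {i}) - eval_subset(set(S)))
    return phi

# Toy non-linear "mortality risk" model over [age, smoker, activity] (illustrative only)
def risk(z):
    age, smoker, activity = z
    return 0.02 * age + 0.5 * smoker + 0.3 * smoker * (age > 60) - 0.1 * activity

x = [70, 1, 2]          # individual being explained
baseline = [50, 0, 3]   # population-reference input
phi = shapley_values(risk, x, baseline)
# Efficiency axiom: the attributions sum to f(x) - f(baseline)
print(phi, sum(phi), risk(x) - risk(baseline))
```

The efficiency property shown in the last line is what makes Shapley-based explanations additive: each feature's attribution is its fair share of the prediction's deviation from the reference.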

Results

We show that IMPACT models achieve higher accuracy than linear models and neural networks. Using IMPACT, we identify several overlooked risk factors and interaction effects. Furthermore, we identify relationships between laboratory features and mortality that may suggest adjusting established reference intervals. Finally, we develop highly accurate, efficient and interpretable mortality risk scores that can be used by medical professionals and individuals without medical expertise. We ensure generalizability by performing temporal validation of the mortality risk scores and external validation of important findings with the UK Biobank dataset.

Conclusions

IMPACT’s unique strength is its explainable predictions, which provide insight into the complex, non-linear relationships between mortality and features while maintaining high accuracy. Our explainable risk scores could help individuals improve self-awareness of their health status and help clinicians identify patients at high risk. IMPACT takes a consequential step towards bringing contemporary developments in XAI to epidemiology.

 
NSF-PAR ID: 10375953
Publisher / Repository: Nature Publishing Group
Journal Name: Communications Medicine
Volume: 2
Issue: 1
ISSN: 2730-664X
Sponsoring Org: National Science Foundation

More Like this
  1. ABSTRACT Introduction

    Between 5% and 20% of all combat-related casualties are attributed to burn wounds. Early treatment can reduce burn mortality by about 36%, but this is contingent upon accurate characterization of the burn. Precise burn injury classification is recognized as a crucial aspect of the medical artificial intelligence (AI) field. An autonomous AI system designed to analyze multiple characteristics of burns using modalities including ultrasound and RGB images is described.

    Materials and Methods

    A two-part dataset is created for the training and validation of the AI: in vivo B-mode ultrasound scans collected from porcine subjects (10,085 frames), and RGB images manually collected from web sources (338 images). The framework leverages an explanation system to corroborate and integrate burn experts’ knowledge, suggesting new features and ensuring the validity of the model. Through this framework, it is discovered that B-mode ultrasound classifiers can be enhanced by supplying textural features. More specifically, it is confirmed that statistical texture features extracted from ultrasound frames can increase the accuracy of the burn depth classifier.
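The abstract does not list which statistical texture features were used; gray-level co-occurrence matrix (GLCM) features such as contrast and homogeneity are classic choices for this kind of ultrasound texture analysis. A minimal pure-Python sketch (the tiny quantized image patches are illustrative):

```python
def glcm_features(img, levels, dx=1, dy=0):
    """Gray-level co-occurrence matrix for pixel pairs at offset (dx, dy),
    plus two classic Haralick-style texture features: contrast and homogeneity."""
    h, w = len(img), len(img[0])
    glcm = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(h - dy):
        for x in range(w - dx):
            glcm[img[y][x]][img[y + dy][x + dx]] += 1
            total += 1
    contrast = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = glcm[i][j] / total            # joint probability of the pair (i, j)
            contrast += p * (i - j) ** 2      # large when neighbors differ strongly
            homogeneity += p / (1 + abs(i - j))
    return contrast, homogeneity

# A smooth patch vs. a noisy patch (4 gray levels); rougher texture -> higher contrast
smooth = [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2, 2], [2, 2, 2, 2]]
noisy  = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]
print(glcm_features(smooth, 4), glcm_features(noisy, 4))
```

Feeding such scalars to a classifier alongside raw frames is one plausible reading of "supplying textural features"; production pipelines would typically use a library implementation over many offsets and angles.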

    Results

    The system, with all included features selected using explainable AI, is capable of classifying burn depth with accuracy and F1 average above 80%. Additionally, the segmentation module has been found capable of segmenting with a mean global accuracy greater than 84%, and a mean intersection-over-union score over 0.74.

    Conclusions

    This work demonstrates the feasibility of accurate and automated burn characterization for AI and indicates that these systems can be improved with additional features when a human expert is combined with explainable AI. This is demonstrated on real data (human for segmentation and porcine for depth classification) and establishes the groundwork for further deep-learning thrusts in the area of burn analysis.

     
    more » « less
  2. Abstract Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causal agent of COVID-19, is a communicable disease spread through close contact. It is known to disproportionately impact certain communities due to both biological susceptibility and inequitable exposure. In this study, we investigate the most important health, social, and environmental factors impacting the early phases (before July 2020) of per capita COVID-19 transmission and per capita all-cause mortality in US counties. We aggregate county-level physical and mental health, environmental pollution, access to health care, demographic characteristics, vulnerable population scores, and other epidemiological data to create a large feature set to analyze per capita COVID-19 outcomes. Because of the high dimensionality, multicollinearity, and unknown interactions of the data, we use ensemble machine learning and marginal prediction methods to identify the most salient factors associated with several COVID-19 outbreak measures. Our variable importance results show that measures of ethnicity, public transportation and preventable diseases are the strongest predictors for both per capita COVID-19 incidence and mortality. Specifically, the CDC measures for minority populations, CDC measures for limited English, and the proportion of Black and/or African-American individuals in a county were the most important features for per capita COVID-19 cases within a month after the pandemic started in a county and also at the latest date examined. For per capita all-cause mortality at day 100 and total to date, we find that public transportation use and the proportion of Black and/or African-American individuals in a county are the strongest predictors.
The methods predict that, with all other factors held fixed at their observed values, a 10% increase in public transportation use is associated with an increase in mortality at day 100 of 2012 individuals (95% CI [1972, 2356]); likewise, a 10% increase in the proportion of Black and/or African-American individuals in a county is associated with an increase in total deaths at the end of the study of 2067 (95% CI [1189, 2654]). Using data until the end of the study, the same metric suggests ethnicity has double the association of the next most important factors, which are location, disease prevalence, and transit factors. Our findings shed light on societal patterns that have been reported and experienced in the U.S. by using robust methods to understand the features most responsible for transmission and the sectors of society most vulnerable to infection and mortality. In particular, our results provide evidence of the disproportionate impact of the COVID-19 pandemic on minority populations. Our results suggest that mitigation measures, including how vaccines are distributed, could have the greatest impact if they are given with priority to the highest-risk communities.
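The abstract does not specify the variable-importance machinery; permutation importance is one standard, model-agnostic way to rank predictors with ensemble methods, and illustrates the idea behind the rankings above. A self-contained sketch on synthetic data (the model and data are illustrative, not from the study):

```python
import random

def permutation_importance(predict, X, y, col, metric, n_repeats=20, seed=0):
    """Mean drop in predictive skill when one feature column is shuffled;
    larger drops indicate more important features."""
    rng = random.Random(seed)
    base = metric(y, [predict(row) for row in X])
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        Xp = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
        drops.append(base - metric(y, [predict(row) for row in Xp]))
    return sum(drops) / n_repeats

def r2(y, yhat):
    """Coefficient of determination."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1 - ss_res / ss_tot

# Toy outcome driven by feature 0 only; feature 1 is pure noise (illustrative)
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [3 * row[0] for row in X]
model = lambda row: 3 * row[0]   # stands in for a fitted ensemble model
imp0 = permutation_importance(model, X, y, 0, r2)
imp1 = permutation_importance(model, X, y, 1, r2)
print(imp0, imp1)   # the informative feature's importance dominates
```

Shuffling the informative column destroys the model's skill, while shuffling the noise column leaves it untouched, which is exactly the signal a variable-importance ranking reports.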
  3. Abstract

    Methods of explainable artificial intelligence (XAI) are used in geoscientific applications to gain insights into the decision-making strategy of neural networks (NNs), highlighting which features in the input contribute the most to a NN prediction. Here, we discuss our “lesson learned” that the task of attributing a prediction to the input does not have a single solution. Instead, the attribution results depend greatly on the considered baseline that the XAI method utilizes—a fact that has been overlooked in the geoscientific literature. The baseline is a reference point to which the prediction is compared so that the prediction can be understood. This baseline can be chosen by the user or is set by construction in the method’s algorithm—often without the user being aware of that choice. We highlight that different baselines can lead to different insights for different science questions and, thus, should be chosen accordingly. To illustrate the impact of the baseline, we use a large ensemble of historical and future climate simulations forced with the shared socioeconomic pathway 3-7.0 (SSP3-7.0) scenario and train a fully connected NN to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We then use various XAI methods and different baselines to attribute the network predictions to the input. We show that attributions differ substantially when considering different baselines, because they correspond to answering different science questions. We conclude by discussing important implications and considerations about the use of baselines in XAI research.
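The baseline dependence described above can be demonstrated concretely with integrated gradients, one common attribution method in which the baseline is an explicit user choice. Below is a minimal sketch in which a toy two-feature function stands in for the trained NN; all functions and values are illustrative, not the paper's setup.

```python
def integrated_gradients(f, x, baseline, steps=200):
    """Integrated gradients for a scalar function f of a feature vector,
    using numerical gradients along the straight path from baseline to x."""
    n, eps = len(x), 1e-5
    attr = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps                       # midpoint rule
        z = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            zp, zm = z[:], z[:]
            zp[i] += eps
            zm[i] -= eps
            grad = (f(zp) - f(zm)) / (2 * eps)          # central difference
            attr[i] += grad * (x[i] - baseline[i]) / steps
    return attr

# Toy "prediction" from a 2-feature input (illustrative, not the paper's NN)
f = lambda z: z[0] * z[1] + 0.5 * z[0]

x = [2.0, 3.0]
attr_zero = integrated_gradients(f, x, [0.0, 0.0])   # all-zero baseline
attr_mean = integrated_gradients(f, x, [1.0, 1.0])   # climatology-like baseline
print(attr_zero, attr_mean)  # different baselines yield different attributions
```

Both attribution vectors satisfy the completeness property (they sum to f(x) minus f(baseline)), yet they assign different importance to the same inputs, because "importance relative to zero" and "importance relative to the mean state" answer different science questions.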

    Significance Statement

    In recent years, methods of explainable artificial intelligence (XAI) have found great application in geoscientific applications, because they can be used to attribute the predictions of neural networks (NNs) to the input and interpret them physically. Here, we highlight that the attributions—and the physical interpretation—depend greatly on the choice of the baseline—a fact that has been overlooked in the geoscientific literature. We illustrate this dependence for a specific climate task, in which a NN is trained to predict the ensemble- and global-mean temperature (i.e., the forced global warming signal) given an annual temperature map from an individual ensemble member. We show that attributions differ substantially when considering different baselines, because they correspond to answering different science questions.

     
  4. Objective

    Sudden unexpected death in epilepsy (SUDEP) is the leading cause of epilepsy-related mortality. Although much effort has been devoted to identifying clinical risk factors for SUDEP in the literature, there are few validated methods to predict individual SUDEP risk. Prolonged postictal EEG suppression (PGES) is a potential SUDEP biomarker, but its occurrence is infrequent and requires epilepsy monitoring unit admission. We use machine learning methods to examine SUDEP risk using interictal EEG and ECG recordings from SUDEP cases and matched living epilepsy controls.

    Methods

    This multicenter, retrospective, cohort study examined interictal EEG and ECG recordings from 30 SUDEP cases and 58 age-matched living epilepsy patient controls. We trained machine learning models with interictal EEG and ECG features to predict the retrospective SUDEP risk for each patient. We assessed cross-validated classification accuracy and the area under the receiver operating characteristic curve (AUC).

    Results

    The logistic regression (LR) classifier produced the overall best performance, outperforming the support vector machine (SVM), random forest (RF), and convolutional neural network (CNN). Among the 30 patients with SUDEP [14 females; mean age (SD), 31 (8.47) years] and 58 living epilepsy controls [26 females (43%); mean age (SD) 31 (8.5) years], the LR model achieved a median AUC of 0.77 [interquartile range (IQR), 0.73–0.80] in five-fold cross-validation using the interictal alpha and low-gamma power ratio of the EEG and heart rate variability (HRV) features extracted from the ECG. The LR model achieved a mean AUC of 0.79 in leave-one-center-out prediction.

    Conclusions

    Our results support that machine-learning-driven models may quantify SUDEP risk for epilepsy patients; future refinements of our model may help predict individualized SUDEP risk and help clinicians correlate predictive scores with clinical data. Low-cost and noninvasive interictal biomarkers of SUDEP risk may help clinicians identify high-risk patients and initiate preventive strategies.
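The reported AUCs have a direct probabilistic reading via the Mann-Whitney interpretation of the ROC curve: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. A minimal sketch with hypothetical scores (the values below are illustrative, not the study's data):

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of case/control pairs ranked correctly (ties count one-half)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical model risk scores for cases vs. controls (illustrative values)
cases = [0.9, 0.8, 0.6, 0.55]
controls = [0.7, 0.5, 0.4, 0.3]
print(auc(cases, controls))
```

Under this reading, an AUC of 0.77 means the model ranks a random SUDEP case above a random control about 77% of the time, which is why AUC is a natural summary for a retrospective risk model.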
  5. Abstract Background

    Mortality research has identified biomarkers predictive of all-cause mortality risk. Most of these markers, such as body mass index, are predictive cross-sectionally, while for others the longitudinal change has been shown to be predictive, for instance greater-than-average muscle and weight loss in older adults. And while markers are sometimes derived from imaging modalities such as DXA, full scans are rarely used. This study builds on that knowledge and tests two hypotheses to improve all-cause mortality prediction. The first hypothesis is that features derived from raw total-body DXA imaging using deep learning are predictive of all-cause mortality with and without clinical risk factors; the second is that sequential total-body DXA scans and recurrent neural network models outperform comparable models using only one observation, with and without clinical risk factors.

    Methods

    Multiple deep neural network architectures were designed to test these hypotheses. The models were trained and evaluated on data from the 16-year-long Health, Aging, and Body Composition Study, including over 15,000 scans from over 3,000 older, multi-race male and female adults. This study further used explainable AI techniques to interpret the predictions and evaluate the contribution of different inputs.

    Results

    The results demonstrate that longitudinal total-body DXA scans are predictive of all-cause mortality and improve the performance of traditional mortality prediction models. On a held-out test set, the strongest model achieves an area under the receiver operating characteristic curve of 0.79.

    Conclusion

    This study demonstrates the efficacy of deep learning for the analysis of DXA medical imaging in cross-sectional and longitudinal settings. By analyzing the trained deep learning models, this work also sheds light on what constitutes healthy aging in a diverse cohort.
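The abstract does not specify the recurrent architecture; an Elman-style recurrent unit is the simplest instance of the idea of folding a sequence of scan-derived features into a single risk estimate. A toy sketch (the weights and feature trajectories are illustrative, not a trained model):

```python
import math

def rnn_risk(sequence, Wx, Wh, Wo, b=0.0):
    """Minimal recurrent unit: folds a sequence of per-scan feature values
    into a hidden state, then maps the final state to a risk score in (0, 1)."""
    h = 0.0
    for x in sequence:
        h = math.tanh(Wx * x + Wh * h)          # hidden state carries history
    return 1.0 / (1.0 + math.exp(-(Wo * h + b)))  # sigmoid risk output

# Illustrative weights; a worsening trajectory vs. a stable one
Wx, Wh, Wo = 1.0, 0.5, 2.0
worsening = [0.2, 0.5, 0.9]   # e.g. rising scan-derived risk features over visits
stable    = [0.2, 0.2, 0.2]
print(rnn_risk(worsening, Wx, Wh, Wo), rnn_risk(stable, Wx, Wh, Wo))
```

The point of the recurrence is that two individuals with the same latest scan can receive different risks depending on their trajectory, which is what a single-observation model cannot capture.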