Title: A Mobile App That Addresses Interpretability Challenges in Machine Learning–Based Diabetes Predictions: Survey-Based User Study
Background

Machine learning approaches, including deep learning, have demonstrated remarkable effectiveness in the diagnosis and prediction of diabetes. However, these approaches often operate as opaque black boxes, leaving health care providers in the dark about the reasoning behind predictions. This opacity poses a barrier to the widespread adoption of machine learning in diabetes and health care, leading to confusion and eroding trust.

Objective

This study aimed to address this critical issue by developing and evaluating an explainable artificial intelligence (AI) platform, XAI4Diabetes, designed to empower health care professionals with a clear understanding of AI-generated predictions and recommendations for diabetes care. XAI4Diabetes not only delivers diabetes risk predictions but also furnishes easily interpretable explanations for complex machine learning models and their outcomes.

Methods

XAI4Diabetes features a versatile multimodule explanation framework that leverages machine learning, knowledge graphs, and ontologies. The platform comprises the following four essential modules: (1) knowledge base, (2) knowledge matching, (3) prediction, and (4) interpretation. By harnessing AI techniques, XAI4Diabetes forecasts diabetes risk and provides valuable insights into the prediction process and outcomes. A structured, survey-based user study assessed the app’s usability and influence on participants’ comprehension of machine learning predictions in real-world patient scenarios.
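To make the four-module pipeline concrete, the following minimal Python sketch mirrors its structure. All names (KNOWLEDGE_BASE, match_concepts, and so on), the toy feature set, and the random-forest stand-in are illustrative assumptions, not the actual XAI4Diabetes implementation.

    # Hypothetical sketch of the four-module flow; not the XAI4Diabetes code.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    FEATURES = ["glucose", "bmi", "age"]  # assumed toy feature set

    # (1) Knowledge base: feature -> ontology concept (toy stand-in for a
    # knowledge graph).
    KNOWLEDGE_BASE = {
        "glucose": "plasma glucose concentration",
        "bmi": "body mass index",
        "age": "patient age in years",
    }

    def match_concepts(features):
        """(2) Knowledge matching: link input features to known concepts."""
        return {f: KNOWLEDGE_BASE.get(f, "unknown concept") for f in features}

    def predict_risk(model, x):
        """(3) Prediction: diabetes risk probability for one patient."""
        return model.predict_proba(x.reshape(1, -1))[0, 1]

    def interpret(model, features):
        """(4) Interpretation: rank features by the model's importances."""
        ranked = sorted(zip(features, model.feature_importances_),
                        key=lambda p: -p[1])
        return [(f, round(float(w), 3)) for f, w in ranked]

    # Toy training data stands in for the real cohort.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    patient = np.array([1.2, 0.4, -0.3])
    print(match_concepts(FEATURES))
    print("risk:", round(predict_risk(model, patient), 3))
    print("importance:", interpret(model, FEATURES))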

Results

A prototype mobile app was meticulously developed and subjected to thorough usability studies and satisfaction surveys. The evaluation study findings underscore the substantial improvement in medical professionals’ comprehension of key aspects, including the (1) diabetes prediction process, (2) data sets used for model training, (3) data features used, and (4) relative significance of different features in prediction outcomes. Most participants reported heightened understanding of and trust in AI predictions following their use of XAI4Diabetes. The satisfaction survey results further revealed a high level of overall user satisfaction with the tool.

Conclusions

This study introduces XAI4Diabetes, a versatile multimodule explainable prediction platform tailored to diabetes care. By enabling transparent diabetes risk predictions and delivering interpretable insights, XAI4Diabetes empowers health care professionals to comprehend the AI-driven decision-making process, thereby fostering transparency and trust. These advancements hold the potential to mitigate biases and facilitate the broader integration of AI in diabetes care.

 
Award ID(s): 2218046, 1722913
PAR ID: 10524600
Publisher / Repository: JMIR Formative Research
Journal Name: JMIR Formative Research
Volume: 7
ISSN: 2561-326X
Page Range / eLocation ID: e50328
Sponsoring Org: National Science Foundation
More Like this
  1. Background

    Unlike linear models, which are traditionally used to study all-cause mortality, complex machine learning models can capture non-linear interrelations and provide opportunities to identify unexplored risk factors. Explainable artificial intelligence (XAI) can improve prediction accuracy over linear models and reveal valuable insights into outcomes such as mortality. This paper comprehensively analyzes all-cause mortality by explaining complex machine learning models.

    Methods

    We propose the IMPACT framework, which uses XAI techniques to explain a state-of-the-art tree-ensemble mortality prediction model. We apply IMPACT to understand all-cause mortality for 1-, 3-, 5-, and 10-year follow-up times within the NHANES dataset, which contains 47,261 samples and 151 features.
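    As an illustration of the general approach (not the IMPACT code itself), the sketch below explains a tree-ensemble model with SHAP values, a widely used XAI technique for tree models; the features and mortality labels are synthetic stand-ins for NHANES.

        # Hedged sketch: explain a tree ensemble with SHAP; synthetic data.
        import numpy as np
        import shap
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 5))                   # 5 toy features
        y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)   # toy mortality label

        model = GradientBoostingClassifier().fit(X, y)
        explainer = shap.TreeExplainer(model)   # fast, exact for tree models
        shap_values = explainer.shap_values(X)

        # Global importance: mean |SHAP| per feature highlights which risk
        # factors drive predictions, including non-linear effects that a
        # linear model would miss.
        print(np.abs(shap_values).mean(axis=0))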

    Results

    We show that IMPACT models achieve higher accuracy than linear models and neural networks. Using IMPACT, we identify several overlooked risk factors and interaction effects. Furthermore, we identify relationships between laboratory features and mortality that may suggest adjusting established reference intervals. Finally, we develop highly accurate, efficient, and interpretable mortality risk scores that can be used by medical professionals and individuals without medical expertise. We ensure generalizability by performing temporal validation of the mortality risk scores and external validation of important findings with the UK Biobank dataset.

    Conclusions

    IMPACT’s unique strength is the explainable prediction, which provides insights into the complex, non-linear relationships between mortality and features, while maintaining high accuracy. Our explainable risk scores could help individuals improve self-awareness of their health status and help clinicians identify patients with high risk. IMPACT takes a consequential step towards bringing contemporary developments in XAI to epidemiology.

     
  2. Background

    Advanced machine learning models have received wide attention for assisting medical decision making due to the greater accuracy they can achieve. However, their limited interpretability poses a barrier to adoption by practitioners. Recent advancements in interpretable machine learning tools allow us to look inside the black box of advanced prediction methods and extract interpretable models while maintaining similar prediction accuracy, but few studies have investigated the specific hospital readmission prediction problem in this spirit.

    Methods

    Our goal is to develop a machine learning (ML) algorithm that can predict 30- and 90-day hospital readmissions as accurately as black box algorithms while providing medically interpretable insights into readmission risk factors. Leveraging a state-of-the-art interpretable ML model, we use a two-step Extracted Regression Tree approach to achieve this goal. In the first step, we train a black box prediction algorithm. In the second step, we extract a regression tree from the output of the black box algorithm that allows direct interpretation of medically relevant risk factors. We use data from a large teaching hospital in Asia to train the ML model and validate our two-step approach, as sketched below.
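    As a minimal sketch on synthetic data (the hospital data are not public, and the variable names here are invented), step 1 trains a black box classifier and step 2 distills a shallow regression tree from its predicted readmission probabilities:

        # Two-step Extracted Regression Tree idea, sketched on toy data.
        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.tree import DecisionTreeRegressor, export_text

        rng = np.random.default_rng(1)
        X = rng.normal(size=(1000, 4))                 # toy patient features
        y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)   # toy 30-day readmission

        # Step 1: train a black box prediction algorithm.
        black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                  random_state=1).fit(X, y)
        p_hat = black_box.predict_proba(X)[:, 1]

        # Step 2: extract a shallow regression tree from the black box's
        # output; its splits read directly as risk-factor thresholds.
        tree = DecisionTreeRegressor(max_depth=3, random_state=1).fit(X, p_hat)
        print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))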

    Results

    The two-step method obtains prediction performance similar to that of the best black box model, such as neural networks, measured by three metrics: accuracy, the Area Under the Curve (AUC), and the Area Under the Precision-Recall Curve (AUPRC), while maintaining interpretability. Further, to examine whether the prediction results match known medical insights (i.e., whether the model is truly interpretable and produces reasonable results), we show that the key readmission risk factors extracted by the two-step approach are consistent with those found in the medical literature.

    Conclusions

    The proposed two-step approach yields meaningful prediction results that are both accurate and interpretable. This study suggests a viable means to improve the trust of machine learning based models in clinical practice for predicting readmissions through the two-step approach.

     
  3. Objective

    Prediction of mortality in intensive care unit (ICU) patients typically relies on black box models (which are unacceptable for use in hospitals) or hand-tuned interpretable models (which might lead to a loss in performance). We aim to bridge the gap between these 2 categories by building on modern interpretable machine learning (ML) techniques to design interpretable mortality risk scores that are as accurate as black boxes.

    Material and Methods

    We developed a new algorithm, GroupFasterRisk, which has several important benefits: it uses both hard and soft direct sparsity regularization, it incorporates group sparsity to allow more cohesive models, it allows monotonicity constraints to encode domain knowledge, and it produces many equally good models from which domain experts can choose. For evaluation, we leveraged the largest existing public ICU monitoring datasets (MIMIC III and eICU).
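    For illustration only, the sketch below shows how a sparse, integer, point-based risk score of the kind this family of methods learns would be applied at the bedside; the features, thresholds, points, and intercept are invented, not the published GroupFasterRisk model.

        # Hypothetical point-based risk score; all numbers are invented.
        import math

        SCORECARD = [  # (feature, threshold, points)
            ("age", 65, 2),          # age >= 65        -> +2 points
            ("heart_rate", 110, 1),  # HR >= 110        -> +1 point
            ("creatinine", 2.0, 3),  # creatinine >= 2  -> +3 points
        ]
        INTERCEPT = -4.0  # hypothetical calibration constant

        def mortality_risk(patient):
            """Sum the points, then map the score through a logistic link."""
            score = sum(pts for feat, thr, pts in SCORECARD
                        if patient[feat] >= thr)
            return 1.0 / (1.0 + math.exp(-(INTERCEPT + score)))

        print(mortality_risk({"age": 70, "heart_rate": 95, "creatinine": 2.3}))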

    Results

    Models produced by GroupFasterRisk outperformed the OASIS and SAPS II scores and performed similarly to APACHE IV/IVa while using at most a third as many parameters. For patients with sepsis/septicemia, acute myocardial infarction, heart failure, and acute kidney failure, GroupFasterRisk models outperformed OASIS and SOFA. Finally, various mortality prediction ML approaches performed better when using the variables selected by GroupFasterRisk than when using the OASIS variables.

    Discussion

    GroupFasterRisk's models performed better than risk scores currently used in hospitals, and on par with black box ML models, while being orders of magnitude sparser. Because GroupFasterRisk produces a variety of risk scores, it allows design flexibility, the key enabler of practical model creation.

    Conclusion

    GroupFasterRisk is a fast, accessible, and flexible procedure for learning a diverse set of sparse risk scores for mortality prediction.

     
  4. OBJECTIVE

    Current clinical guidelines for managing type 2 diabetes do not differentiate based on patient-specific factors. We present a data-driven algorithm for personalized diabetes management that improves health outcomes relative to the standard of care.

    RESEARCH DESIGN AND METHODS

    We modeled outcomes under 13 pharmacological therapies based on electronic medical records from 1999 to 2014 for 10,806 patients with type 2 diabetes from Boston Medical Center. For each patient visit, we analyzed the range of outcomes under alternative care using a k-nearest neighbor approach. The neighbors were chosen to maximize similarity on the individual patient characteristics and medical history that were most predictive of health outcomes. The recommendation algorithm prescribes the regimen with the best predicted outcome if the expected improvement from switching regimens exceeds a threshold; a simplified sketch follows. We evaluated the effect of the recommendations on matched patient outcomes from unseen data.
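    The sketch below illustrates this k-nearest-neighbor recommendation logic on synthetic data; the feature set, the number of regimens, and the switching threshold are assumptions for illustration, not the study's parameters.

        # k-NN regimen recommendation, sketched on synthetic data.
        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        rng = np.random.default_rng(2)
        n = 2000
        X = rng.normal(size=(n, 3))              # toy patient characteristics
        regimen = rng.integers(0, 3, size=n)     # 3 toy therapies
        hba1c_after = 8 - 0.3 * regimen * X[:, 0] + rng.normal(0, 0.5, n)

        def recommend(x, current, k=25, threshold=0.1):
            """Predict post-treatment HbA1c under each regimen from the k
            most similar patients; switch only if the gain beats threshold."""
            preds = {}
            for r in np.unique(regimen):
                idx = np.where(regimen == r)[0]
                nn = NearestNeighbors(n_neighbors=min(k, len(idx))).fit(X[idx])
                _, nbrs = nn.kneighbors(x.reshape(1, -1))
                preds[r] = hba1c_after[idx[nbrs[0]]].mean()
            best = min(preds, key=preds.get)
            return best if preds[current] - preds[best] > threshold else current

        print(recommend(np.array([1.0, 0.0, 0.0]), current=0))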

    RESULTS

    Among the 48,140 patient visits in the test set, the algorithm’s recommendation mirrored the observed standard of care in 68.2% of visits. For patient visits in which the algorithmic recommendation differed from the standard of care, the mean posttreatment glycated hemoglobin A1c (HbA1c) under the algorithm was lower than the standard of care by 0.44 ± 0.03% (4.8 ± 0.3 mmol/mol) (P < 0.001), from 8.37% under the standard of care to 7.93% under our algorithm (68.0 to 63.2 mmol/mol).

    CONCLUSIONS

    A personalized approach to diabetes management yielded substantial improvements in HbA1c outcomes relative to the standard of care. Our prototyped dashboard visualizing the recommendation algorithm can be used by providers to inform diabetes care and improve outcomes.

     
  5. Objective

    The use of electronic health records (EHRs) for clinical risk prediction is on the rise. However, in many practical settings, the limited availability of task-specific EHR data can restrict the application of standard machine learning pipelines. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks.

    Methods

    We propose two novel LM-based methods, namely “LLaMA2-EHR” and “Sent-e-Med.” Our focus is on utilizing the textual descriptions within structured EHRs to make risk predictions about future diagnoses. We conduct a comprehensive comparison with previous approaches across various data types and sizes.
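    As a rough sketch of the general idea (not the LLaMA2-EHR or Sent-e-Med code), one can embed the textual descriptions in structured EHRs with a sentence encoder and train a standard classifier on the embeddings; the encoder name and toy records below are assumptions.

        # Text-embedding baseline for EHR risk prediction; toy data only.
        from sentence_transformers import SentenceTransformer
        from sklearn.linear_model import LogisticRegression

        records = [  # textual descriptions from structured EHR codes
            "type 2 diabetes mellitus; essential hypertension",
            "acute upper respiratory infection",
            "chronic kidney disease stage 3; type 2 diabetes mellitus",
            "ankle sprain",
        ]
        labels = [1, 0, 1, 0]  # toy future-diagnosis outcome

        encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example encoder
        X = encoder.encode(records)            # one embedding per record
        clf = LogisticRegression().fit(X, labels)
        new = encoder.encode(["type 2 diabetes mellitus"])
        print(clf.predict_proba(new)[:, 1])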

    Results

    Experiments across 6 different methods and 3 separate risk prediction tasks reveal that employing LMs to represent structured EHRs, such as diagnostic histories, yields significant performance improvements when evaluated using standard metrics such as the area under the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve. Additionally, LMs offer benefits such as few-shot learning, the ability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. However, outcomes may be sensitive to the specific prompt used.

    Conclusion

    LMs encompass extensive embedded knowledge, making them valuable for the analysis of EHRs in the context of risk prediction. Nevertheless, it is important to exercise caution in their application, as safety concerns related to LMs persist and require ongoing consideration.

     