Abstract

Background: Effective diabetes management requires precise glycemic control to prevent both hypoglycemia and hyperglycemia, yet existing machine learning (ML) and reinforcement learning (RL) approaches often fail to balance competing objectives. Traditional RL-based glucose regulation systems primarily focus on single-objective optimization, overlooking factors such as minimizing insulin overuse, reducing glycemic variability, and ensuring patient safety. Furthermore, these approaches typically rely on centralized data processing, which raises privacy concerns given the sensitive nature of health care data. There is a critical need for a decentralized, privacy-preserving framework that can personalize blood glucose regulation while addressing the multiobjective nature of diabetes management.

Objective: This study aimed to develop and validate PRIMO-FRL (Privacy-Preserving Reinforcement Learning for Individualized Multi-Objective Glycemic Management Using Federated Reinforcement Learning), a novel framework that preserves patient privacy while optimizing clinical objectives: maximizing time in range (TIR), reducing hypoglycemia and hyperglycemia, and minimizing glycemic risk.

Methods: We developed PRIMO-FRL, which integrates multiobjective reward shaping to dynamically balance glucose stability, insulin efficiency, and risk reduction. The model was trained and tested on data from 30 simulated patients (10 children, 10 adolescents, and 10 adults) generated with the Food and Drug Administration (FDA)-approved UVA/Padova simulator. A comparative analysis was conducted against state-of-the-art RL and ML models, evaluating performance using metrics such as TIR, hypoglycemia (<70 mg/dL), hyperglycemia (>180 mg/dL), and glycemic risk scores.

Results: The PRIMO-FRL model achieved a robust overall TIR of 76.54%, with adults demonstrating the highest TIR at 81.48%, followed by children at 77.78% and adolescents at 70.37%. Importantly, the approach eliminated hypoglycemia, with 0.0% of time spent below 70 mg/dL across all cohorts, significantly outperforming existing methods. Mild hyperglycemia (180-250 mg/dL) was observed in adolescents (29.63%), children (22.22%), and adults (18.52%), with adults exhibiting the best control. Furthermore, the PRIMO-FRL approach consistently reduced glycemic risk scores, demonstrating improved safety and long-term stability in glucose regulation.

Conclusions: Our findings highlight the potential of PRIMO-FRL as a transformative, privacy-preserving approach to personalized glycemic management. By integrating federated RL, this framework eliminates hypoglycemia, improves TIR, and preserves data privacy by decentralizing model training. Unlike traditional centralized approaches that require sharing sensitive health data, PRIMO-FRL leverages federated learning to keep patient data local, significantly reducing privacy risks while enabling adaptive and personalized glucose control. This multiobjective optimization strategy offers a scalable, secure, and clinically viable solution for real-world diabetes care. The ability to train personalized models across diverse populations without exposing raw data makes PRIMO-FRL well suited for deployment in privacy-sensitive health care environments. These results pave the way for future clinical adoption, demonstrating the potential of privacy-preserving artificial intelligence in optimizing glycemic regulation while maintaining security, adaptability, and personalization.
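To make the two core ideas of this abstract concrete, the minimal sketch below pairs a toy multiobjective reward (trading off time in range, hypoglycemia, hyperglycemia, and insulin use) with a federated-averaging step that combines locally trained policy parameters without sharing raw glucose data. The thresholds, weights, and penalty terms are illustrative assumptions, not the published PRIMO-FRL reward or training loop.

```python
# Hypothetical sketch: a multiobjective reward plus a FedAvg-style aggregation
# step. All weights and thresholds are assumptions for illustration only.
import numpy as np

def multiobjective_reward(glucose_mg_dl: float, insulin_units: float,
                          w_range=1.0, w_hypo=2.0, w_hyper=0.5, w_insulin=0.1) -> float:
    """Combine competing clinical objectives into a single scalar reward."""
    in_range = 70.0 <= glucose_mg_dl <= 180.0
    reward = w_range * float(in_range)                    # encourage time in range
    if glucose_mg_dl < 70.0:                              # penalize hypoglycemia most heavily
        reward -= w_hypo * (70.0 - glucose_mg_dl) / 70.0
    elif glucose_mg_dl > 180.0:                           # milder hyperglycemia penalty
        reward -= w_hyper * (glucose_mg_dl - 180.0) / 180.0
    reward -= w_insulin * insulin_units                   # discourage insulin overuse
    return reward

def federated_average(client_weights: list[np.ndarray],
                      client_sizes: list[int]) -> np.ndarray:
    """FedAvg: weight each patient's locally trained parameters by local data size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Example: score a hypoglycemic reading and aggregate three simulated patients.
local = [np.random.randn(4) for _ in range(3)]
print(multiobjective_reward(65.0, insulin_units=0.5))
print(federated_average(local, client_sizes=[100, 120, 80]))
```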
An Explainable Artificial Intelligence Software Tool for Weight Management Experts (PRIMO): Mixed Methods Study
Background: Predicting the likelihood of success of weight loss interventions using machine learning (ML) models may enhance intervention effectiveness by enabling timely and dynamic modification of intervention components for nonresponders to treatment. However, a lack of understanding and trust in these ML models impacts adoption among weight management experts. Recent advances in the field of explainable artificial intelligence enable the interpretation of ML models, yet it is unknown whether they enhance model understanding, trust, and adoption among weight management experts.

Objective: This study aimed to build and evaluate an ML model that can predict 6-month weight loss success (ie, ≥7% weight loss) from 5 engagement and diet-related features collected over the initial 2 weeks of an intervention, to assess whether providing ML-based explanations increases weight management experts' agreement with ML model predictions, and to identify factors that influence weight management experts' understanding and trust of ML models in order to advance explainability in the early prediction of weight loss.

Methods: We trained an ML model using the random forest (RF) algorithm and data from a 6-month weight loss intervention (N=419). We leveraged findings from existing explainability metrics to develop Prime Implicant Maintenance of Outcome (PRIMO), an interactive tool for understanding predictions made by the RF model. We asked 14 weight management experts to predict hypothetical participants' weight loss success before and after using PRIMO. We compared PRIMO with 2 other explainability methods, one based on feature ranking and the other based on conditional probability. We used generalized linear mixed-effects models to evaluate participants' agreement with ML predictions and conducted likelihood ratio tests to examine the relationship between explainability methods and outcomes for nested models. We conducted guided interviews and thematic analysis to study the impact of our tool on experts' understanding of and trust in the model.

Results: Our RF model had 81% accuracy in the early prediction of weight loss success. Weight management experts were significantly more likely to agree with the model when using PRIMO (χ2=7.9; P=.02) than with the other 2 methods, with odds ratios of 2.52 (95% CI 0.91-7.69) and 3.95 (95% CI 1.50-11.76). From our study, we inferred that our software not only influenced experts' understanding and trust but also impacted decision-making. Several themes were identified through interviews: preference for multiple explanation types, the need to visualize uncertainty in explanations provided by PRIMO, and the need for model performance metrics on similar participant test instances.

Conclusions: Our results show the potential for weight management experts to agree with ML-based early prediction of success in weight loss treatment programs, enabling timely and dynamic modification of intervention components to enhance intervention effectiveness. Our findings provide methods for advancing the understandability and trust of ML models among weight management experts.
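As an illustration of the early-prediction setup described above, the sketch below trains a random forest on a handful of two-week engagement and diet features to classify ≥7% weight loss at 6 months. Only the sample size (N=419) and the RF choice come from the abstract; the feature semantics and the data are synthetic assumptions.

```python
# Minimal, hedged sketch of early prediction of weight loss success with a
# random forest. Features and labels below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 419                               # sample size reported in the abstract
X = rng.normal(size=(n, 5))           # e.g., app logins, meals logged, calories, ...
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```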
- Award ID(s): 1910317
- PAR ID: 10585558
- Publisher / Repository: J Med Internet Res
- Date Published:
- Journal Name: Journal of Medical Internet Research
- Volume: 25
- ISSN: 1438-8871
- Page Range / eLocation ID: e42047
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract

With increasing interest in explaining machine learning (ML) models, this paper synthesizes many topics related to ML explainability. We distinguish explainability from interpretability, local from global explainability, and feature importance from feature relevance. We demonstrate and visualize different explanation methods, show how to interpret them, and provide a complete Python package (scikit-explain) to allow future researchers and model developers to explore these explainability methods. The explainability methods include Shapley additive explanations (SHAP), Shapley additive global explanation (SAGE), and accumulated local effects (ALE). Our focus is primarily on Shapley-based techniques, which serve as a unifying framework for various existing methods to enhance model explainability. For example, SHAP unifies methods like local interpretable model-agnostic explanations (LIME) and tree interpreter for local explainability, while SAGE unifies the different variations of permutation importance for global explainability. We provide a short tutorial for explaining ML models using three disparate datasets: a convection-allowing model dataset for severe weather prediction, a nowcasting dataset for subfreezing road surface prediction, and satellite-based data for lightning prediction. In addition, we showcase the adverse effects that correlated features can have on the explainability of a model. Finally, we demonstrate the notion of evaluating model impacts of feature groups instead of individual features. Evaluating feature groups mitigates the impact of feature correlations and can provide a more holistic understanding of the model. All code, models, and data used in this study are freely available to accelerate the adoption of machine learning explainability in the atmospheric and other environmental sciences.
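As a minimal, hedged illustration of the local Shapley-based explanations discussed above, the snippet below uses the widely available shap package (not the paper's scikit-explain toolkit, whose API is not reproduced here) to attribute a tree ensemble's predictions to individual features on synthetic data.

```python
# Local SHAP attributions for a tree ensemble; data are synthetic and the
# model is a stand-in, not one of the paper's weather-prediction models.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = X[:, 0] * 2.0 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)
explainer = shap.TreeExplainer(model)        # Shapley values for tree ensembles
shap_values = explainer.shap_values(X[:10])  # per-sample, per-feature attributions
print(shap_values.shape)                     # (10 samples, 4 features)
```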
Abstract

Background: Idiopathic pulmonary fibrosis (IPF) is a progressive, irreversible, and usually fatal lung disease of unknown cause, generally affecting the elderly population. Early diagnosis of IPF is crucial for triaging patients' treatment planning into anti-fibrotic treatment or treatments for other causes of pulmonary fibrosis. However, the current IPF diagnosis workflow is complicated and time-consuming: it involves collaborative efforts from radiologists, pathologists, and clinicians and is largely subject to inter-observer variability.

Purpose: The purpose of this work is to develop a deep learning-based automated system that can diagnose subjects with IPF among subjects with interstitial lung disease (ILD) using an axial chest computed tomography (CT) scan. This work can potentially enable timely diagnosis decisions and reduce inter-observer variability.

Methods: Our dataset contains CT scans from 349 IPF patients and 529 non-IPF ILD patients. We used 80% of the dataset for training and validation purposes and 20% as the holdout test set. We proposed a two-stage model: at stage one, we built a multi-scale, domain knowledge-guided attention model (MSGA) that encouraged the model to focus on specific areas of interest to enhance model explainability, including both high- and medium-resolution attentions; at stage two, we collected the output from MSGA and constructed a random forest (RF) classifier for patient-level diagnosis, to further boost model accuracy. The RF classifier is utilized as the final decision stage because it is interpretable, computationally fast, and can handle correlated variables. Model utility was examined by (1) accuracy, represented by the area under the receiver operating characteristic curve (AUC) with standard deviation (SD), and (2) explainability, illustrated by visual examination of the estimated attention maps, which show the important areas for model diagnostics.

Results: During the training and validation stage, we observe that when we provide no guidance from domain knowledge, the IPF diagnosis model reaches acceptable performance (AUC±SD = 0.93±0.07) but lacks explainability; when including only guided high- or medium-resolution attention, the learned attention maps are not satisfactory; when including both high- and medium-resolution attention, under certain hyperparameter settings, the model reaches the highest AUC among all experiments (AUC±SD = 0.99±0.01) and the estimated attention maps concentrate on the regions of interest for this task. The three best-performing hyperparameter selections according to MSGA were applied to the holdout test set and reached model performance comparable to that of the validation set.

Conclusions: Our results suggest that, for a task with only scan-level labels available, MSGA+RF can utilize population-level domain knowledge to guide the training of the network, which increases both model accuracy and explainability.
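The second stage of the pipeline described above is straightforward to sketch: per-scan feature vectors (random stand-ins here for the MSGA outputs) are fed to a random forest for patient-level IPF vs. non-IPF classification and evaluated by AUC. The attention network itself is not reproduced, and all data below are synthetic.

```python
# Stage-two sketch: random forest on placeholder per-scan features, scored by AUC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
features = rng.normal(size=(878, 64))   # stand-in for per-scan MSGA outputs (349 + 529 scans)
labels = rng.integers(0, 2, size=878)   # 1 = IPF, 0 = non-IPF ILD (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2, random_state=2)
rf = RandomForestClassifier(n_estimators=300, random_state=2).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))
```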
Introduction: Advancements in machine learning (ML) algorithms that make predictions from data without being explicitly programmed, together with the increased computational speed of graphics processing units (GPUs) over the last decade, have led to remarkable progress in the capabilities of ML. In many fields, including agriculture, this progress has outpaced the availability of sufficiently diverse and high-quality datasets, which now serve as a limiting factor. While many agricultural use cases appear feasible with current compute resources and ML algorithms, the lack of reusable hardware and software components, referred to as cyberinfrastructure (CI), for collecting, transmitting, cleaning, labeling, and training datasets is a major hindrance to developing solutions for agricultural use cases. This study addresses these challenges by exploring the collection, processing, and training of ML models using a multimodal dataset and providing a vision for agriculture-focused CI to accelerate innovation in the field.

Methods: Data were collected during the 2023 growing season from three agricultural research locations across Ohio. The dataset includes 1 terabyte (TB) of multimodal data, comprising Unmanned Aerial System (UAS) imagery (RGB and multispectral) as well as soil and weather sensor data. The two primary crops studied were corn and soybean, the state's most widely cultivated crops. The data collected and processed in this study were used to train ML models to predict crop growth stage, soil moisture, and final yield.

Results: The exercise of processing this dataset resulted in four CI components that can be used to provide higher-accuracy predictions in the agricultural domain: (1) a UAS imagery pipeline that reduced processing time and improved image quality over standard methods, (2) a tabular data pipeline that aggregated data from multiple sources and temporal resolutions and aligned it to a common temporal resolution, (3) an approach to adapting a vision transformer (ViT) architecture to incorporate agricultural domain expertise, and (4) a data visualization prototype used to identify outliers and improve trust in the data.

Discussion: Further work will be aimed at maturing the CI components and implementing them on high-performance computing (HPC) resources. Open questions remain as to how CI components like these can best be leveraged to serve the needs of the agricultural community and accelerate the development of ML applications in agriculture.
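A hedged sketch of the second CI component (the tabular data pipeline) is shown below: two synthetic sensor streams recorded at different temporal resolutions are resampled and aligned to a common daily resolution with pandas. Column names, frequencies, and values are illustrative assumptions, not the project's actual schema.

```python
# Align sensor streams with different sampling rates to a shared daily index.
import numpy as np
import pandas as pd

# Soil moisture logged hourly; weather logged every 15 minutes (synthetic data).
soil = pd.DataFrame(
    {"soil_moisture": np.random.rand(24 * 7)},
    index=pd.date_range("2023-06-01", periods=24 * 7, freq="h"),
)
weather = pd.DataFrame(
    {"air_temp_c": 20 + 5 * np.random.rand(96 * 7)},
    index=pd.date_range("2023-06-01", periods=96 * 7, freq="15min"),
)

# Resample both streams to daily means and join on the shared daily index.
daily = soil.resample("D").mean().join(weather.resample("D").mean())
print(daily.head())
```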
Introduction: AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision.

Methods: This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was utilized with different amounts of generated data for data augmentation. Additionally, sample reweighting from AIF360 was employed to further enhance AI fairness. Five traditional machine learning models were used to validate the proposed approach: Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF).

Results and Discussion: Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
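To illustrate the reweighting half of the approach, the snippet below hand-rolls Kamiran-Calders style reweighing (the idea behind AIF360's preprocessing step) on a synthetic binary-label table. Tab-DDPM itself is not reproduced, and the column names are assumptions for illustration.

```python
# Hand-rolled reweighing: w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y) up-weights
# under-represented (attribute, outcome) combinations. Data are synthetic.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "protected": rng.integers(0, 2, size=1000),  # e.g., a sensitive attribute
    "label": rng.integers(0, 2, size=1000),      # binary outcome
})

p_a = df["protected"].value_counts(normalize=True)
p_y = df["label"].value_counts(normalize=True)
p_ay = df.groupby(["protected", "label"]).size() / len(df)

df["weight"] = [
    p_a[a] * p_y[y] / p_ay[(a, y)] for a, y in zip(df["protected"], df["label"])
]
print(df.groupby(["protected", "label"])["weight"].first())
```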