Title: Recent Advances in Predictive Modeling with Electronic Health Records
The development of electronic health record (EHR) systems has enabled the collection of a vast amount of digitized patient data. However, utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics. With the advancements in machine learning techniques, deep learning has demonstrated its superiority in various applications, including healthcare. This survey systematically reviews recent advances in deep learning-based predictive models using EHR data. Specifically, we introduce the background of EHR data and provide a mathematical definition of the predictive modeling task. We then categorize and summarize predictive deep models from multiple perspectives. Furthermore, we present benchmarks and toolkits relevant to predictive modeling in healthcare. Finally, we conclude this survey by discussing open challenges and suggesting promising directions for future research.
Award ID(s):
2238275
PAR ID:
10598743
Publisher / Repository:
International Joint Conferences on Artificial Intelligence Organization
ISBN:
978-1-956792-04-1
Page Range / eLocation ID:
8272 to 8280
Format(s):
Medium: X
Location:
Jeju, South Korea
Sponsoring Org:
National Science Foundation
More Like this
  1. Frasch, Martin G. (Ed.)
    With the wider availability of healthcare data such as Electronic Health Records (EHR), a growing number of data-driven approaches have been proposed to improve the quality of care delivery. Predictive modeling, which aims at building computational models for predicting clinical risk, is a popular research topic in healthcare analytics. However, concerns about the privacy of healthcare data may hinder the development of effective, generalizable predictive models, because such models often require rich, diverse data from multiple clinical institutions. Recently, federated learning (FL) has demonstrated promise in addressing this concern. However, data heterogeneity across local participating sites may affect the prediction performance of federated models. Because acute kidney injury (AKI) and sepsis are highly prevalent among patients admitted to intensive care units (ICU), the early AI-based prediction of these conditions is an important topic in critical care medicine. In this study, we take AKI and sepsis onset risk prediction in the ICU as two examples to explore the impact of data heterogeneity in the FL framework and to compare performance across frameworks. We built predictive models based on local, pooled, and FL frameworks using EHR data across multiple hospitals. The local framework used only data from each site itself. The pooled framework combined data from all sites. In the FL framework, each local site did not have access to other sites’ data. Each site updated a model locally and shared its parameters with a central aggregator, which updated the federated model’s parameters and then shared them with each site. We found that models built within an FL framework outperformed their local counterparts. We then analyzed variable importance discrepancies across sites and frameworks. Finally, we explored potential sources of heterogeneity within the EHR data. The different distributions of demographic profiles, medication use, and site information contributed to data heterogeneity.
  2. In the field of healthcare, electronic health records (EHR) serve as crucial training data for developing machine learning models for diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced in terms of sensitive attributes such as race/ethnicity, gender, and age. Machine learning models trained on class-imbalanced EHR datasets perform significantly worse in deployment for individuals of the minority classes compared to those from majority classes, which may lead to inequitable healthcare outcomes for minority groups. To address this challenge, we propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), a novel approach to augment imbalanced datasets using samples generated by a deep generative model. The MCRAGE process involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes. We use this synthetic data to augment the existing imbalanced dataset, resulting in a more balanced distribution across all classes, which can be used to train less biased downstream models. We measure the performance of MCRAGE versus alternative approaches using Accuracy, F1 score and AUROC of these downstream models. We provide theoretical justification for our method in terms of recent convergence results for DDPMs. 
  3. Predictive modeling of clinical time series data is challenging due to various factors. One such difficulty is the existence of missing values, which leads to irregular data. Another challenge is capturing correlations across multiple dimensions in order to achieve accurate predictions. Additionally, it is essential to take into account the temporal structure, which includes both short-term and long-term recurrent patterns, to gain a comprehensive understanding of disease progression and to make accurate predictions for personalized healthcare. In critical situations, models that can make multi-step-ahead predictions are essential for early detection. This review emphasizes the need for forecasting models that can effectively address the aforementioned challenges. The selection of models must also take into account the data-related constraints during the modeling process. Time series models can be divided into statistical, machine learning, and deep learning models. This review concentrates on the main models within these categories, discussing their capability to tackle the mentioned challenges. Furthermore, this paper provides a brief overview of a technique aimed at mitigating the limitations of a specific model to enhance its suitability for clinical prediction. It also explores ensemble forecasting methods designed to merge the strengths of various models while reducing their respective weaknesses, and finally discusses hierarchical models. Apart from the technical details provided in this document, certain aspects of predictive modeling research have emerged as possible obstacles to implementing models using biomedical data. These obstacles are discussed, leading to the future prospects of model building with artificial intelligence in the healthcare domain.
  4. Electronic health records (EHRs) have been heavily used in modern healthcare systems for recording patients' admission information to health facilities. Many data-driven approaches employ temporal features in EHR for predicting specific diseases, readmission times, and diagnoses of patients. However, most existing predictive models cannot fully utilize EHR data, due to an inherent lack of labels in supervised training for some temporal events. Moreover, it is hard for the existing methods to simultaneously provide generic and personalized interpretability. To address these challenges, we propose Sherbet, a self-supervised graph learning framework with hyperbolic embeddings for temporal health event prediction. We first propose a hyperbolic embedding method with information flow to pretrain medical code representations in a hierarchical structure. We incorporate these pretrained representations into a graph neural network (GNN) to detect disease complications and design a multilevel attention method to compute the contributions of particular diseases and admissions, thus enhancing personalized interpretability. We present a new hierarchy-enhanced historical prediction proxy task in our self-supervised learning framework to fully utilize EHR data and exploit medical domain knowledge. We conduct a comprehensive set of experiments on widely used publicly available EHR datasets to verify the effectiveness of our model. Our results demonstrate the proposed model's strengths in both predictive tasks and interpretable abilities. 
  5. Electronic health record (EHR) systems have been widely adopted across healthcare organizations. While there are many benefits of using EHR, such as improved accessibility and secure sharing of patient data, a shortcoming is that manual data input is time-consuming and error-prone. Physicians spend as much as 49.2% of their office time on EHR. In this paper, we present the design, development, and evaluation of a voice-based assistant, DocPal, that helps healthcare practitioners access and update EHR by voice. A user survey and experimental evaluation illustrate that DocPal has good usability, time efficiency, and accuracy. When applied in the healthcare industry, we expect it to reduce data entry time and support better patient care.
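The federated learning loop described in item 1 of the list above (local update at each site, parameters sent to a central aggregator, aggregated parameters sent back) can be illustrated with a minimal sketch. This is not the method from that study: the one-parameter least-squares model, the sample-size-weighted averaging rule (in the style of federated averaging), the toy data, and all function names here are assumptions chosen only to make the loop concrete.

```python
from typing import List, Tuple

# Hypothetical toy setup: each site privately holds (feature, label) pairs.
Site = List[Tuple[float, float]]


def local_update(w: float, data: Site, lr: float = 0.1, steps: int = 5) -> float:
    """Run a few gradient steps of least-squares y ~ w * x on one site's own data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def aggregate(weights: List[float], sizes: List[int]) -> float:
    """Central aggregator: sample-size-weighted average of the site parameters.

    The weighting rule is an assumption; the source does not specify one.
    """
    total = sum(sizes)
    return sum(w * n for w, n in zip(weights, sizes)) / total


def federated_train(sites: List[Site], rounds: int = 50) -> float:
    """Alternate local updates and aggregation; raw data never leaves a site."""
    w_global = 0.0
    for _ in range(rounds):
        # Each site starts from the current global parameter and trains locally.
        local_ws = [local_update(w_global, site) for site in sites]
        # Only parameters (not data) are sent to the aggregator.
        w_global = aggregate(local_ws, [len(s) for s in sites])
    return w_global


# Two hypothetical sites whose data both follow y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
]
w = federated_train(sites)
print(round(w, 4))
```

In this sketch only the scalar parameter crosses site boundaries, which mirrors the privacy motivation in the abstract. Real EHR sites would differ in their data distributions, which is the heterogeneity the study examines and which this toy data does not model.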