Title: Time-Aware Subgroup Matrix Decomposition: Imputing Missing Data Using Forecasting Events
Deep neural network models, especially Long Short-Term Memory (LSTM), have shown great success in analyzing Electronic Health Records (EHRs) due to their ability to capture temporal dependencies in time series data. When applying deep learning models to EHRs, we are generally confronted with two major challenges: a high rate of missingness and time irregularity. Motivated by the original PACIFIER framework, which utilized matrix decomposition for data imputation, we applied it and further extended it with three components: forecasting future events, a time-aware mechanism, and a subgroup basis approach. We evaluated the proposed framework on real-world EHRs consisting of 52,919 visits and 4,224,567 events, on the task of early prediction of septic shock. We compared our work against multiple baselines, including the original PACIFIER, using both LSTM and Time-aware LSTM (T-LSTM). Experimental results showed that our proposed framework significantly outperformed all competitive baseline approaches. More importantly, the interpretative latent patterns extracted from subgroups could shed some light for clinicians on how septic shock progresses in patients.
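The abstract does not spell out the decomposition itself. As a rough illustration of the general idea behind matrix-decomposition imputation (the starting point the abstract attributes to PACIFIER), the sketch below fits a low-rank factorization X ≈ U Vᵀ to the observed entries of a visit-by-feature matrix only, then reads the missing entries off the reconstruction. It is a generic masked factorization trained by stochastic gradient descent, not the authors' method: it omits the forecasting, time-aware, and subgroup components, and the function names, rank, learning rate, and regularization strength are all hypothetical choices.

```python
import random


def masked_factorize(X, rank=2, lr=0.01, reg=0.01, epochs=2000, seed=0):
    """Fit X ~= U @ V^T using only the observed (non-None) entries of X.

    X is a list of rows (e.g. visits) of feature values; None marks a missing value.
    Hypothetical helper for illustration; returns the factors U (n x rank) and V (m x rank).
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(m)]
    # Only observed cells contribute to the loss; missing cells are never fitted.
    observed = [(i, j) for i in range(n) for j in range(m) if X[i][j] is not None]
    for _ in range(epochs):
        for i, j in observed:
            pred = sum(U[i][k] * V[j][k] for k in range(rank))
            err = X[i][j] - pred
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                # Gradient step on squared error with a small ridge penalty.
                U[i][k] += lr * (err * v - reg * u)
                V[j][k] += lr * (err * u - reg * v)
    return U, V


def impute(X, **kwargs):
    """Keep observed values and fill each missing cell with its low-rank reconstruction."""
    U, V = masked_factorize(X, **kwargs)
    rank = len(U[0])
    return [
        [x if x is not None else sum(U[i][k] * V[j][k] for k in range(rank))
         for j, x in enumerate(row)]
        for i, row in enumerate(X)
    ]


# Toy matrix whose rows are multiples of (1, 2, 3); one value is missing.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, None]]
filled = impute(X, rank=1)
```

With rank 1, the missing entry in the toy matrix should come out close to 9, slightly below it because the ridge penalty shrinks the factors. Real EHR matrices are far sparser and noisier, which is why the framework described above adds structure (forecasting, time awareness, subgroups) beyond a plain factorization.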
Award ID(s):
1651909
PAR ID:
10136459
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE International Conference on Big Data
ISSN:
2639-1589
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assist clinical decision making. Long Short-Term Memory (LSTM) is an effective model for handling sequential data such as EHRs, but it encounters two major limitations when applied to EHRs: it is unable to interpret its prediction results, and it ignores the irregular time intervals between consecutive events. To tackle these limitations, we propose an attention-based time-aware LSTM network (ATTAIN), which improves the interpretability of LSTM and identifies the critical previous events for the current diagnosis by modeling the inherent time irregularity. We validate ATTAIN on modeling the progression of an extremely challenging disease, septic shock, using real-world EHRs. Our results demonstrate that the proposed framework outperforms state-of-the-art models such as RETAIN and T-LSTM. In addition, the generated interpretative time-aware attention weights shed some light on the progression behaviors of septic shock.
  2. Accurate predictions of water temperature are the foundation for many decisions and regulations, with direct impacts on water quality, fishery yields, and power production. Building accurate broad-scale models for lake temperature prediction remains challenging in practice because the data distribution varies across lake systems, which are monitored by both static and time-series data. To tackle these challenges, we propose a novel machine-learning approach for integrating static and time-series data in deep recurrent models, which we call Invertibility-Aware Long Short-Term Memory (IA-LSTM), and demonstrate its effectiveness in predicting lake temperature. Our method integrates components of an Invertible Network and an LSTM to better predict temperature profiles (forward modeling) and to infer the static features (inverse modeling), which can in turn improve predictions when static variables are missing. We evaluate our method on predicting the temperature profiles of 450 lakes in the Midwestern U.S. and report a relative improvement of 4% from capturing data heterogeneity, while outperforming baseline predictions by 12% when static features are unavailable.
  3. Septic shock is a life-threatening condition in which timely treatment substantially reduces mortality. Reliable identification of patients with sepsis who are at elevated risk of developing septic shock therefore has the potential to save lives by opening an early window of intervention. We hypothesize the existence of a novel clinical state of sepsis, referred to as the “pre-shock” state, and that patients with sepsis who enter this state are highly likely to develop septic shock at some future time. We apply three different machine learning techniques to the electronic health record data of 15,930 patients in the MIMIC-III database to test this hypothesis. This novel paradigm yields improved performance in identifying patients with sepsis who will progress to septic shock, as defined by Sepsis-3 criteria, with the best method achieving an area under the receiver operating characteristic curve of 0.93, 88% sensitivity, 84% specificity, and a median early warning time of 7 hours. Additionally, we introduce the notion of patient-specific positive predictive value, which assigns confidence to individual predictions and reaches values as high as 91%. This study demonstrates that early prediction of impending septic shock, and thus early intervention, is possible many hours in advance.
  4. Matching patients with suitable clinical trials is essential for advancing medical research and providing optimal care. However, current approaches face challenges such as data standardization, ethical considerations, and a lack of interoperability between Electronic Health Records (EHRs) and clinical trial criteria. In this paper, we explore the potential of large language models (LLMs) to address these challenges by leveraging their advanced natural language generation capabilities to improve compatibility between EHRs and clinical trial descriptions. We propose a privacy-aware data augmentation approach for LLM-based patient-trial matching (LLM-PTM), which balances the benefits of LLMs with the need to keep sensitive patient data secure and confidential. Our experiments show a 7.32% average improvement in performance with the proposed LLM-PTM method and a 12.12% improvement in generalizability to new data. Additionally, we present case studies that further illustrate the effectiveness of our approach and provide a deeper understanding of its underlying principles.
  5. Electronic health records (EHRs) are comprehensive longitudinal collections of patient data that play a critical role in modeling disease progression to facilitate clinical decision-making. Based on EHRs, in this work we focus on sepsis, a broad syndrome that can develop from nearly all types of infections (e.g., influenza, pneumonia). The symptoms of sepsis, such as elevated heart rate, fever, and shortness of breath, are vague and common to other illnesses, which makes modeling its progression extremely challenging. Motivated by the recent success of a novel subsequence clustering approach, Toeplitz Inverse Covariance-based Clustering (TICC), we model sepsis progression as a subsequence partitioning problem and propose Multi-series Time-aware TICC (MT-TICC), which incorporates the multi-series nature and irregular time intervals of EHRs. The effectiveness of MT-TICC is first validated via a case study on a real-world hand gesture dataset with ground-truth labels. We then apply it to sepsis progression modeling using EHRs. The results suggest that MT-TICC can significantly outperform competitive baseline models, including TICC. More importantly, it unveils interpretable patterns that shed some light on the progression of sepsis.
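Items 1 and 5, like the T-LSTM baseline named in the main abstract, are built around irregular time gaps between consecutive EHR events. To make that concrete, the sketch below shows one way a recurrent model can account for elapsed time, following our understanding of the T-LSTM formulation: the previous cell state is split into a short-term part, which is discounted by a decay g(Δt) = 1/log(e + Δt), and a long-term part, which is kept as is. This is an illustration only; ATTAIN and MT-TICC may handle time irregularity differently, and the function names and toy weights are hypothetical.

```python
import math


def elapsed_time_decay(delta_t: float) -> float:
    """Weight in (0, 1] that shrinks as the gap between events grows.

    g(delta_t) = 1 / log(e + delta_t), so g(0) = 1.
    """
    if delta_t < 0:
        raise ValueError("elapsed time must be non-negative")
    return 1.0 / math.log(math.e + delta_t)


def adjust_previous_memory(c_prev, W_d, b_d, delta_t):
    """Discount only the short-term part of the previous cell state by elapsed time.

    c_prev: previous LSTM cell state, a list of H floats.
    W_d: H x H weight matrix (list of rows); b_d: bias vector of length H.
    In a trained model W_d and b_d are learned; here they are plain inputs.
    """
    # Short-term memory: a nonlinear projection of the previous cell state.
    c_short = [
        math.tanh(sum(w * c for w, c in zip(row, c_prev)) + b)
        for row, b in zip(W_d, b_d)
    ]
    # Long-term memory: whatever remains after removing the short-term part.
    c_long = [c - s for c, s in zip(c_prev, c_short)]
    g = elapsed_time_decay(delta_t)
    # Recombine: long-term part kept intact, short-term part scaled by the decay.
    return [lt + g * st for lt, st in zip(c_long, c_short)]


c = [0.5, -0.2]
W = [[0.3, 0.1], [-0.4, 0.2]]
b = [0.0, 0.1]
adjusted = adjust_previous_memory(c, W, b, delta_t=24.0)
```

The adjusted memory would then stand in for the previous cell state in an otherwise standard LSTM step. When Δt = 0 the decay equals 1 and the state passes through unchanged; as Δt grows, only the short-term component fades, so long-range information about the patient is preserved.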
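Item 3 reports its results as sensitivity, specificity, area under the ROC curve (AUROC), and positive predictive value (PPV). For readers less familiar with these metrics, the sketch below computes them from hypothetical confusion-matrix counts and scores, not data from the study. It uses the standard cohort-level PPV rather than the patient-specific variant the item introduces, and computes AUROC as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half. The function names are hypothetical.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification rates from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives that are flagged
        "specificity": tn / (tn + fp),  # share of actual negatives that are cleared
        "ppv": tp / (tp + fp),          # share of positive alarms that are correct
    }


def auroc(pos_scores, neg_scores) -> float:
    """AUROC by pairwise comparison (Mann-Whitney U divided by n_pos * n_neg)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts: 50 patients who progress to shock, 100 who do not.
metrics = screening_metrics(tp=45, fp=10, tn=90, fn=5)
```

With these made-up counts, sensitivity and specificity are both 0.9, while PPV is 45/55, roughly 0.82. PPV depends on how common the positive class is in the cohort, which is why it is reported alongside sensitivity and specificity rather than derived from them.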