

Title: Multi-series Time-aware Sequence Partitioning for Disease Progression Modeling.
Electronic healthcare records (EHRs) are comprehensive longitudinal collections of patient data that play a critical role in modeling disease progression to facilitate clinical decision-making. Based on EHRs, in this work we focus on sepsis, a broad syndrome that can develop from nearly all types of infections (e.g., influenza, pneumonia). The symptoms of sepsis, such as elevated heart rate, fever, and shortness of breath, are vague and common to other illnesses, making the modeling of its progression extremely challenging. Motivated by the recent success of a novel subsequence clustering approach, Toeplitz Inverse Covariance-based Clustering (TICC), we model sepsis progression as a subsequence partitioning problem and propose Multi-series Time-aware TICC (MT-TICC), which incorporates the multi-series nature and irregular time intervals of EHRs. The effectiveness of MT-TICC is first validated via a case study on a real-world hand gesture dataset with ground-truth labels. We then apply it to sepsis progression modeling using EHRs. The results suggest that MT-TICC can significantly outperform competitive baseline models, including TICC. More importantly, it unveils interpretable patterns, which shed some light on the progression of sepsis.
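To make "subsequence partitioning" concrete, here is a minimal sketch of the cluster-assignment step of the original TICC formulation, not of MT-TICC itself: the abstract does not specify how MT-TICC handles multiple series or irregular time intervals, so neither is modeled here. Each window of consecutive observations is scored against every cluster by a Gaussian negative log-likelihood, and a dynamic program picks the assignment that minimizes total cost plus a penalty `beta` for every switch between clusters, which encourages temporally consistent segments. The function names are hypothetical, and the cluster means `mu` and precision matrices `theta` are assumed to be already estimated (TICC estimates them with a Toeplitz-constrained graphical lasso, which is omitted).

```python
import numpy as np

def stack_windows(X, w):
    """Stack w consecutive observations into one row (TICC-style windowing).

    X has shape (T, n); the result has shape (T - w + 1, w * n).
    """
    T, _ = X.shape
    return np.stack([X[t:t + w].reshape(-1) for t in range(T - w + 1)])

def neg_log_likelihood(Z, mu, theta):
    """Gaussian negative log-likelihood of each row of Z under mean mu and
    precision (inverse covariance) matrix theta, dropping the constant term."""
    diff = Z - mu
    _, logdet = np.linalg.slogdet(theta)
    quad = np.einsum("ti,ij,tj->t", diff, theta, diff)
    return 0.5 * quad - 0.5 * logdet

def assign_clusters(cost, beta):
    """Dynamic program minimizing sum_t cost[t, P(t)] + beta * (number of switches).

    cost has shape (T, K): cost[t, k] is the penalty of putting window t in cluster k.
    """
    T, K = cost.shape
    switch = beta * (1.0 - np.eye(K))       # switch[j, k] = beta if j != k else 0
    best = cost[0].astype(float)            # best total cost of a path ending in each cluster
    back = np.zeros((T, K), dtype=int)      # best predecessor cluster for each (t, k)
    for t in range(1, T):
        trans = best[:, None] + switch      # trans[j, k]: arrive at cluster k from cluster j
        back[t] = np.argmin(trans, axis=0)
        best = trans[back[t], np.arange(K)] + cost[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmin(best))
    for t in range(T - 1, 0, -1):           # trace the optimal path backwards
        path[t - 1] = back[t, path[t]]
    return path

if __name__ == "__main__":
    # Toy costs for 5 windows and 2 clusters; window 2 weakly prefers cluster 1.
    cost = np.array([[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]])
    print(assign_clusters(cost, beta=0.0))  # [0 0 1 0 0]
    print(assign_clusters(cost, beta=5.0))  # [0 0 0 0 0]
```

In a full pipeline, column `k` of `cost` would be `neg_log_likelihood(stack_windows(X, w), mu_k, theta_k)`. The toy run shows the role of `beta`: with no switching penalty the isolated window forms its own segment, while a large penalty absorbs it into the surrounding cluster.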
Award ID(s):
1916417
NSF-PAR ID:
10324504
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI)
Page Range / eLocation ID:
3581-3587
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modeling patient disease progression using Electronic Health Records (EHRs) is critical to assist clinical decision making. Long Short-Term Memory (LSTM) is an effective model for sequential data such as EHRs, but it has two major limitations when applied to EHRs: it cannot interpret its prediction results, and it ignores the irregular time intervals between consecutive events. To tackle these limitations, we propose an attention-based time-aware LSTM network (ATTAIN) that improves the interpretability of LSTM and identifies the critical previous events for the current diagnosis by modeling the inherent time irregularity. We validate ATTAIN on modeling the progression of an extremely challenging disease, septic shock, using real-world EHRs. Our results demonstrate that the proposed framework outperforms state-of-the-art models such as RETAIN and T-LSTM. In addition, the generated interpretable time-aware attention weights shed some light on the progression behaviors of septic shock.
  2. Abstract

    Timely and accurate referral of end-stage heart failure patients for advanced therapies, including heart transplants and mechanical circulatory support, plays an important role in improving patient outcomes and saving costs. However, the decision-making process is complex, nuanced, and time-consuming, requiring cardiologists with specialized expertise and training in heart failure and transplantation.

    In this study, we propose two logistic tensor regression (LTR)-based models, at the population and individual levels, to predict which heart failure patients warrant evaluation for advanced heart failure therapies using irregularly spaced sequential electronic health records. The clinical features were collected at the previous visit, and the predictions were made at the very beginning of the subsequent visit. Patient-wise ten-fold cross-validation experiments were performed. Standard LTR achieved an average F1 score of 0.708, an AUC of 0.903, and an AUPRC of 0.836. Personalized LTR obtained an F1 score of 0.670, an AUC of 0.869, and an AUPRC of 0.839. The two models not only outperformed all other machine learning models to which they were compared but also improved the performance and robustness of those models via weight transfer. The AUPRC scores of support vector machine, random forest, and Naive Bayes improved by 8.87%, 7.24%, and 11.38%, respectively.

    The two models can evaluate the importance of clinical features associated with advanced therapy referral. The five most important medical codes (chronic kidney disease, hypotension, pulmonary heart disease, mitral regurgitation, and atherosclerotic heart disease) were reviewed and validated against the literature and by heart failure cardiologists. Our proposed models effectively use EHRs to assess the potential need for advanced therapies in heart failure patients while explaining the importance of comorbidities and other clinical events. The information learned during model training could offer further insight into risk factors contributing to the progression of heart failure at both the population and individual levels.

  3. Abstract Overly restrictive eligibility criteria for clinical trials may limit the generalizability of the trial results to their target real-world patient populations. We developed a novel machine learning approach using large collections of real-world data (RWD) to better inform clinical trial eligibility criteria design. We extracted patients' clinical events from electronic health records (EHRs), including demographics, diagnoses, and drugs, and assumed that certain compositions of these clinical events within an individual's EHRs can determine subphenotypes: homogeneous clusters of patients, where patients within each subgroup share similar clinical characteristics. We introduced an outcome-guided probabilistic model to identify these subphenotypes, such that the patients within the same subgroup not only share similar clinical characteristics but also are at similar risk levels of encountering severe adverse events (SAEs). We evaluated our algorithm on two previously conducted clinical trials with EHRs from the OneFlorida+ Clinical Research Consortium. Our model can clearly identify, as subphenotypes, the patient subgroups that are more or less likely to suffer from SAEs, in a transparent and interpretable way. Our approach identified a set of clinical topics and derived novel patient representations based on them. Each clinical topic represents a certain clinical event composition pattern learned from the patient EHRs. On both trials, the patient subgroups with #SAE=0 and #SAE>0 can be well separated by k-means clustering on the inferred topics. The inferred topics characterized as likely to align with the #SAE>0 subgroup revealed meaningful combinations of clinical features and can provide data-driven recommendations for refining the exclusion criteria of clinical trials. The proposed supervised topic modeling approach can infer the clinical topics from the subphenotypes with or without SAEs.
The potential rules for describing the patient subgroups with SAEs can be further derived to inform the design of clinical trial eligibility criteria. 
  4. Deep neural network models, especially Long Short-Term Memory (LSTM), have shown great success in analyzing Electronic Health Records (EHRs) due to their ability to capture temporal dependencies in time series data. When applying deep learning models to EHRs, we generally confront two major challenges: a high rate of missingness and time irregularity. Motivated by the original PACIFIER framework, which used matrix decomposition for data imputation, we applied and further extended it with three components: forecasting future events, a time-aware mechanism, and a subgroup basis approach. We evaluated the proposed framework with real-world EHRs consisting of 52,919 visits and 4,224,567 events on the task of early prediction of septic shock. We compared our work against multiple baselines, including the original PACIFIER, using both LSTM and Time-aware LSTM (T-LSTM). Experimental results showed that our proposed framework significantly outperformed all competitive baseline approaches. More importantly, the interpretable latent patterns extracted from subgroups could shed some light for clinicians on the progression of septic shock patients.
  5. Abstract

    Objective: Severe infection can lead to organ dysfunction and sepsis. Identifying subphenotypes of infected patients is essential for personalized management. It is unknown how different time series clustering algorithms compare in identifying these subphenotypes.

    Materials and Methods: Patients with suspected infection admitted between 2014 and 2019 to 4 hospitals in Emory Healthcare were included and split into separate training and validation cohorts. Dynamic time warping (DTW) was applied to vital signs from the first 8 h of hospitalization, and hierarchical clustering (DTW-HC) and partition around medoids (DTW-PAM) were used to cluster patients into subphenotypes. DTW-HC, DTW-PAM, and a previously published group-based trajectory model (GBTM) were evaluated for agreement in subphenotype clusters, trajectory patterns, and subphenotype associations with clinical outcomes and treatment responses.

    Results: There were 12 473 patients in the training cohort and 8256 patients in the validation cohort. DTW-HC, DTW-PAM, and GBTM models resulted in 4 consistent vital-sign trajectory patterns with significant agreement in clustering (71–80% agreement, P < .001). Group A was hyperthermic, tachycardic, tachypneic, and hypotensive. Group B was hyperthermic, tachycardic, tachypneic, and hypertensive. Groups C and D had lower temperatures, heart rates, and respiratory rates, with group C normotensive and group D hypotensive. Group A had higher odds of 30-day inpatient mortality (P < .01), and group D had a significant mortality benefit from balanced crystalloids compared to saline (P < .01) in all 3 models.

    Discussion: DTW- and GBTM-based clustering algorithms applied to vital signs in infected patients identified consistent subphenotypes with distinct clinical outcomes and treatment responses.

    Conclusion: Time series clustering with distinct computational approaches demonstrates similar performance and significant agreement in the resulting subphenotypes.
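The fifth related abstract clusters patients by applying dynamic time warping (DTW) to early vital-sign trajectories before hierarchical clustering or partitioning around medoids. A minimal sketch of the classic DTW distance and the pairwise distance matrix such clustering would consume is shown below. The study's exact DTW settings (local cost, warping-window constraint, normalization of each vital sign) are not given in the abstract, so Euclidean local cost with no window is an assumption, and the function names and toy data are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two multivariate sequences.

    a and b are lists of equal-length tuples (one tuple of vital signs per time
    step); the local cost is Euclidean distance and no warping window is used.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between every pair of patients."""
    k = len(series)
    M = [[0.0] * k for _ in range(k)]
    for p in range(k):
        for q in range(p + 1, k):
            M[p][q] = M[q][p] = dtw_distance(series[p], series[q])
    return M

if __name__ == "__main__":
    # Hypothetical toy data: (temperature, heart rate) readings per time step.
    slow = [(37.0, 80.0), (37.5, 90.0), (38.5, 110.0)]
    fast = [(37.0, 80.0), (38.5, 110.0), (38.5, 110.0)]
    print(pairwise_dtw([slow, fast]))
```

Because Euclidean cost mixes units, heart rate would dominate temperature in this toy example; standardizing each vital sign first is a common choice, though whether the study did so is not stated. The resulting matrix could then be condensed with SciPy's `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` for hierarchical clustering, or given to a k-medoids (PAM) implementation.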