ABSTRACT Understanding the clinical trajectories of sepsis patients is crucial for prognostication, resource planning, and informing digital twin models of critical illness. This study aims to identify common clinical trajectories based on dynamic assessment of cardiorespiratory support, using validated electronic health record data covering a retrospective cohort of 19,177 patients with sepsis admitted to intensive care units (ICUs) of Mayo Clinic Hospitals over an 8-year period. Patient trajectories were modeled from ICU admission up to 14 days using an unsupervised machine learning two-stage clustering method based on cardiorespiratory support in the ICU and hospital discharge status. Of 19,177 patients, 42% were female, with a median age of 65 (interquartile range [IQR], 55–76) years, Acute Physiology, Age, and Chronic Health Evaluation III score of 70 (IQR, 56–87), hospital length of stay (LOS) of 7 (IQR, 4–12) days, and ICU LOS of 2 (IQR, 1–4) days. Four distinct trajectories were identified: fast recovery (27% with a mortality rate of 3.5% and median hospital LOS of 3 (IQR, 2–15) days), slow recovery (62% with a mortality rate of 3.6% and hospital LOS of 8 (IQR, 6–13) days), fast decline (4% with a mortality rate of 99.7% and hospital LOS of 1 (IQR, 0–1) day), and delayed decline (7% with a mortality rate of 97.9% and hospital LOS of 5 (IQR, 3–8) days). The trajectories remained robust and were distinguished by Charlson Comorbidity Index, Acute Physiology, Age, and Chronic Health Evaluation III scores, and day 1 and day 3 Sequential Organ Failure Assessment (SOFA) scores (P < 0.001, ANOVA). These findings provide a foundation for developing prediction models and digital twin decision support tools, improving both shared decision making and resource planning.
                    
                            
                            Generalizability challenges of mortality risk prediction models: A retrospective analysis on a multi-center database
                        
                    
    
Modern predictive models require large amounts of data for training and evaluation; without such data, models may end up specific to certain locations, their populations, and their clinical practices. Yet best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models varies significantly when they are applied to hospitals or geographies different from the ones in which they were developed, and which characteristics of the datasets explain the performance variation. In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. The generalization gap, defined as the difference between model performance metrics across hospitals, is computed for the area under the receiver operating characteristic curve (AUC) and the calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using the causal discovery algorithm “Fast Causal Inference”, which infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st–3rd quartile, i.e., interquartile range [IQR]; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). The distributions of all variable types (demographics, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to the groups. Moreover, to develop methods that improve model performance in new environments, better understanding and documentation of the provenance of data and health processes are needed to identify and mitigate sources of variation.
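The metrics named in the abstract above (AUC, calibration slope, false negative rate disparity, and the generalization gap) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the 0.5 decision threshold, the max-minus-min definition of disparity, and the sign convention of the gap (development hospital minus test hospital) are all hypothetical choices made here for clarity.

```python
from math import exp, log

def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def calibration_slope(y_true, y_prob, iters=50):
    """Slope b of the fit y ~ sigmoid(a + b * logit(p)), found by Newton's method.

    Assumes every probability lies strictly between 0 and 1, that at least two
    distinct probabilities occur, and that the outcomes are not perfectly separated.
    """
    x = [log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 1.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            ga += yi - mu            # gradient w.r.t. intercept
            gb += (yi - mu) * xi     # gradient w.r.t. slope
            haa += w                 # entries of the 2x2 information matrix
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return b

def false_negative_rate(y_true, y_prob, threshold=0.5):
    """Share of true positives predicted below the threshold (threshold is an assumption)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    return sum(p < threshold for p in pos) / len(pos)

def fnr_disparity(y_true, y_prob, groups, threshold=0.5):
    """Largest minus smallest group-wise false negative rate (one possible definition).

    Each group must contain at least one positive case.
    """
    rates = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates.append(false_negative_rate([y_true[i] for i in idx],
                                         [y_prob[i] for i in idx], threshold))
    return max(rates) - min(rates)

def generalization_gap(metric_dev, metric_test):
    """Metric at the development hospital minus the same metric at the test hospital."""
    return metric_dev - metric_test
```

A calibration slope below 1 indicates predictions that are too extreme at the test hospital, which is why it is reported alongside AUC: a model can rank patients equally well at two hospitals while being miscalibrated at one of them.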
        
    
                            - Award ID(s):
- 1845487
- PAR ID:
- 10321157
- Editor(s):
- Pollard, Tom J.
- Date Published:
- Journal Name:
- PLOS Digital Health
- Volume:
- 1
- Issue:
- 4
- ISSN:
- 2767-3170
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Background Early diagnosis is essential for effective stroke therapy. Strokes in hospitalized patients are associated with worse outcomes compared with strokes in the community. We derived and validated an algorithm to identify strokes by monitoring upper limb movements in hospitalized patients. Methods and Results A prospective case–control study in hospitalized patients evaluated bilateral arm accelerometry from patients with acute stroke with lateralized weakness and controls without stroke. We derived a stroke classifier algorithm from 123 controls and 77 acute stroke cases and then validated the performance in a separate cohort of 167 controls and 33 acute strokes, measuring false alarm rates in nonstroke controls and time to detection in stroke cases. Faster detection time was associated with more false alarms. With a median false alarm rate among nonstroke controls of 3.6 (interquartile range [IQR], 2.1–5.0) alarms per patient per day, the median time to detection was 15.0 (IQR, 8.0–73.5) minutes. A median false alarm rate of 1.1 (IQR, 0–2.2) per patient per day was associated with a median time to stroke detection of 29.0 (IQR, 11.0–58.0) minutes. There were no differences in algorithm performance for subgroups dichotomized by age, sex, race, handedness, nondominant hemisphere involvement, intensive care unit versus ward, or daytime versus nighttime. Conclusions Arm movement data can be used to detect asymmetry indicative of stroke in hospitalized patients with a low false alarm rate. Additional studies are needed to demonstrate clinical usefulness.
- 
            Background Although conventional prediction models for surgical patients often ignore intraoperative time-series data, deep learning approaches are well-suited to incorporate time-varying and non-linear data with complex interactions. Blood lactate concentration is one important clinical marker that can reflect the adequacy of systemic perfusion during cardiac surgery. During cardiac surgery and cardiopulmonary bypass, minute-level data is available on key parameters that affect perfusion. The goal of this study was to use machine learning and deep learning approaches to predict maximum blood lactate concentrations after cardiac surgery. We hypothesized that models using minute-level intraoperative data as inputs would have the best predictive performance. Methods Adults who underwent cardiac surgery with cardiopulmonary bypass were eligible. The primary outcome was maximum lactate concentration within 24 h postoperatively. We considered three classes of predictive models, using the performance metric of mean absolute error across testing folds: (1) static models using baseline preoperative variables, (2) augmentation of the static models with intraoperative statistics, and (3) a dynamic approach that integrates preoperative variables with intraoperative time series data. Results A total of 2,187 patients were included. For three models that only used baseline characteristics (linear regression, random forest, artificial neural network) to predict maximum postoperative lactate concentration, the prediction error ranged from a median of 2.52 mmol/L (IQR 2.46, 2.56) to 2.58 mmol/L (IQR 2.54, 2.60). The inclusion of intraoperative summary statistics (including intraoperative lactate concentration) improved model performance, with the prediction error ranging from a median of 2.09 mmol/L (IQR 2.04, 2.14) to 2.12 mmol/L (IQR 2.06, 2.16). For two modelling approaches (recurrent neural network, transformer) that can utilize intraoperative time-series data, the lowest prediction error was obtained with a range of median 1.96 mmol/L (IQR 1.87, 2.05) to 1.97 mmol/L (IQR 1.92, 2.05). Intraoperative lactate concentration was the most important predictive feature based on Shapley additive values. Anemia and weight were also important predictors, but there was heterogeneity in the importance of other features. Conclusion Postoperative lactate concentrations can be predicted using baseline and intraoperative data with moderate accuracy. These results reflect the value of intraoperative data in the prediction of clinically relevant outcomes to guide perioperative management.
- 
            Objective Sudden unexpected death in epilepsy (SUDEP) is the leading cause of epilepsy-related mortality. Although considerable effort has gone into identifying clinical risk factors for SUDEP in the literature, there are few validated methods to predict individual SUDEP risk. Prolonged postictal generalized EEG suppression (PGES) is a potential SUDEP biomarker, but its occurrence is infrequent and requires epilepsy monitoring unit admission. We use machine learning methods to examine SUDEP risk using interictal EEG and ECG recordings from SUDEP cases and matched living epilepsy controls. Methods This multicenter, retrospective, cohort study examined interictal EEG and ECG recordings from 30 SUDEP cases and 58 age-matched living epilepsy patient controls. We trained machine learning models with interictal EEG and ECG features to predict the retrospective SUDEP risk for each patient. We assessed cross-validated classification accuracy and the area under the receiver operating characteristic curve (AUC). Results The logistic regression (LR) classifier produced the overall best performance, outperforming the support vector machine (SVM), random forest (RF), and convolutional neural network (CNN). Among the 30 patients with SUDEP [14 females; mean age (SD), 31 (8.47) years] and 58 living epilepsy controls [26 females (43%); mean age (SD) 31 (8.5) years], the LR model achieved a median AUC of 0.77 [interquartile range (IQR), 0.73–0.80] in five-fold cross-validation using the interictal alpha and low gamma power ratio of the EEG and heart rate variability (HRV) features extracted from the ECG. The LR model achieved a mean AUC of 0.79 in leave-one-center-out prediction. Conclusions Our results support that machine learning-driven models may quantify SUDEP risk for epilepsy patients; future refinements in our model may help predict individualized SUDEP risk and help clinicians correlate predictive scores with the clinical data. Low-cost and noninvasive interictal biomarkers of SUDEP risk may help clinicians to identify high-risk patients and initiate preventive strategies.
- 
            Abstract Objective To develop predictive models of coronavirus disease 2019 (COVID-19) outcomes, elucidate the influence of socioeconomic factors, and assess algorithmic racial fairness using a racially diverse patient population with high social needs. Materials and Methods Data included 7,102 patients with positive (RT-PCR) severe acute respiratory syndrome coronavirus 2 test at a safety-net system in Massachusetts. Linear and nonlinear classification methods were applied. A score based on a recurrent neural network and a transformer architecture was developed to capture the dynamic evolution of vital signs. Combined with patient characteristics, clinical variables, and hospital occupancy measures, this dynamic vital score was used to train predictive models. Results Hospitalizations can be predicted with an area under the receiver-operating characteristic curve (AUC) of 92% using symptoms, hospital occupancy, and patient characteristics, including social determinants of health. Parsimonious models to predict intensive care, mechanical ventilation, and mortality that used the most recent labs and vitals exhibited AUCs of 92.7%, 91.2%, and 94%, respectively. Early predictive models, using labs and vital signs closer to admission, had AUCs of 81.1%, 84.9%, and 92%, respectively. Discussion The most accurate models exhibit racial bias, being more likely to falsely predict that Black patients will be hospitalized. Models that are only based on the dynamic vital score exhibited accuracies close to the best parsimonious models, although the latter also used laboratories. Conclusions This large study demonstrates that COVID-19 severity may accurately be predicted using a score that accounts for the dynamic evolution of vital signs. Further, race, social determinants of health, and hospital occupancy play an important role.
 An official website of the United States government