Title: Subspace Clustering of Physiological Data From Acute Traumatic Brain Injury Patients: Retrospective Analysis Based on the PROTECT III Trial
Background

With advances in digital health technologies and proliferation of biomedical data in recent years, applications of machine learning in health care and medicine have gained considerable attention. While inpatient settings are equipped to generate rich clinical data from patients, there is a dearth of actionable information that can be used for pursuing secondary research for specific clinical conditions.

Objective

This study focused on applying unsupervised machine learning techniques for traumatic brain injury (TBI), which is the leading cause of death and disability among children and adults aged less than 44 years. Specifically, we present a case study to demonstrate the feasibility and applicability of subspace clustering techniques for extracting patterns from data collected from TBI patients.

Methods

Data for this study were obtained from the Progesterone for Traumatic Brain Injury, Experimental Clinical Treatment–Phase III (PROTECT III) trial, which included a cohort of 882 TBI patients. We applied subspace-clustering methods (density-based, cell-based, and clustering-oriented methods) to this data set and compared the performance of the different clustering methods.

Results

The analyses showed the following three clusters of laboratory physiological data: (1) international normalized ratio (INR), (2) INR, chloride, and creatinine, and (3) hemoglobin and hematocrit. While all subclustering algorithms had a reasonable accuracy in classifying patients by mortality status, the density-based algorithm had a higher F1 score and coverage.

Conclusions

Clustering approaches serve as an important step for phenotype definition and validation in clinical domains such as TBI, where patient and injury heterogeneity are among the major reasons for failure of clinical trials. The results from this study provide a foundation to develop scalable clustering algorithms for further research and validation.
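The abstract compares the algorithms by F1 score and coverage without stating how either was computed. As a rough illustration only, the sketch below uses the plain binary F1 score (harmonic mean of precision and recall) and treats coverage as the fraction of patients assigned to any cluster. Both definitions, the function names, the noise label of -1, and the toy data are assumptions made for this example, not details taken from the study.

```python
def f1_score(predicted, actual):
    """Binary F1: harmonic mean of precision and recall (1 = positive class)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # true positives
    fp = sum(1 for p, a in pairs if p and not a)    # false positives
    fn = sum(1 for p, a in pairs if not p and a)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def coverage(cluster_labels, noise_label=-1):
    """Fraction of objects assigned to some cluster (assumed definition)."""
    assigned = sum(1 for c in cluster_labels if c != noise_label)
    return assigned / len(cluster_labels)


# Toy data, invented for illustration: 1 = died, 0 = survived.
predicted_mortality = [1, 1, 0, 0, 1, 0]
actual_mortality = [1, 0, 0, 0, 1, 1]
# Hypothetical cluster assignments; -1 marks patients left unclustered.
labels = [0, 0, 1, -1, 1, -1]

f1 = f1_score(predicted_mortality, actual_mortality)
cov = coverage(labels)
```

Evaluations of subspace clustering often map clusters to classes before computing F1, so the values a study reports may follow a different convention than this binary version.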
Award ID(s):
1838745
NSF-PAR ID:
10212545
Journal Name:
JMIR Biomedical Engineering
Volume:
6
Issue:
1
ISSN:
2561-3278
Page Range / eLocation ID:
e24698
Sponsoring Org:
National Science Foundation
More Like this
  1. Introduction

    Predictive models have been used to aid early diagnosis of PCOS, though existing models are based on small sample sizes and limited to fertility clinic populations. We built a predictive model using machine learning algorithms based on an outpatient population at risk for PCOS to predict risk and facilitate earlier diagnosis, particularly among those who meet diagnostic criteria but have not received a diagnosis.

    Methods

    This is a retrospective cohort study based on a safety-net hospital’s electronic health records (EHR) from 2003 to 2016. The study population included 30,601 women aged 18-45 years without concurrent endocrinopathy who had any visit to Boston Medical Center for primary care, obstetrics and gynecology, endocrinology, family medicine, or general internal medicine. Four prediction outcomes were assessed for PCOS. The first outcome was PCOS ICD-9 diagnosis, with additional model outcomes of algorithm-defined PCOS. The latter was based on Rotterdam criteria and on merging laboratory values, radiographic imaging, and ICD data from the EHR to define irregular menstruation, hyperandrogenism, and polycystic ovarian morphology on ultrasound.

    Results

    We developed predictive models using four machine learning methods: logistic regression, support vector machine, gradient boosted trees, and random forests. Hormone values (follicle-stimulating hormone, luteinizing hormone, estradiol, and sex hormone binding globulin) were combined to create a multilayer perceptron score using a neural network classifier. Prediction of PCOS prior to clinical diagnosis in an out-of-sample test set of patients achieved an average AUC of 85%, 81%, 80%, and 82% in Models I, II, III, and IV, respectively. Significant positive predictors of PCOS diagnosis across models included hormone levels and obesity; negative predictors included gravidity and positive bHCG.

    Conclusion

    Machine learning algorithms were used to predict PCOS based on a large at-risk population. This approach may guide early detection of PCOS within EHR-interfaced populations to facilitate counseling and interventions that may reduce long-term health consequences. Our model illustrates the potential benefits of an artificial intelligence-enabled provider assistance tool that can be integrated into the EHR to reduce delays in diagnosis. However, model validation in other hospital-based populations is necessary.

     
  2. Traumatic brain injury (TBI) is a massive public health problem worldwide. Accurate and fast automatic brain hematoma segmentation is important for TBI diagnosis, treatment, and outcome prediction. In this study, we developed a fully automated system to detect and segment hematoma regions in head Computed Tomography (CT) images of patients with acute TBI. We first over-segmented brain images into superpixels and then extracted statistical and textural features to capture characteristics of superpixels. To overcome the shortage of annotated data, an uncertainty-based active learning strategy was designed to adaptively and iteratively select the most informative unlabeled data to be annotated for training a Support Vector Machine (SVM) classifier. Finally, the coarse segmentation from the SVM classifier was incorporated into an active contour model to improve the accuracy of the segmentation. In our experiments, the proposed active learning strategy achieved a comparable result with 5 times fewer labeled data than regular machine learning. Our proposed automatic hematoma segmentation system achieved an average Dice coefficient of 0.60 on our dataset, where patients are from multiple health centers and at multiple levels of injury. Our results show that the proposed method can effectively overcome the challenge of a limited and highly varied dataset.
  3. Keim-Malpass, Jessica (Ed.)
    During the early stages of hospital admission, clinicians use limited information to make decisions as patient acuity evolves. We hypothesized that clustering analysis of vital signs measured within six hours of hospital admission would reveal distinct patient phenotypes with unique pathophysiological signatures and clinical outcomes. We created a longitudinal electronic health record dataset for 75,762 adult patient admissions to a tertiary care center in 2014–2016 lasting six hours or longer. Physiotypes were derived via unsupervised machine learning in a training cohort of 41,502 patients by applying consensus k-means clustering to six vital signs measured within six hours of admission. Reproducibility and correlation with clinical biomarkers and outcomes were assessed in a validation cohort of 17,415 patients and a testing cohort of 16,845 patients. Training, validation, and testing cohorts had similar age (54–55 years) and sex (55% female) distributions. There were four distinct clusters. Physiotype A had physiologic signals consistent with early vasoplegia, hypothermia, and low-grade inflammation and favorable short- and long-term clinical outcomes despite early, severe illness. Physiotype B exhibited early tachycardia, tachypnea, and hypoxemia followed by the highest incidence of prolonged respiratory insufficiency, sepsis, acute kidney injury, and short- and long-term mortality. Physiotype C had minimal early physiological derangement and favorable clinical outcomes. Physiotype D had the greatest prevalence of chronic cardiovascular and kidney disease, presented with severely elevated blood pressure, and had good short-term outcomes but suffered increased 3-year mortality. Comparing sequential organ failure assessment (SOFA) scores across physiotypes demonstrated that clustering did not simply recapitulate previously established acuity assessments.
    In a heterogeneous cohort of hospitalized patients, unsupervised machine learning techniques applied to routine, early vital sign data identified physiotypes with unique disease categories and distinct clinical outcomes. This approach has the potential to augment understanding of pathophysiology by distilling thousands of disease states into a few physiological signatures.
  4. Traumatic Brain Injury (TBI) is a common cause of death and disability. However, existing tools for TBI diagnosis are either subjective or require extensive clinical setup and expertise. The increasing affordability and reduction in the size of relatively high-performance computing systems, combined with promising results from TBI-related machine learning research, make it possible to create compact and portable systems for early detection of TBI. This work describes a portable, Raspberry Pi-based system for real-time data acquisition and automated processing that uses machine learning to efficiently identify TBI and automatically score sleep stages from a single-channel Electroencephalogram (EEG) signal. We discuss the design, implementation, and verification of the system, which can digitize the EEG signal using an Analog to Digital Converter (ADC) and perform real-time signal classification to detect the presence of mild TBI (mTBI). We utilize Convolutional Neural Network (CNN) and XGBoost-based predictive models to evaluate the performance and demonstrate the versatility of the system to operate with multiple types of predictive models. We achieve a peak classification accuracy of more than 90% with a classification time of less than 1 s across 16–64 s epochs for TBI vs. control conditions. This work can enable the development of systems suitable for field use without requiring specialized medical equipment for early TBI detection applications and TBI research. Further, this work opens avenues to implement connected, real-time TBI-related health and wellness monitoring systems.
  5. Abstract

    Children who experience a traumatic brain injury (TBI) are at elevated risk for a range of negative cognitive and neuropsychological outcomes. Identifying which children are at greatest risk for negative outcomes can be difficult due to the heterogeneity of TBI. To address this barrier, the current study applied a novel method of characterizing brain connectivity networks, Bayesian multi‐subject vector autoregressive modelling (BVAR‐connect), which used white matter integrity as priors to evaluate effective connectivity—the time‐dependent relationship in functional magnetic resonance imaging (fMRI) activity between two brain regions—within the default mode network (DMN). In a prospective longitudinal study, children ages 8–15 years with mild to severe TBI underwent diffusion tensor imaging and resting state fMRI 7 weeks after injury; post‐concussion and anxiety symptoms were assessed 7 months after injury. The goals of this study were to (1) characterize differences in positive effective connectivity of resting‐state DMN circuitry between healthy controls and children with TBI, (2) determine if severity of TBI was associated with differences in DMN connectivity and (3) evaluate whether patterns of DMN effective connectivity predicted persistent post‐concussion symptoms and anxiety. Healthy controls had unique positive connectivity that mostly emerged from the inferior temporal lobes. In contrast, children with TBI had unique effective connectivity among orbitofrontal and parietal regions. These positive orbitofrontal‐parietal DMN effective connectivity patterns also differed by TBI severity and were associated with persisting behavioural outcomes. Effective connectivity may be a sensitive neuroimaging marker of TBI severity as well as a predictor of chronic post‐concussion symptoms and anxiety.

     
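For readers unfamiliar with the Dice coefficient reported for the hematoma segmentation system in item 2, the following is a minimal sketch of the standard definition, Dice = 2|A ∩ B| / (|A| + |B|), applied to two flattened binary masks. The function name, the empty-mask convention, and the toy masks are assumptions made for illustration; none of this is taken from the cited study.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must contain the same number of pixels")
    # Pixels labeled as hematoma in both masks.
    intersection = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    # Total hematoma pixels across the two masks.
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks, invented for illustration (1 = hematoma pixel).
predicted = [0, 1, 1, 1, 0, 0]
annotated = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(predicted, annotated)
```

A value of 1.0 means the predicted and annotated regions coincide exactly, and 0.0 means they do not overlap at all.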