Objectives: We set out to develop a machine learning model capable of distinguishing patients presenting with ischemic stroke from a healthy cohort of subjects. The model relies on a 3-min resting electroencephalogram (EEG) recording from which features can be computed. Materials and methods: Using a large-scale, retrospective database of EEG recordings and matching clinical reports, we were able to construct a dataset of 1385 healthy subjects and 374 stroke patients. With subjects often producing more than one recording per session, the final dataset consisted of 2401 EEG recordings (63% healthy, 37% stroke). Results: Using a rich set of features encompassing both the spectral and temporal domains, our model yielded an AUC of 0.95, with a sensitivity and specificity of 93% and 86%, respectively. Allowing for multiple recordings per subject in the training set boosted sensitivity by 7%, attributable to a more balanced dataset. Conclusions: Our work demonstrates strong potential for the use of EEG in conjunction with machine learning methods to distinguish stroke patients from healthy subjects. Our approach provides a solution that is not only timely (3-minute recording time) but also highly precise and accurate (AUC: 0.95). Keywords: Electroencephalogram (EEG); Feature engineering; Ischemic stroke; Large vessel occlusion; Machine learning; Prehospital stroke scale.
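For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from binary predictions. It is not the authors' evaluation code; the function name and the toy labels are assumptions made purely for illustration (1 = stroke, 0 = healthy).

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed encoding for this sketch: 1 = stroke (positive), 0 = healthy (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # strokes correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # strokes missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy wrongly flagged
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Toy labels, invented for illustration only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of stroke recordings the model flags, and specificity is the fraction of healthy recordings it clears. In a dataset with fewer stroke recordings than healthy ones, the two can diverge noticeably.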
SCORE-IT: A Machine Learning Framework for Automatic Standardization of EEG Reports
Machine learning (ML)-based analysis of electroencephalograms (EEGs) is playing an important role in advancing neurological care. However, the difficulty of automatically extracting useful metadata from clinical records hinders the development of large-scale EEG-based ML models. EEG reports, which are the primary sources of metadata for EEG studies, suffer from a lack of standardization. Here we propose a machine learning-based system that automatically extracts attributes detailed in the SCORE specification from unstructured, natural-language EEG reports. Specifically, our system, which jointly utilizes deep learning- and rule-based methods, identifies (1) the type of seizure observed in the recording, per physician impression; (2) whether the patient was diagnosed with epilepsy; and (3) whether the EEG recording was normal or abnormal according to physician impression. We evaluated our system using the publicly available Temple University EEG corpus and report F1 scores of 0.93, 0.82, and 0.97 for the respective tasks.
- Award ID(s): 2105233
- PAR ID: 10348753
- Date Published:
- Journal Name: IEEE Signal Processing in Medicine and Biology Symposium (SPMB)
- Page Range / eLocation ID: 1 to 4
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Conference Title: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL). Conference Start Date: 2021, Sept. 27. Conference End Date: 2021, Sept. 30. Conference Location: Champaign, IL, USA.
Metadata are key descriptors of research data, particularly for researchers seeking to apply machine learning (ML) to the vast collections of digitized specimens. Unfortunately, the available metadata are often sparse and, at times, erroneous. Additionally, it is prohibitively expensive to address these limitations through traditional, manual means. This paper reports on research that applies machine-driven approaches to analyzing digitized fish images and extracting various important features from them. The digitized fish specimens are being analyzed as part of the Biology Guided Neural Networks (BGNN) initiative, which is developing a novel class of artificial neural networks using phylogenies and anatomy ontologies. Automatically generated metadata is crucial for identifying the high-quality images needed for the neural network's predictive analytics. Methods that combine ML and image informatics techniques allow us to rapidly enrich the existing metadata associated with the 7,244 images from the Illinois Natural History Survey (INHS) used in our study. Results show we can accurately generate many key metadata properties relevant to the BGNN project, as well as general image quality metrics (e.g. brightness and contrast). Results also show that we can accurately generate bounding boxes and segmentation masks for fish, which are needed for subsequent machine learning analyses. The automatic process outperforms humans in terms of time and accuracy, and provides a novel solution for leveraging digitized specimens in ML. This research demonstrates the ability of computational methods to enhance the digital library services associated with the tens of thousands of digitized specimens stored in open-access repositories worldwide.
-
This paper presents a domain-guided approach for learning representations of scalp-electroencephalograms (EEGs) without relying on expert annotations. Expert labeling of EEGs has proven to be an unscalable process with low inter-reviewer agreement because of the complex and lengthy nature of EEG recordings. Hence, there is a need for machine learning (ML) approaches that can leverage expert domain knowledge without incurring the cost of labor-intensive annotations. Self-supervised learning (SSL) has shown promise in such settings, although existing SSL efforts on EEG data do not fully exploit EEG domain knowledge. Furthermore, it is unclear to what extent SSL models generalize to unseen tasks and datasets. Here we explore whether SSL tasks derived in a domain-guided fashion can learn generalizable EEG representations. Our contributions are three-fold: 1) we propose novel SSL tasks for EEG based on the spatial similarity of brain activity, underlying behavioral states, and age-related differences; 2) we present evidence that an encoder pretrained using the proposed SSL tasks shows strong predictive performance on multiple downstream classifications; and 3) using two large EEG datasets, we show that our encoder generalizes well to multiple EEG datasets during downstream evaluations.
-
Electroencephalography (EEG)-based systems utilize machine learning (ML) and deep learning (DL) models in various applications such as seizure detection, emotion recognition, cognitive workload estimation, and brain-computer interface (BCI). However, the security and robustness of such intelligent systems under analog-domain threats have received limited attention. This paper presents the first demonstration of physical signal injection attacks on ML and DL models utilizing EEG data. We investigate how an adversary can degrade the performance of different models by non-invasively injecting signals into EEG recordings. We show that the attacks can mislead or manipulate the models and diminish the reliability of EEG-based systems. Overall, this research sheds light on the need for more trustworthy physiological-signal-based intelligent systems in the healthcare field and opens up avenues for future work.
-
The goal of this report is to describe to users of the TUH EEG Corpus four important concepts that must be understood to correctly retrieve EEG signals from a data file (e.g., an EDF file). The four key concepts described in this document are: (1) physical placement: the location of the electrodes on the scalp, (2) unipolar montage: the differential recording process used to reduce noise, (3) channel labels: the system used to describe the channels, or digital signals, represented in a computer file, and (4) bipolar montages: the differential mapping used to accentuate clinically-relevant events in the signal. This report is not intended to be a primer on the electrophysiology of an EEG, which is a subject unto itself, or a tutorial on how neurologists interpret EEGs. This report simply explains how the signal data in an EEG file must be accessed to accurately support clinical applications (e.g., manual interpretation or annotation of an EEG) and research applications (e.g., automatic interpretation using machine learning).
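As a rough illustration of concept (4), a bipolar channel can be formed by subtracting one referentially recorded channel from another, which cancels any component common to both. The sketch below is not taken from the report: the function name, the channel labels, and the sample values are assumptions chosen for illustration, and a real montage definition should be taken from the corpus documentation.

```python
def bipolar_montage(referential, pairs):
    """Derive bipolar channels as sample-wise differences of referential channels.

    referential: dict mapping a channel label to a list of samples.
    pairs: list of (first, second) label tuples; each yields channel "first-second".
    """
    bipolar = {}
    for first, second in pairs:
        a, b = referential[first], referential[second]
        if len(a) != len(b):
            raise ValueError(f"length mismatch between {first} and {second}")
        bipolar[f"{first}-{second}"] = [x - y for x, y in zip(a, b)]
    return bipolar


# Toy signals (invented): each channel is a local signal plus a shared component.
common = [5, -3, 2, 7]                           # component seen by every electrode
referential = {
    "FP1": [s + c for s, c in zip([1, 2, 3, 4], common)],
    "F7":  [s + c for s, c in zip([1, 1, 1, 1], common)],
    "T3":  [s + c for s, c in zip([0, 0, 0, 0], common)],
}
pairs = [("FP1", "F7"), ("F7", "T3")]            # illustrative electrode pairs
bipolar = bipolar_montage(referential, pairs)
# The shared component cancels, leaving only the local differences.
```

Because every channel in this toy example carries the same shared component, the derived channels contain only the difference between the local signals, which is the sense in which a differential mapping accentuates localized events.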