Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)

The evaluation of machine learning algorithms in biomedical fields for applications involving sequential data lacks both rigor and standardization. Common quantitative scalar evaluation metrics such as sensitivity and specificity can often be misleading and not accurately integrate application requirements. Evaluation metrics must ultimately reflect the needs of users yet be sufficiently sensitive to guide algorithm development. For example, feedback from critical care clinicians who use automated event detection software in clinical applications has been overwhelmingly emphatic that a low false alarm rate, typically measured in units of the number of errors per 24 hours, is the single most important criterion for user acceptance. Though using a single metric is not often as insightful as examining performance over a range of operating conditions, there is, nevertheless, a need for a single scalar figure of merit. In this chapter, we discuss the deficiencies of existing metrics for a seizure detection task and propose several new metrics that offer a more balanced view of performance. We demonstrate these metrics on a seizure detection task based on the TUH EEG Seizure Corpus. We introduce two promising metrics: (1) a measure based on a concept borrowed from the spoken term detection literature, Actual Term-Weighted Value, and (2) a new metric, Time-Aligned Event Scoring (TAES), that accounts for the temporal alignment of the hypothesis to the reference annotation. We demonstrate that state-of-the-art technology based on deep learning, though impressive in its performance, still needs significant improvement before it will meet very strict user acceptance guidelines.
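The time-alignment idea behind TAES can be illustrated with a minimal sketch (our illustration, not the chapter's implementation): score each reference event by the fraction of its duration covered by hypothesis events, rather than counting any overlap as a full hit.

```python
def overlap_fraction(ref, hyps):
    """Fraction of a reference event (start_s, stop_s) covered by hypothesis
    events. Assumes the hypothesis events do not overlap one another."""
    start, stop = ref
    covered = 0.0
    for h_start, h_stop in hyps:
        # Length of the intersection of [start, stop] and [h_start, h_stop]
        covered += max(0.0, min(stop, h_stop) - max(start, h_start))
    return min(covered / (stop - start), 1.0)

# Example: a 60 s reference seizure partially covered by two hypotheses
ref_event = (100.0, 160.0)
hyp_events = [(110.0, 130.0), (150.0, 170.0)]
print(overlap_fraction(ref_event, hyp_events))  # 0.5 (30 s of 60 s covered)
```

A conventional overlap-based score would credit this detection fully; a time-aligned score of 0.5 reflects that half the seizure was missed.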
Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)

There has been a lack of standardization of the evaluation of sequential decoding systems in the bioengineering community. Assessment of the accuracy of a candidate system's segmentations and measurement of a false alarm rate are examples of two performance metrics that are very critical to the operational acceptance of a technology. However, measurement of such quantities in a consistent manner requires many scoring software implementation details to be resolved. Results can be highly sensitive to these implementation details. In this paper, we revisit and evaluate a set of metrics introduced in our open source scoring software for sequential decoding of multichannel signals. This software was used to rank sixteen automatic seizure detection systems recently developed for the 2020 Neureka® Epilepsy Challenge. The systems produced by the participants provided us with a broad range of design variations that allowed assessment of the consistency of the proposed metrics. We present a comprehensive assessment of four of these new metrics and validate our findings with our previous studies. We also validate a proposed new metric, time-aligned event scoring, that focuses on the segmentation behavior of an algorithm. We demonstrate how we can gain insight into the performance of a system using these metrics.
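One implementation detail the paper alludes to is how false alarms are counted and normalized. A hedged sketch of one common convention (the function and its thresholds are ours, not the scoring software's): count hypothesis events that overlap no reference event, then normalize to a 24-hour rate.

```python
def false_alarms_per_24h(hyp_events, ref_events, total_seconds):
    """Count hypothesis events (start_s, stop_s) that overlap no reference
    event, normalized to errors per 24 hours of recorded data."""
    def overlaps(a, b):
        # Two half-open intervals intersect iff each starts before the other ends
        return max(a[0], b[0]) < min(a[1], b[1])

    fa = sum(1 for h in hyp_events
             if not any(overlaps(h, r) for r in ref_events))
    return fa * 86400.0 / total_seconds

# Example: 3 hypotheses, 1 matching a reference event, over 12 h of EEG
hyps = [(10.0, 20.0), (500.0, 520.0), (900.0, 905.0)]
refs = [(505.0, 515.0)]
print(false_alarms_per_24h(hyps, refs, 12 * 3600))  # 4.0 false alarms / 24 h
```

Whether a partially overlapping hypothesis counts as a hit or a false alarm is exactly the kind of implementation detail that can shift rankings between systems.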
Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Eds.)

Scalp electroencephalograms (EEGs) are the primary means by which physicians diagnose brain-related illnesses such as epilepsy and seizures. Automated seizure detection using clinical EEGs is a very difficult machine learning problem due to the low fidelity of a scalp EEG signal. Nevertheless, despite the poor signal quality, clinicians can reliably diagnose illnesses from visual inspection of the signal waveform. Commercially available automated seizure detection systems, however, suffer from unacceptably high false alarm rates. Deep learning algorithms that require large amounts of training data have not previously been effective on this task due to the lack of big data resources necessary for building such models and the complexity of the signals involved. The evolution of big data science, most notably the release of the Temple University EEG (TUEG) Corpus, has motivated renewed interest in this problem.

In this chapter, we discuss the application of a variety of deep learning architectures to automated seizure detection. Architectures explored include multilayer perceptrons, convolutional neural networks (CNNs), long short-term memory networks (LSTMs), gated recurrent units, and residual neural networks. We use the TUEG Corpus, supplemented with data from Duke University, to evaluate the performance of these hybrid deep structures. Since TUEG contains a significant amount of unlabeled data, we also discuss unsupervised pre-training methods used prior to training these complex recurrent networks. Exploiting spatial and temporal context is critical for accurate disambiguation of seizures from artifacts. We explore how effectively several conventional architectures are able to model context and introduce a hybrid system that integrates CNNs and LSTMs.
The primary error modalities observed by this state-of-the-art system were false alarms generated during brief delta-range slowing patterns such as intermittent rhythmic delta activity. A variety of these types of events have been observed during inter-ictal and post-ictal stages. Training models on such events with diverse morphologies has the potential to significantly reduce the remaining false alarms. This is one reason we are continuing our efforts to annotate a larger portion of TUEG. Increasing the data set size significantly allows us to leverage more advanced machine learning methodologies.
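The temporal context that the abstract above calls critical is often operationalized by stacking each feature frame with its neighbors before classification. A minimal numpy sketch of this context-window idea (our illustration under assumed window sizes, not the chapter's pipeline):

```python
import numpy as np

def stack_context(frames, left=2, right=2):
    """Stack each feature frame with `left`/`right` neighboring frames.
    Edges are padded by repeating the first/last frame.

    frames: array of shape (n_frames, n_features)
    returns: array of shape (n_frames, left + right + 1, n_features)
    """
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    # windows[t, i] is frame t + i - left of the original sequence
    return np.stack(
        [padded[i : i + len(frames)] for i in range(left + right + 1)],
        axis=1,
    )

# 100 frames of 20-dimensional EEG features -> 100 windows of 5 frames each
feats = np.random.randn(100, 20)
windows = stack_context(feats)
print(windows.shape)  # (100, 5, 20)
```

A CNN front end learns a similar context representation directly from the signal, which is one motivation for the hybrid CNN/LSTM structure the chapter introduces.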
Objective: To demonstrate that combining automatic processing of EEG data using high performance machine learning algorithms with manual review by expert annotators can quickly identify subjects with prolonged seizures. Background: Prolonged seizures are markers of seizure severity, risk of transformation into status epilepticus, and medical morbidity. Early recognition of prolonged seizures permits intervention and reduces morbidity. Design/Methods: We triaged the TUH EEG Corpus, an open source database of EEGs, by running a state-of-the-art hybrid LSTM-based deep learning system. Then, we postprocessed the output to identify high confidence hypotheses for seizures that were greater than three minutes in duration. Results: The triaging method selected 25 subjects for further review. 17 subjects had seizures; only 5 met criteria for seizures greater than 3 minutes. 11 subjects did not have a prior diagnosis of epilepsy. Among these, 63% had acute respiratory failure and 36% had cardiac arrest leading to seizures secondary to anoxic brain injury. 18 (72%) EEGs were obtained in long-term monitoring (LTM), 1 (4%) in the epilepsy monitoring unit (EMU), and 6 (24%) as a routine EEG (rEEG). 72.2% of seizures in LTM were identified correctly versus 66.7% in rEEGs. Of the 9 subjects who were deceased, 7 (78%) had been on LTM. The seizure detection algorithm misidentified seizures in 7 subjects (28%). A total of 22 (88%) subjects had some ictal pattern. Patterns mistaken for seizure activity included muscle artifact, generalized periodic discharges, generalized spike-and-wave, triphasic waves, and interestingly, an EEG recording captured during CPR. Conclusions: This hybrid approach, which combines state-of-the-art machine learning seizure detection software with human annotation, successfully identified prolonged seizures in 72% of subjects; 88% had ictal patterns.
Prolonged seizures were more common in LTM subjects than in the EMU and were associated with acute cardiac or pulmonary insult.
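The triage step described in the Design/Methods above (keeping only high-confidence hypotheses longer than three minutes for manual review) amounts to a simple filter over detector output. A hedged sketch with hypothetical field names and an illustrative confidence threshold (the study does not report its exact cutoff):

```python
def triage(events, min_duration=180.0, min_confidence=0.9):
    """Keep detected events long and confident enough for expert review.

    events: list of (start_s, stop_s, confidence) tuples
    min_duration: 180 s corresponds to the three-minute criterion
    min_confidence: illustrative threshold, not taken from the study
    """
    return [e for e in events
            if e[1] - e[0] >= min_duration and e[2] >= min_confidence]

# Example: one short hit, one long confident hit, one long but uncertain hit
detections = [(0.0, 60.0, 0.95), (100.0, 400.0, 0.97), (500.0, 700.0, 0.40)]
print(triage(detections))  # [(100.0, 400.0, 0.97)]
```

Thresholding before human review trades recall for reviewer time, which is the core of the hybrid workflow the study evaluates.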