
Title: Object Classification From Randomized EEG Trials
New results suggest strong limits to the feasibility of object classification from human brain activity evoked by image stimuli, as measured through EEG. Considerable prior work suffers from a confound between the stimulus class and the time since the start of the experiment. A prior attempt to avoid this confound using randomized trials failed to achieve statistically significant above-chance results with datasets the same size as those of the original experiments. Here, we attempt object classification from EEG using an array of methods representative of the state of the art, with a far larger (20x) dataset of randomized EEG trials: 1,000 stimulus presentations of each of forty classes, all from a single subject. To our knowledge, this is the largest such EEG data-collection effort from a single subject, and it is at the bounds of feasibility. We obtain classification accuracy that is marginally, but statistically significantly, above chance, and we further assess how accuracy depends on the classifier used, the amount of training data, and the number of classes. Reaching the limits of data collection with only marginally above-chance performance suggests that the prevailing literature substantially exaggerates the feasibility of object classification from EEG.
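Whether accuracy is "above chance in a statistically significant fashion" is often assessed with a one-sided binomial test against the chance rate, which for forty balanced classes is 1/40 = 2.5%. The abstract does not say which test was used, so the following is only a minimal sketch under that assumption, using the Python standard library. The held-out set size of 8,000 trials is hypothetical. The test can only show that a classifier beats chance. It cannot show whether the classifier used stimulus-related activity or a temporal confound, which is why the randomized trial order matters.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability mass at k (log space avoids overflow)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binomial_p_value(n_correct: int, n_trials: int, chance: float) -> float:
    """One-sided p-value P(X >= n_correct) for X ~ Binomial(n_trials, chance)."""
    if not 0.0 < chance < 1.0:
        raise ValueError("chance must lie strictly between 0 and 1")
    tail = sum(math.exp(log_binom_pmf(k, n_trials, chance))
               for k in range(n_correct, n_trials + 1))
    return min(tail, 1.0)  # guard against floating-point sums slightly above 1


def min_correct_for_significance(n_trials: int, chance: float, alpha: float = 0.05) -> int:
    """Smallest number of correct test trials whose one-sided p-value is below alpha."""
    tail = 0.0
    # Accumulate P(X >= k) from the top of the distribution downwards.
    for k in range(n_trials, -1, -1):
        tail += math.exp(log_binom_pmf(k, n_trials, chance))
        if tail >= alpha:
            return k + 1  # P(X >= k + 1) was still below alpha
    return 0


if __name__ == "__main__":
    n_classes = 40
    chance = 1 / n_classes   # 2.5% for forty balanced classes
    n_test = 8000            # hypothetical held-out set size
    k_min = min_correct_for_significance(n_test, chance)
    print(f"need >= {k_min} correct ({k_min / n_test:.2%}) for p < 0.05")
```

With thousands of test trials, an accuracy only a fraction of a percentage point above 2.5% can already reach p < 0.05. This is consistent with a result that is statistically significant yet only marginally above chance.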
Award ID(s):
1734938 1522954
Journal Name:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Sponsoring Org:
National Science Foundation
More Like this
  1. Lee, Kyoung Mu (Ed.)
    A recent paper claims that a newly proposed method classifies EEG data recorded from subjects viewing ImageNet stimuli better than two prior methods. However, the analysis used to support that claim is based on confounded data. We repeat the analysis on a large new dataset that is free from that confound. Training and testing on aggregated supertrials derived by summing trials demonstrates that the two prior methods achieve statistically significant above-chance accuracy while the newly proposed method does not. 
  2. Objectively differentiating patient mental states based on electrical activity, as opposed to overt behavior, is a fundamental neuroscience problem with medical applications, such as distinguishing patients in a locked-in state from patients in a coma. Electroencephalography (EEG), which detects millisecond-level changes in brain activity across a range of frequencies, allows non-invasive assessment of how the brain processes external stimuli. We applied machine learning methods to 26-channel EEG data from 24 fluent Deaf signers watching videos of sign language sentences (comprehension condition) and the same videos reversed in time (non-comprehension condition), to objectively separate vision-based high-level cognitive states. While the spectrotemporal parameters of the stimuli were identical in the comprehension and non-comprehension conditions, participants' neural responses varied with their ability to linguistically decode the visual data. We aimed to determine which subset of parameters (specific scalp regions or frequency ranges) would be necessary and sufficient for high classification accuracy of comprehension state. Optical flow, which characterizes the distribution of velocities of objects in an image, was calculated for each pixel of the stimulus videos using the MATLAB Vision toolbox. Coherence between the optical flow in the stimulus and the EEG neural response (per video, per participant) was then computed using canonical correlation analysis with the NoiseTools toolbox. Peak correlations were extracted for each frequency for each electrode, participant, and video. A set of standard ML algorithms was applied to the entire dataset (26 channels, frequencies from 0.2 Hz to 12.4 Hz, binned in 1 Hz increments), with consistent out-of-sample accuracy of 100% for frequencies in the 0.2–1 Hz range for all regions, and above 80% for frequencies below 4 Hz.
Sparse Optimal Scoring (SOS) was then applied to the EEG data to reduce the dimensionality of the features and improve model interpretability. SOS with an elastic-net penalty achieved an out-of-sample classification accuracy of 98.89%. The sparsity pattern of the model indicated that frequencies in the 0.2–4 Hz range were primarily used in the classification, suggesting that the underlying data may be group sparse. Further, SOS with a group-lasso penalty was applied to regional subsets of electrodes (anterior, posterior, left, right). All trials achieved greater than 97% out-of-sample classification accuracy. The sparsity patterns from the trials using 1 Hz bins over individual regions consistently indicated that frequencies in the 0.2–1 Hz range were primarily used in the classification, with the anterior and left regions performing best at 98.89% and 99.17% classification accuracy, respectively. While the sparsity pattern may not be the unique optimal model for a given trial, the high classification accuracy indicates that these models have accurately identified common neural responses to visual linguistic stimuli. Cortical tracking of spectrotemporal change in the visual signal of sign language appears to rely on lower frequencies proportional to the N400/P600 time-domain evoked response potentials, indicating that visual language comprehension is grounded in predictive processing mechanisms.
  3. A recent paper [1] claims to classify the brain processing evoked in subjects watching ImageNet stimuli, as measured with EEG, and to use a representation derived from this processing to construct a novel object classifier. That paper, together with a series of subsequent papers [2], [3], [4], [5], [6], [7], [8], claims to achieve successful results on a wide variety of computer-vision tasks, including object classification, transfer learning, and generation of images depicting human perception and thought, using brain-derived representations measured through EEG. Our novel experiments and analyses demonstrate that their results crucially depend on the block design they employ, in which all stimuli of a given class are presented together, and fail with a rapid-event design, in which stimuli of different classes are randomly intermixed. The block design leads to classification of arbitrary brain states based on block-level temporal correlations that are known to exist in all EEG data, rather than on stimulus-related activity. Because every trial in their test sets comes from the same block as many trials in the corresponding training sets, their classifiers can succeed by exploiting arbitrary temporal artifacts of the data instead of stimulus-related activity. This invalidates all subsequent analyses performed on these data in multiple published papers and calls into question all of the reported results. We further show that a novel object classifier constructed with a random codebook performs as well as or better than one constructed with the representation extracted from EEG data, suggesting that their classifier does not benefit from the brain-derived representation. Together, our results illustrate the far-reaching implications, for classification experiments, of the temporal autocorrelations that exist in all neuroimaging data.
Further, our results calibrate the underlying difficulty of the tasks involved and caution against overly optimistic, but incorrect, claims to the contrary. 
  4. Wearable robotic devices are being designed to assist the elderly population and other patients with locomotion disabilities. However, wearable robots increase the risk of falling. Neuroimaging studies have provided evidence for the involvement of frontocentral and parietal cortices in postural control, which opens up the possibility of using electroencephalography (EEG) decoders for early detection of balance loss. This study investigates the presence of commonly identified components of perturbation-evoked potentials (PEPs) when a person is in an exoskeleton. We also evaluated the feasibility of predicting loss of balance from single-trial EEG using a convolutional neural network (CNN). Overall, the model achieved a mean 5-fold cross-validation test accuracy of 75.2% across six subjects, with 50% as the chance level. We employed gradient-weighted class activation mapping (Grad-CAM) to interpret the decisions of the CNN and demonstrated that the network learns from PEP components present in these single trials. The high localization ability of Grad-CAM demonstrated here opens up possibilities for deploying CNNs for ERP/PEP analysis while emphasizing model interpretability.
  5. Obeid, Iyad Selesnick (Ed.)
    The Temple University Hospital EEG Corpus (TUEG) [1] is the largest publicly available EEG corpus of its type and currently has over 5,000 subscribers (we currently average 35 new subscribers a week). Several valuable subsets of this corpus have been developed, including the Temple University Hospital EEG Seizure Corpus (TUSZ) [2] and the Temple University Hospital EEG Artifact Corpus (TUAR) [3]. TUSZ contains manually annotated seizure events and has been widely used to develop seizure detection and prediction technology [4]. TUAR contains manually annotated artifacts and has been used to improve machine learning performance on seizure detection tasks [5]. In this poster, we will discuss recent improvements made to both corpora that are creating opportunities to improve machine learning performance. Two major concerns were raised when v1.5.2 of TUSZ was released for the Neureka 2020 Epilepsy Challenge: (1) the subjects contained in the training, development (validation), and blind evaluation sets were not mutually exclusive, and (2) high-frequency seizures were not accurately annotated in all files. Regarding (1), there were 50 subjects in dev, 50 subjects in eval, and 592 subjects in train. One subject was common to dev and eval, five subjects were common to dev and train, and 13 subjects were common to eval and train. Although this does not substantially affect performance for the current generation of technology, it could become a problem as technology improves. Therefore, we have rebuilt the partitions of the data to remove this overlap. This required augmenting the evaluation and development sets with new subjects that had not been previously annotated, so that the size of these subsets remained approximately the same. Because these annotations were made by a new group of annotators, special care was taken to make sure the new annotators followed the same practices as the previous generations of annotators.
Part of our quality control process was to have the new annotators review all previous annotations. This rigorous training, coupled with a strict quality control process in which annotators review a significant amount of each other's work, ensured high interrater agreement between the two groups (kappa statistic greater than 0.8) [6]. In the process of reviewing this data, we also decided to split long files into a series of smaller segments to facilitate processing. Some subscribers found it difficult to process long files using Python code, which tends to be very memory intensive. We also found it inefficient to manipulate these long files in our annotation tool. In this release, the maximum duration of any single file is limited to 60 minutes. This increased the number of EDF files in the dev set from 1012 to 1832. Regarding (2), while discussing several issues raised by a few subscribers, we discovered that some files had only low-frequency epileptiform events annotated (defined as events ranging in frequency from 2.5 Hz to 3 Hz), while others had annotated events that contained significant frequency content above 3 Hz. Although few files had this type of activity, it was enough of a concern to necessitate reviewing the entire corpus. An example of an epileptiform seizure event with frequency content higher than 3 Hz is shown in Figure 1. Annotating these additional events slightly increased the number of seizure events. In v1.5.2, there were 673 seizures, while in v1.5.3 there are 1239 events. One of the fertile areas for technology improvements is artifact reduction. Artifacts and slowing constitute the two major error modalities in seizure detection [3]. This was a major reason we developed TUAR. It can be used to evaluate artifact detection and suppression technology, as well as multimodal background models that explicitly model artifacts.
An issue with TUAR was the practicality of the annotation tags used when there are multiple simultaneous events. An example of such an event is shown in Figure 2. In this section of the file, eye movement, electrode artifact, and muscle artifact events overlap. We previously annotated such events using a convention that included annotating background along with any artifact present. The artifacts present would be annotated either with a single tag (e.g., MUSC) or with a coupled artifact tag (e.g., MUSC+ELEC). When multiple channels have background, the tags become crowded and difficult to identify. This is one reason we now support a hierarchical annotation format using XML: annotations can be arbitrarily complex and can overlap in time. Our annotators also reviewed specific eye movement artifacts (e.g., eye flutter, eyeblinks). Eye movements are often mistaken for seizures due to their similar morphology [7][8]. We have improved our understanding of ocular events, which has allowed us to annotate artifacts in the corpus more carefully. In this poster, we will present statistics on the newest releases of these corpora and discuss the impact these improvements have had on machine learning research. We will compare TUSZ v1.5.3 and TUAR v2.0.0 with previous versions of these corpora. We will release v1.5.3 of TUSZ and v2.0.0 of TUAR in Fall 2021, prior to the symposium.
ACKNOWLEDGMENTS
Research reported in this publication was most recently supported by the National Science Foundation's Industrial Innovation and Partnerships (IIP) Research Experience for Undergraduates award number 1827565. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the official views of any of these organizations.
REFERENCES
[1] I. Obeid and J. Picone, “The Temple University Hospital EEG Data Corpus,” in Augmentation of Brain Function: Facts, Fiction and Controversy. Volume I: Brain-Machine Interfaces, 1st ed., vol. 10, M. A. Lebedev, Ed. Lausanne, Switzerland: Frontiers Media S.A., 2016, pp. 394–398.
[2] V. Shah et al., “The Temple University Hospital Seizure Detection Corpus,” Frontiers in Neuroinformatics, vol. 12, pp. 1–6, 2018.
[3] A. Hamid et al., “The Temple University Artifact Corpus: An Annotated Corpus of EEG Artifacts,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2020, pp. 1–3.
[4] Y. Roy, R. Iskander, and J. Picone, “The Neureka™ 2020 Epilepsy Challenge,” NeuroTechX, 2020. [Online]. Available: [Accessed: 01-Dec-2021].
[5] S. Rahman, A. Hamid, D. Ochal, I. Obeid, and J. Picone, “Improving the Quality of the TUSZ Corpus,” in Proceedings of the IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2020, pp. 1–5.
[6] V. Shah, E. von Weltin, T. Ahsan, I. Obeid, and J. Picone, “On the Use of Non-Experts for Generation of High-Quality Annotations of Seizure Events,” Available: https://www.isip.picone [Accessed: 01-Dec-2021].
[7] D. Ochal, S. Rahman, S. Ferrell, T. Elseify, I. Obeid, and J. Picone, “The Temple University Hospital EEG Corpus: Annotation Guidelines,” Philadelphia, Pennsylvania, USA, 2020.
[8] D. Strayhorn, “The Atlas of Adult Electroencephalography,” EEG Atlas Online, 2014. [Online]. Availabl
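The TUSZ v1.5.2 partition problem described in item 5 above can be checked mechanically. In that release some subjects appeared in two of the dev, eval, and train sets, and the fix was to rebuild the partitions at the subject level. The following is a minimal Python sketch, assuming each recording carries a subject ID string. The function names `pairwise_overlaps` and `split_by_subject`, the ID format, and the split fractions are hypothetical illustrations and are not part of the TUH corpus tooling.

```python
import random
from itertools import combinations


def pairwise_overlaps(partitions: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the subject IDs shared by every pair of partitions (e.g. dev/eval/train)."""
    return {(a, b): partitions[a] & partitions[b]
            for a, b in combinations(sorted(partitions), 2)}


def split_by_subject(subject_ids, fractions: dict[str, float], seed: int = 0) -> dict[str, set[str]]:
    """Assign whole subjects, never individual recordings, to partitions."""
    subjects = sorted(set(subject_ids))     # one entry per subject, deterministic order
    random.Random(seed).shuffle(subjects)   # reproducible shuffle
    names = list(fractions)
    out: dict[str, set[str]] = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            # The last partition absorbs rounding remainders so every subject is assigned.
            end = len(subjects)
        else:
            end = min(len(subjects), start + round(fractions[name] * len(subjects)))
        out[name] = set(subjects[start:end])
        start = end
    return out


if __name__ == "__main__":
    # Hypothetical leaky split: subject "s2" appears in both dev and eval.
    leaky = {"train": {"s1", "s3"}, "dev": {"s2", "s4"}, "eval": {"s2", "s5"}}
    for pair, shared in pairwise_overlaps(leaky).items():
        print(pair, sorted(shared))
```

Splitting on the set of unique subjects rather than on files guarantees disjoint partitions by construction. The overlap check is still useful as a regression test whenever new subjects are added to an existing split.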