The Perils and Pitfalls of Block Design for EEG Classification Experiments

Li, Ren; Johansen, Jared S.; Ahmed, Hamad; Ilyevsky, Thomas V.; Wilbur, Ronnie B.; Bharadwaj, Hari M.; Siskind, Jeffrey Mark

doi:10.1109/TPAMI.2020.2973153

A recent paper [1] claims to classify brain processing evoked in subjects watching ImageNet stimuli as measured with EEG and to employ a representation derived from this processing to construct a novel object classifier. That paper, together with a series of subsequent papers [2] , [3] , [4] , [5] , [6] , [7] , [8] , claims to achieve successful results on a wide variety of computer-vision tasks, including object classification, transfer learning, and generation of images depicting human perception and thought using brain-derived representations measured through EEG. Our novel experiments and analyses demonstrate that their results crucially depend on the block design that they employ, where all stimuli of a given class are presented together, and fail with a rapid-event design, where stimuli of different classes are randomly intermixed. The block design leads to classification of arbitrary brain states based on block-level temporal correlations that are known to exist in all EEG data, rather than stimulus-related activity. Because every trial in their test sets comes from the same block as many trials in the corresponding training sets, their block design thus leads to classifying arbitrary temporal artifacts of the data instead of stimulus-related activity. This invalidates all subsequent analyses performed on this data in multiple published papers and calls into question all of the reported results. We further show that a novel object classifier constructed with a random codebook performs as well as or better than a novel object classifier constructed with the representation extracted from EEG data, suggesting that the performance of their classifier constructed with a representation extracted from EEG data does not benefit from the brain-derived representation. Together, our results illustrate the far-reaching implications of the temporal autocorrelations that exist in all neuroimaging data for classification experiments. Further, our results calibrate the underlying difficulty of the tasks involved and caution against overly optimistic, but incorrect, claims to the contrary.

More Like this