- Award ID(s): 1736394
- NSF-PAR ID: 10177650
- Date Published:
- Journal Name: The Journal of Neuroscience
- Volume: 40
- ISSN: 1529-2401
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Human scene categorization is rapid and robust, but we have little understanding of how individual features contribute to categorization, nor of the time scale of their contributions. This issue is compounded by the non-independence of the many candidate features. Here, we used singular value decomposition to orthogonalize 11 different scene descriptors that included both visual and semantic features. Using high-density EEG and regression analyses, we observed that most of the explained variance was carried by a late layer of a deep convolutional neural network, as well as by a model of a scene’s functions given by the American Time Use Survey. Furthermore, features that explained more variance also tended to explain it earlier. These results extend previous large-scale behavioral results showing the importance of functional features for scene categorization, and they fail to support models of visual perception that are encapsulated from higher-level cognitive attributes.
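As a rough illustration of the pipeline this abstract describes, the sketch below orthogonalizes a correlated feature matrix with SVD and regresses time-resolved EEG on the resulting components. All array shapes, the synthetic data, and the ordinary-least-squares fit are stand-ins for illustration, not the study's actual analysis.

```python
import numpy as np

# Minimal sketch with synthetic stand-ins: scores from 11 scene descriptors
# and EEG epochs of shape (n_images, n_electrodes, n_timepoints).
rng = np.random.default_rng(0)
n_images, n_features = 300, 11
X = rng.standard_normal((n_images, n_features))    # stand-in descriptor scores
eeg = rng.standard_normal((n_images, 64, 250))     # stand-in EEG data

# Orthogonalize the correlated descriptors with SVD; the columns of U are
# mutually orthogonal combinations of the original features.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = U * s                                 # orthogonal predictors

# Regress EEG amplitude on the orthogonal predictors at each time point
# and track explained variance over time.
r2 = np.empty(eeg.shape[-1])
for t in range(eeg.shape[-1]):
    y = eeg[:, :, t] - eeg[:, :, t].mean(axis=0)   # centered voltages at time t
    beta, *_ = np.linalg.lstsq(components, y, rcond=None)
    resid = y - components @ beta
    r2[t] = 1.0 - resid.var() / y.var()
```

Because the SVD components are orthogonal, the variance each one explains can be attributed to it without the double-counting that the original, correlated descriptors would produce.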
- Visual scene category representations emerge very rapidly, yet the computational transformations that enable such invariant categorizations remain elusive. Deep convolutional neural networks (CNNs) perform visual categorization at near human-level accuracy using a feedforward architecture, providing neuroscientists with the opportunity to assess one successful series of representational transformations that enable categorization in silico. The goal of the current study is to assess the extent to which sequential scene category representations built by a CNN map onto those built in the human brain, as assessed by high-density, time-resolved event-related potentials (ERPs). We found correspondence both over time and across the scalp: earlier (0–200 ms) ERP activity was best explained by early CNN layers at all electrodes. Although later activity at most electrode sites corresponded to earlier CNN layers, activity in right occipito-temporal electrodes was best explained by the later, fully connected layers of the CNN around 225 ms post-stimulus, along with similar patterns in frontal electrodes. Taken together, these results suggest that scene category representations emerge through a dynamic interplay between early activity over occipital electrodes and later activity over temporal and frontal electrodes.
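One common way to quantify this kind of layer-to-ERP correspondence is representational similarity analysis (RSA), sketched below; the study's exact fitting procedure may differ, and the layer size, electrode count, and data here are synthetic placeholders.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# Synthetic stand-ins: per-image activations from one CNN layer and ERP
# voltages for the same images at one time point over an electrode cluster.
rng = np.random.default_rng(1)
n_images = 100
layer_act = rng.standard_normal((n_images, 4096))  # e.g. a fully connected layer
erp = rng.standard_normal((n_images, 32))          # voltages over a cluster

# Compare the geometry of the two spaces: correlate pairwise image
# dissimilarities in CNN space with those in ERP space.
rdm_cnn = pdist(layer_act, metric="correlation")
rdm_erp = pdist(erp, metric="correlation")
rho, p = spearmanr(rdm_cnn, rdm_erp)
print(f"layer-to-ERP RDM correlation: rho = {rho:.3f}")
```

Repeating the comparison for each layer, time point, and electrode site would yield the kind of layer-by-time correspondence map the abstract describes.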
- Objective. Brain–computer interfaces (BCIs) show promise as a direct line of communication between the brain and the outside world that could benefit those with impaired motor function. But the commands available for BCI operation are often limited by the ability of the decoder to differentiate between the many distinct motor or cognitive tasks that can be visualized or attempted. Simple binary command signals (e.g. right hand at rest versus movement) are therefore used due to their ability to produce large observable differences in neural recordings. At the same time, frequent command switching imposes greater demands on the subject’s focus and takes time to learn. Here, we attempt to decode the degree of effort in a specific movement task to produce a graded and more flexible command signal. Approach. Fourteen healthy human subjects (nine male, five female) responded to visual cues by squeezing a hand dynamometer to different levels of predetermined force, guided by continuous visual feedback, while the electroencephalogram (EEG) and grip force were monitored. Movement-related EEG features were extracted and modeled to predict exerted force. Main results. We found that event-related desynchronization (ERD) of the 8–30 Hz mu-beta sensorimotor rhythm of the EEG is separable for different degrees of motor effort. Under four-fold cross-validation, linear classifiers predicted grip force from an ERD vector with mean accuracies across subjects of 53% and 55% for the dominant and non-dominant hand, respectively. ERD amplitude increased with target force but appeared to pass through a trough that hinted at non-monotonic behavior. Significance. Our results suggest that modeling and interactive feedback based on the intended level of motor effort is feasible. The observed ERD trends suggest that different mechanisms may govern intermediate versus low and high degrees of motor effort. This may have utility in rehabilitative protocols for motor impairments.
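A minimal sketch of the kind of pipeline outlined above: band-pass to the mu-beta band, compute an ERD feature vector as percent power change from a baseline window, and classify force level under four-fold cross-validation. The filter order, window boundaries, choice of linear discriminant analysis, and synthetic data are all assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins: 2 s EEG epochs (trials x channels x samples) at fs Hz,
# each labeled with one of four target force levels.
rng = np.random.default_rng(2)
fs, n_trials = 256, 120
eeg = rng.standard_normal((n_trials, 16, 2 * fs))
labels = rng.integers(0, 4, size=n_trials)

# Band-pass to the 8-30 Hz mu-beta band.
b, a = butter(4, [8 / (fs / 2), 30 / (fs / 2)], btype="band")
filtered = filtfilt(b, a, eeg, axis=-1)

# ERD as percent power change of a movement window relative to a baseline
# window (window boundaries are arbitrary choices here).
power = filtered ** 2
baseline = power[:, :, : fs // 2].mean(axis=-1)    # first 0.5 s
movement = power[:, :, fs:].mean(axis=-1)          # last 1.0 s
erd = 100.0 * (movement - baseline) / baseline     # one ERD value per channel

# Four-fold cross-validated linear classification of force level from ERD.
scores = cross_val_score(LinearDiscriminantAnalysis(), erd, labels, cv=4)
print(f"mean accuracy: {scores.mean():.2f}")
```

With four force levels, chance accuracy is 25%, so the paper's reported 53-55% reflects substantially above-chance decoding.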
- Previous work has demonstrated similarities and differences between aerial and terrestrial image viewing. Aerial scene categorization, a pivotal visual processing task for gathering geoinformation, heavily depends on rotation-invariant information. Aerial image-centered research has revealed effects of low-level features on performance of various aerial image interpretation tasks. However, there are fewer studies of viewing behavior for aerial scene categorization and of higher-level factors that might influence that categorization. In this paper, experienced subjects’ eye movements were recorded while they were asked to categorize aerial scenes. A typical viewing center bias was observed. Eye movement patterns varied among categories. We explored the relationship of nine image statistics to observers’ eye movements. Results showed that if the images were less homogeneous, and/or if they contained fewer or no salient diagnostic objects, viewing behavior became more exploratory. Higher- and object-level image statistics were predictive at both the image and scene category levels. Scanpaths were generally organized, and small differences in scanpath randomness could be roughly captured by critical object saliency. Participants tended to fixate on critical objects. The image statistics included in this study showed rotational invariance. The results supported our hypothesis that the availability of diagnostic objects strongly influences eye movements in this task. In addition, this study provides supporting evidence for Loschky et al.’s (Journal of Vision, 15(6), 11, 2015) speculation that aerial scenes are categorized on the basis of image parts and individual objects. The findings were discussed in relation to theories of scene perception and their implications for automation development.
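Scanpath randomness can be quantified in many ways; the sketch below uses one simple candidate, the entropy of gaze transitions over a coarse spatial grid, with synthetic fixation data. It is an illustrative measure, not necessarily the one used in this study.

```python
import numpy as np

# Synthetic stand-in: one observer's fixation sequence on an aerial image,
# with (x, y) coordinates normalized to [0, 1).
rng = np.random.default_rng(3)
fixations = rng.random((40, 2))

# Bin fixations into a coarse grid and count transitions between cells.
grid = 4
cols = (fixations[:, 0] * grid).astype(int)
rows = (fixations[:, 1] * grid).astype(int)
cells = rows * grid + cols
trans = np.zeros((grid * grid, grid * grid))
for a, b in zip(cells[:-1], cells[1:]):
    trans[a, b] += 1

# Transition entropy: higher values indicate a more exploratory, less
# stereotyped scanpath.
p = trans.ravel() / trans.sum()
nz = p[p > 0]
entropy = -(nz * np.log2(nz)).sum()
print(f"transition entropy: {entropy:.2f} bits")
```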
- Understanding actions performed by others requires us to integrate different types of information about people, scenes, objects, and their interactions. What organizing dimensions does the mind use to make sense of this complex action space? To address this question, we collected intuitive similarity judgments across two large-scale sets of naturalistic videos depicting everyday actions. We used cross-validated sparse non-negative matrix factorization to identify the structure underlying action similarity judgments. A low-dimensional representation, consisting of nine to ten dimensions, was sufficient to accurately reconstruct human similarity judgments. The dimensions were robust to stimulus set perturbations and reproducible in a separate odd-one-out experiment. Human labels mapped these dimensions onto semantic axes relating to food, work, and home life; social axes relating to people and emotions; and one visual axis related to scene setting. While highly interpretable, these dimensions did not share a clear one-to-one correspondence with prior hypotheses of action-relevant dimensions. Together, our results reveal a low-dimensional set of robust and interpretable dimensions that organize intuitive action similarity judgments and highlight the importance of data-driven investigations of behavioral representations.
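A minimal sketch of sparse non-negative matrix factorization applied to a similarity matrix, using scikit-learn's NMF with an L1 penalty (the alpha_W and l1_ratio controls assume scikit-learn 1.0 or later). The synthetic data are placeholders, and the simple rank sweep stands in for the paper's cross-validated dimensionality selection.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic stand-in: a symmetric non-negative similarity matrix over videos,
# as might come from averaging pairwise similarity judgments.
rng = np.random.default_rng(4)
w_true = rng.random((60, 9))                # 60 videos, 9 latent dimensions
sim = w_true @ w_true.T

# Sparse NMF: sim ~ W @ H with an L1 penalty that encourages interpretable,
# sparse dimensions. Sweep the rank and keep the smallest one that fits well.
for k in (5, 9, 12):
    model = NMF(n_components=k, init="nndsvda", alpha_W=0.01, l1_ratio=0.5,
                max_iter=500, random_state=0)
    w = model.fit_transform(sim)
    h = model.components_
    err = np.linalg.norm(sim - w @ h) / np.linalg.norm(sim)
    print(f"rank {k}: relative reconstruction error = {err:.3f}")
```

The non-negativity constraint is what makes the recovered dimensions parts-like and nameable, which is why human raters could label them with semantic, social, and visual axes.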