Title: Shared spatiotemporal category representations in biological and artificial deep neural networks
Visual scene category representations emerge very rapidly, yet the computational transformations that enable such invariant categorizations remain elusive. Deep convolutional neural networks (CNNs) perform visual categorization at near human-level accuracy using a feedforward architecture, providing neuroscientists with the opportunity to assess one successful series of representational transformations that enable categorization in silico. The goal of the current study is to assess the extent to which sequential scene category representations built by a CNN map onto those built in the human brain, as assessed by high-density, time-resolved event-related potentials (ERPs). We found correspondence both over time and across the scalp: earlier (0–200 ms) ERP activity was best explained by early CNN layers at all electrodes. Although later activity at most electrode sites also corresponded to early CNN layers, activity in right occipito-temporal electrodes was best explained by the later, fully connected layers of the CNN around 225 ms post-stimulus, with similar patterns in frontal electrodes. Taken together, these results suggest that scene category representations emerge through a dynamic interplay between early activity over occipital electrodes and later activity over temporal and frontal electrodes.
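Below is a minimal sketch of the kind of layer-to-ERP comparison the abstract describes, assuming a representational similarity approach in which each CNN layer's representational dissimilarity matrix (RDM) is correlated with a single-electrode, single-time-point ERP RDM. All array names and shapes are hypothetical; the published analysis may differ in its details.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(patterns):
    """Condensed RDM over images: correlation distance for multi-feature
    patterns, absolute amplitude difference when each image contributes
    a single value (e.g., one electrode at one time point)."""
    metric = "correlation" if patterns.shape[1] > 1 else "euclidean"
    return pdist(patterns, metric=metric)

def best_layer_map(layer_acts, erps):
    """layer_acts: list of (n_images, n_features) activations, one per CNN layer.
    erps: (n_images, n_electrodes, n_times) event-related potentials.
    Returns an (n_electrodes, n_times) map of the best-explaining layer index."""
    layer_rdms = [rdm(acts) for acts in layer_acts]
    _, n_elec, n_time = erps.shape
    best = np.zeros((n_elec, n_time), dtype=int)
    for e in range(n_elec):
        for t in range(n_time):
            erp_rdm = rdm(erps[:, e, t].reshape(-1, 1))
            rhos = [spearmanr(erp_rdm, layer_rdm)[0] for layer_rdm in layer_rdms]
            best[e, t] = int(np.argmax(rhos))
    return best
```

Under this scheme, "earlier ERP activity best explained by early layers" would show up as low layer indices in the early time columns of the returned map, and the reported occipito-temporal effect as high indices around 225 ms at those electrodes.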
Award ID(s):
1736274
PAR ID:
10066327
Author(s) / Creator(s):
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
14
Issue:
7
ISSN:
1553-7358
Page Range / eLocation ID:
e1006327
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human scene categorization is characterized by its remarkable speed. While many visual and conceptual features have been linked to this ability, significant correlations exist between feature spaces, impeding our ability to determine their relative contributions to scene categorization. Here, we used a whitening transformation to decorrelate a variety of visual and conceptual features and assess the time course of their unique contributions to scene categorization. Participants (both sexes) viewed 2250 full-color scene images drawn from 30 different scene categories while having their brain activity measured through 256-channel EEG. We examined the variance explained at each electrode and time point of visual event-related potential (vERP) data from nine different whitened encoding models. These ranged from low-level features obtained from filter outputs to high-level conceptual features requiring human annotation. The amount of category information in the vERPs was assessed through multivariate decoding methods. Behavioral similarity measures were obtained in separate crowdsourced experiments. We found that all nine models together contributed 78% of the variance of human scene similarity assessments and were within the noise ceiling of the vERP data. Low-level models explained earlier vERP variability (88 ms after image onset), whereas high-level models explained later variance (169 ms). Critically, only high-level models shared vERP variability with behavior. Together, these results suggest that scene categorization is primarily a high-level process, but reliant on previously extracted low-level features. 
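The decorrelation step in (1) could look like the following sketch, which assumes ZCA whitening of a stacked feature matrix followed by ordinary least-squares fits at each electrode and time point; the authors' actual whitening procedure and encoding-model fitting may differ.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """ZCA-whiten the columns of X (n_images, n_features): the result has
    (approximately) identity covariance while staying as close as possible
    to the original features."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

def encoding_r2(features, erps):
    """OLS variance explained by a whitened feature model at every
    electrode and time point. features: (n_images, n_features);
    erps: (n_images, n_electrodes, n_times)."""
    X = np.c_[np.ones(len(features)), zca_whiten(features)]  # add intercept
    Y = erps.reshape(len(features), -1)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    r2 = 1.0 - resid.var(axis=0) / Y.var(axis=0)
    return r2.reshape(erps.shape[1:])  # (n_electrodes, n_times)
```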
  2. The same object can be described at multiple levels of abstraction (“parka”, “coat”, “clothing”), yet human observers consistently name objects at a mid-level of specificity known as the basic level. Little is known about the temporal dynamics involved in retrieving neural representations that prioritize the basic level, nor about how these dynamics change with evolving task demands. In this study, observers viewed 1080 objects arranged in a three-tier category taxonomy while 64-channel EEG was recorded. Observers performed a categorical one-back task on the basic or subordinate level in different recording sessions. We used time-resolved multiple regression to assess the utility of superordinate-, basic-, and subordinate-level categories across the scalp. We found robust use of basic-level category information starting at about 50 ms after stimulus onset and moving from posterior electrodes (149 ms) through lateral (261 ms) to anterior sites (332 ms). Task differences were not evident in the first 200 ms of processing but were observed between 200 and 300 ms after stimulus presentation. Together, this work demonstrates that object category representations prioritize the basic level, and do so relatively early, congruent with results showing that basic-level categorization is an automatic and obligatory process.
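A hedged sketch of the time-resolved multiple regression in (2): category labels at each taxonomy level are dummy-coded, and a level's unique contribution at each time point is taken as the drop in R² when that level is removed from the full model. The design-matrix construction and this particular unique-variance measure are illustrative assumptions, not the authors' code.

```python
import numpy as np

def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept column."""
    X1 = np.c_[np.ones(len(X)), X]
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def unique_r2_timecourse(levels, eeg):
    """levels: dict mapping 'superordinate'/'basic'/'subordinate' to
    (n_trials, n_dummies) design matrices; eeg: (n_trials, n_times) at
    one electrode. Returns each level's unique R^2 per time point."""
    full = np.hstack(list(levels.values()))
    out = {name: np.zeros(eeg.shape[1]) for name in levels}
    for t in range(eeg.shape[1]):
        y = eeg[:, t]
        full_r2 = ols_r2(full, y)
        for name in levels:
            reduced = np.hstack([X for n, X in levels.items() if n != name])
            out[name][t] = full_r2 - ols_r2(reduced, y)
    return out
```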
  3. Human scene categorization is rapid and robust, but we have little understanding of how individual features contribute to categorization, nor of the time scale of their contribution. This issue is compounded by the non-independence of the many candidate features. Here, we used singular value decomposition to orthogonalize 11 different scene descriptors that included both visual and semantic features. Using high-density EEG and regression analyses, we observed that most explained variability was carried by a late layer of a deep convolutional neural network, as well as by a model of a scene’s functions given by the American Time Use Survey. Furthermore, features that explained more variance also tended to explain earlier variance. These results extend previous large-scale behavioral results showing the importance of functional features for scene categorization. They also fail to support models of visual perception that are encapsulated from higher-level cognitive attributes.
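The orthogonalization in (3) might be implemented as below, treating each scene descriptor as one column of a matrix and using the left singular vectors of its SVD as mutually orthogonal predictors; this column-per-descriptor setup is an assumption for illustration.

```python
import numpy as np

def orthogonalize_descriptors(D):
    """D: (n_images, n_descriptors) matrix of possibly correlated model
    predictions. Returns orthonormal columns spanning the same space,
    obtained from the singular value decomposition D = U S V^T."""
    Dc = D - D.mean(axis=0)                       # center each descriptor
    U, S, Vt = np.linalg.svd(Dc, full_matrices=False)
    return U                                      # orthogonal, unit-norm predictors
```

Because the returned predictors are uncorrelated by construction, variance in the EEG regressions cannot be double-counted across descriptors.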
  4. Behaviorally relevant, higher-order representations of an animal’s environment are built from the convergence of visual features encoded in the early stages of visual processing. Although developmental mechanisms that generate feature-encoding channels in early visual circuits have been uncovered, relatively little is known about the mechanisms that direct feature convergence to enable appropriate integration into downstream circuits. Here we explore the development of a collision-detection sensorimotor circuit in Drosophila melanogaster: the convergence of visual projection neurons (VPNs) onto the dendrites of a large descending neuron, the giant fiber (GF). We find that VPNs encoding different visual features establish their respective territories on GF dendrites through sequential axon arrival during development. Physical occupancy, but not developmental activity, is important to maintain territories. Ablation of one VPN results in the expansion of remaining VPN territories and in functional compensation that enables the GF to retain responses to ethologically relevant visual stimuli. GF developmental activity, observed using a pupal electrophysiology preparation, appears after VPN territories are established and likely contributes to later stages of synapse assembly and refinement. Our data highlight temporal mechanisms for visual feature convergence and promote the GF circuit and the Drosophila optic glomeruli, where VPN-to-GF connectivity resides, as a powerful developmental model for investigating complex wiring programs and developmental plasticity.
  5. Wei, Xue-Xin (Ed.)
    Recent neuroimaging studies have shown that the visual cortex plays an important role in representing the affective significance of visual input. The origin of these affect-specific visual representations is debated: are they intrinsic to the visual system, or do they arise through reentry from frontal emotion-processing structures such as the amygdala? We examined this problem by combining convolutional neural network (CNN) models of the human ventral visual cortex, pre-trained on ImageNet, with two datasets of affective images. Our results show that all layers of the CNN models contained artificial neurons that responded consistently and selectively to neutral, pleasant, or unpleasant images, and that lesioning these neurons by setting their output to zero, or enhancing them by increasing their gain, led to decreased or increased emotion recognition performance, respectively. These results support the idea that the visual system may have an intrinsic ability to represent the affective significance of visual input, and they suggest that CNNs offer a fruitful platform for testing neuroscientific theories.
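The lesion and enhancement manipulations in (5) can be mimicked with forward hooks on a pretrained network, as in this sketch; the network, layer, channel indices, and gain value are placeholders, and the study's exact procedure may differ.

```python
import torch
import torchvision.models as models

net = models.vgg16(weights="IMAGENET1K_V1").eval()

def make_gain_hook(unit_idx, gain):
    """Scale the outputs of selected channels: gain=0.0 lesions them,
    gain>1.0 enhances them; all other channels pass through unchanged."""
    def hook(module, inputs, output):
        output = output.clone()
        output[:, unit_idx] *= gain
        return output
    return hook

# Example: lesion two (hypothetical) affect-selective channels in the first
# conv layer, then run a stand-in image batch through the modified network.
handle = net.features[0].register_forward_hook(make_gain_hook([3, 17], gain=0.0))
with torch.no_grad():
    logits = net(torch.randn(1, 3, 224, 224))
handle.remove()  # restore the intact network
```

Comparing emotion-recognition accuracy from the intact and hooked networks would then quantify the lesioned or enhanced units' contribution, in the spirit of the abstract above.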