Title: Diverse task-driven modeling of macaque V4 reveals functional specialization towards semantic tasks
Responses to natural stimuli in area V4, a mid-level area of the visual ventral stream, are well predicted by features from convolutional neural networks (CNNs) trained on image classification. This result has been taken as evidence for the functional role of V4 in object classification. However, we currently do not know if and to what extent V4 plays a role in solving other computational objectives. Here, we investigated normative accounts of V4 (and V1 for comparison) by predicting macaque single-neuron responses to natural images from the representations extracted by 23 CNNs trained on different computer vision tasks including semantic, geometric, 2D, and 3D types of tasks. We found that V4 was best predicted by semantic classification features and exhibited high task selectivity, while the choice of task was less consequential to V1 performance. Consistent with traditional characterizations of V4 function that show its high-dimensional tuning to various 2D and 3D stimulus directions, we found that diverse non-semantic tasks explained aspects of V4 function that are not captured by individual semantic tasks. Nevertheless, jointly considering the features of a pair of semantic classification tasks was sufficient to yield one of our top V4 models, solidifying V4’s main functional role in semantic processing and suggesting that V4’s selectivity to 2D or 3D stimulus properties found by electrophysiologists can result from semantic functional goals.
Award ID(s):
2113173 2510328
PAR ID:
10648620
Author(s) / Creator(s):
Editor(s):
Einhäuser, Wolfgang
Publisher / Repository:
PLOS Computational Biology
Date Published:
Journal Name:
PLOS Computational Biology
Volume:
20
Issue:
5
ISSN:
1553-7358
Page Range / eLocation ID:
e1012056
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Kay, Kendrick (Ed.)
    A central goal of neuroscience is to understand how function-relevant brain activations are generated. Here we test the hypothesis that function-relevant brain activations are generated primarily by distributed network flows. We focused on visual processing in human cortex, given the long-standing literature supporting the functional relevance of brain activations in visual cortex regions exhibiting visual category selectivity. We began by using fMRI data from N = 352 human participants to identify category-specific responses in visual cortex for images of faces, places, body parts, and tools. We then systematically tested the hypothesis that distributed network flows can generate these localized visual category selective responses. This was accomplished using a recently developed approach for simulating – in a highly empirically constrained manner – the generation of task-evoked brain activations by modeling activity flowing over intrinsic brain connections. We next tested refinements to our hypothesis, focusing on how stimulus-driven network interactions initialized in V1 generate downstream visual category selectivity. We found evidence that network flows directly from V1 were sufficient for generating visual category selectivity, but that additional, globally distributed (whole-cortex) network flows increased category selectivity further. Using null network architectures we also found that each region’s unique intrinsic “connectivity fingerprint” was key to the generation of category selectivity. These results generalized across regions associated with all four visual categories tested (bodies, faces, places, and tools), and provide evidence that the human brain’s intrinsic network organization plays a prominent role in the generation of functionally relevant, localized responses. 
  2. Theunissen, Frédéric E. (Ed.)
    System identification techniques, such as projection pursuit regression models (PPRs) and convolutional neural networks (CNNs), provide state-of-the-art performance in predicting visual cortical neurons’ responses to arbitrary input stimuli. However, the constituent kernels recovered by these methods are often noisy and lack coherent structure, making it difficult to understand the underlying component features of a neuron’s receptive field. In this paper, we show that using a dictionary of diverse kernels with complex shapes, learned from natural scenes based on efficient coding theory, as the front-end for PPRs and CNNs can improve their performance in neuronal response prediction as well as algorithmic data efficiency and convergence speed. Extensive experimental results also indicate that these sparse-code kernels provide important information on the component features of a neuron’s receptive field. In addition, we find that models with the complex-shaped sparse code front-end are significantly better than models with a standard orientation-selective Gabor filter front-end for modeling V1 neurons that have been found to exhibit complex pattern selectivity. We show that the relative performance difference due to these two front-ends can be used to produce a sensitive metric for detecting complex selectivity in V1 neurons.
  3. Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors. 
  4. In the primate visual system, visual object recognition involves a series of cortical areas arranged hierarchically along the ventral visual pathway. As information flows through this hierarchy, neurons become progressively tuned to more complex image features. The circuit mechanisms and computations underlying the increasing complexity of these receptive fields (RFs) remain unidentified. To understand how this complexity emerges in the secondary visual area (V2), we investigated the functional organization of inputs from the primary visual cortex (V1) to V2 by combining retrograde anatomical tracing of these inputs with functional imaging of feature maps in macaque monkey V1 and V2. We found that V1 neurons sending inputs to single V2 orientation columns have a broad range of preferred orientations, but are strongly biased towards the orientation represented at the injected V2 site. For each V2 site, we then constructed a feedforward model based on the linear combination of its anatomically identified large-scale V1 inputs, and studied the response properties of the generated V2 RFs. We found that V2 RFs derived from the linear feedforward model were either elongated versions of V1 filters or had spatially complex structures. These modeled RFs predicted V2 neuron responses to oriented grating stimuli with high accuracy. Remarkably, this simple model also explained the greater selectivity to naturalistic textures of V2 cells compared to their V1 input cells. Our results demonstrate that simple linear combinations of feedforward inputs can account for the orientation selectivity and texture sensitivity of V2 RFs.
  5. We have created encoding manifolds to reveal the overall responses of a brain area to a variety of stimuli. Encoding manifolds organize response properties globally: each point on an encoding manifold is a neuron, and nearby neurons respond similarly to the stimulus ensemble in time. We previously found, using a large stimulus ensemble including optic flows, that encoding manifolds for the retina were highly clustered, with each cluster corresponding to a different ganglion cell type. In contrast, the topology of the V1 manifold was continuous. Now, using responses of individual neurons from the Allen Institute Visual Coding-Neuropixels dataset in the mouse, we infer encoding manifolds for V1 and for five higher cortical visual areas (VISam, VISal, VISpm, VISlm, and VISrl). We show here that the encoding manifold topology computed only from responses to various grating stimuli is also continuous, not only for V1 but also for the higher visual areas, with smooth coordinates spanning it that include, among others, orientation selectivity and firing-rate magnitude. Surprisingly, the encoding manifold for gratings also provides information about natural scene responses. To investigate whether neurons respond more strongly to gratings or natural scenes, we plot the log ratio of natural scene responses to grating responses (mean firing rates) on the encoding manifold. This reveals a global coordinate axis organizing neurons' preferences between these two stimuli. This coordinate is orthogonal (i.e., uncorrelated) to that organizing firing rate magnitudes in VISp. Analyzing layer responses, a preference for gratings is concentrated in layer 6, whereas preference for natural scenes tends to be higher in layers 2/3 and 4. We also find that preference for natural scenes dominates the responses of neurons that prefer low (0.02 cpd) and high (0.32 cpd) spatial frequencies, rather than intermediate ones (0.04 to 0.16 cpd). 
    Conclusion: While gratings seem limited and natural scenes unconstrained, machine learning algorithms can reveal subtle relationships between them beyond linear techniques.
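The log-ratio measure mentioned in item 5 can be illustrated with a minimal sketch. This is not the authors' code: the function name, the natural-log base, the `eps` guard against zero firing rates, and the example rates are all assumptions made for illustration.

```python
import numpy as np


def log_ratio_preference(scene_rates, grating_rates, eps=1e-6):
    """Per-neuron log ratio of mean natural-scene rate to mean grating rate.

    Positive values mean a stronger response to natural scenes,
    negative values a stronger response to gratings.
    `eps` is a hypothetical small constant that keeps silent neurons finite.
    """
    scene = np.asarray(scene_rates, dtype=float)
    grating = np.asarray(grating_rates, dtype=float)
    return np.log((scene + eps) / (grating + eps))


# Made-up mean firing rates (spikes/s) for three illustrative neurons.
scene = np.array([10.0, 5.0, 2.0])
grating = np.array([5.0, 5.0, 8.0])
index = log_ratio_preference(scene, grating)
```

With these made-up rates, the first neuron gets a positive index (scene-preferring), the second an index of zero, and the third a negative index (grating-preferring). Plotting such an index over the manifold coordinates is what would expose the global preference axis described in the abstract.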