Title: Integrating Flexible Normalization into Midlevel Representations of Deep Convolutional Neural Networks
Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to midlevel representations of deep CNNs as a tractable way to study contextual normalization mechanisms in midlevel cortical areas. This approach captures nontrivial spatial dependencies among midlevel features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high-order features geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in midlevel cortical areas. We also expect this approach to be useful as part of the CNN tool kit, therefore going beyond more restrictive fixed forms of normalization.
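The flexible normalization idea in the abstract can be sketched as a toy divisive-normalization computation in which the surround pool is recruited only to the degree that center and surround are deemed statistically dependent. This is a minimal illustration under assumed names and parameters (`flexible_normalize`, `sigma`, `n`, and the scalar `dependence` gate are all hypothetical), not the authors' implementation:

```python
import numpy as np

def flexible_normalize(center, surround, sigma=0.1, n=2.0, dependence=1.0):
    """Divisive normalization of a center response by its surround pool.

    `dependence` in [0, 1] gates how strongly the surround is recruited:
    1.0 means center and surround are deemed statistically dependent
    (full spatial normalization); 0.0 falls back to center-only
    normalization.
    """
    center_pow = center ** n
    pool = center_pow + dependence * np.sum(surround ** n)
    return center_pow / (sigma ** n + pool)

# Example: a midlevel feature activation with an active surround.
center = 0.8
surround = np.array([0.7, 0.9, 0.6, 0.8])

r_dependent = flexible_normalize(center, surround, dependence=1.0)
r_independent = flexible_normalize(center, surround, dependence=0.0)

# Recruiting the surround pool suppresses the response more strongly.
assert r_dependent < r_independent
```

The gate makes the suppression stimulus-dependent, which is the key difference from fixed normalization pools that are always fully recruited.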
Award ID(s):
1715475
NSF-PAR ID:
10183369
Author(s) / Creator(s):
Date Published:
Journal Name:
Neural Computation
Volume:
31
Issue:
11
ISSN:
0899-7667
Page Range / eLocation ID:
2138 to 2176
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    When searching for an object in a cluttered scene, we can use our memory of the target object features to guide our search, and the responses of neurons in multiple cortical visual areas are enhanced when their receptive field contains a stimulus sharing target object features. Here we tested the role of the ventral prearcuate region (VPA) of prefrontal cortex in the control of feature attention in cortical visual area V4. VPA was unilaterally inactivated in monkeys performing a free-viewing visual search for a target stimulus in an array of stimuli, impairing monkeys’ ability to find the target in the array in the affected hemifield, but leaving intact their ability to make saccades to targets presented alone. Simultaneous recordings in V4 revealed that the effects of feature attention on V4 responses were eliminated or greatly reduced while leaving the effects of spatial attention on responses intact. Altogether, the results suggest that feedback from VPA modulates processing in visual cortex during attention to object features.

  2. Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70–150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas. 
  3. Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors. 
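The two structural features credited with the accuracy boost, bypassing and gating, can be illustrated with a toy recurrent cell. Everything here (`gated_bypass_cell`, the 1×1 channel-mixing stand-in for a convolution, the weight shapes and initialization) is a hypothetical sketch, not one of the cells found by the authors' architecture search:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # 1x1 "convolution" over a (channels, height, width) feature map,
    # i.e. a channel-mixing matrix multiply at every spatial location.
    return np.einsum('oc,chw->ohw', w, x)

def gated_bypass_cell(x, h, w_x, w_h, w_gate):
    """One step of a toy recurrent cell with the two structural features:
    a learned gate blending the input-driven activation with the previous
    hidden state, and a bypass (skip) connection that lets the feedforward
    input reach the output directly."""
    gate = 1.0 / (1.0 + np.exp(-conv1x1(h, w_gate)))                 # sigmoid gate
    candidate = np.maximum(conv1x1(x, w_x) + conv1x1(h, w_h), 0.0)   # ReLU drive
    h_new = gate * h + (1.0 - gate) * candidate
    return h_new + x  # bypass: feedforward input skips the recurrence

# Toy feature map: 4 channels on an 8x8 grid, unrolled for 3 time steps.
C, H, W = 4, 8, 8
x = rng.standard_normal((C, H, W))
h = np.zeros((C, H, W))
w_x, w_h, w_gate = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
for _ in range(3):
    h = gated_bypass_cell(x, h, w_x, w_h, w_gate)
```

The gate lets the cell control how much past state to retain, while the bypass keeps a direct feedforward path, which is plausibly why these cells train well inside deep CNNs where vanilla RNNs and LSTMs do not.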
  4. Theunissen, Frédéric E. (Ed.)
System identification techniques—projection pursuit regression models (PPRs) and convolutional neural networks (CNNs)—provide state-of-the-art performance in predicting visual cortical neurons’ responses to arbitrary input stimuli. However, the constituent kernels recovered by these methods are often noisy and lack coherent structure, making it difficult to understand the underlying component features of a neuron’s receptive field. In this paper, we show that using a dictionary of diverse, complex-shaped kernels, learned from natural scenes according to efficient coding theory, as the front-end for PPRs and CNNs can improve their performance in neuronal response prediction as well as their data efficiency and convergence speed. Extensive experimental results also indicate that these sparse-code kernels provide important information about the component features of a neuron’s receptive field. In addition, we find that models with the complex-shaped sparse-code front-end are significantly better than models with a standard orientation-selective Gabor filter front-end at modeling V1 neurons that have been found to exhibit complex pattern selectivity. We show that the relative performance difference between these two front-ends can be used as a sensitive metric for detecting complex selectivity in V1 neurons.
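The dictionary front-end can be sketched as follows: project each stimulus onto a fixed bank of kernels, rectify, and fit the downstream predictor on the resulting features instead of raw pixels. The random dictionary, the ReLU projection, and the plain least-squares readout below are illustrative stand-ins (the study uses sparse-code kernels learned from natural scenes and full PPR/CNN models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a sparse-code dictionary learned from natural scenes;
# in the study the kernels have diverse, complex shapes.
n_pixels, n_kernels, n_stimuli = 64, 16, 500
dictionary = rng.standard_normal((n_pixels, n_kernels))

def frontend(stimuli, dictionary):
    # Project stimuli onto the dictionary kernels and half-rectify,
    # replacing raw pixels as input to the downstream regression model.
    return np.maximum(stimuli @ dictionary, 0.0)

stimuli = rng.standard_normal((n_stimuli, n_pixels))
features = frontend(stimuli, dictionary)

# Toy "neuron": responds to a weighted sum of two dictionary kernels,
# plus a little measurement noise.
true_w = np.zeros(n_kernels)
true_w[[2, 7]] = [1.0, 0.5]
responses = features @ true_w + 0.01 * rng.standard_normal(n_stimuli)

# Linear readout fitted on the dictionary features (least squares).
w_hat, *_ = np.linalg.lstsq(features, responses, rcond=None)
```

When the neuron's selectivity is well aligned with the dictionary, the fitted readout weights are sparse and interpretable, which is the sense in which the front-end exposes the component features of the receptive field.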
5. How do life experiences impact cortical function? In people who are born blind, the “visual” cortices are recruited for nonvisual tasks such as Braille reading and sound localization (e.g., Collignon et al., 2011; Sadato et al., 1996). The mechanisms of this recruitment are not known. Do visual cortices have a latent capacity to respond to nonvisual information that is equal throughout the lifespan? Alternatively, is there a sensitive period of heightened plasticity that makes visual cortex repurposing possible during childhood? To gain insight into these questions, we leveraged naturalistic auditory stimuli to quantify and compare cross-modal responses in congenitally blind (CB, n=22), adult-onset blind (vision loss >18 years of age; AB, n=14), and sighted (n=22) individuals. Participants listened to auditory excerpts from movies, a spoken narrative, and matched meaningless auditory stimuli (i.e., shuffled sentences, backwards speech) during fMRI scanning. These rich naturalistic stimuli made it possible to simultaneously engage a broad range of cognitive domains. We correlated the voxel-wise timecourses of different participants within each group. For all groups, all stimulus conditions induced synchrony in auditory cortex, and only the narrative stimuli synchronized responses in higher-cognitive fronto-parietal and temporal regions. Inter-subject synchrony in visual cortices was high in the CB group for the movie and narrative stimuli but not for the meaningless auditory controls. In contrast, visual cortex synchrony was equally low among AB and sighted blindfolded participants. Even many years of blindness in adulthood fail to enable responses to naturalistic auditory information in the visual cortices of people who had sight as children. These findings suggest that cross-modal responses in the visual cortex of people born blind reflect the plasticity of developing visual cortex during a sensitive period.
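The within-group synchrony analysis described above amounts to a leave-one-out inter-subject correlation: each participant's voxel timecourse is correlated with the average timecourse of the remaining participants. A minimal sketch with synthetic timecourses (the function name, subject counts, and noise levels are assumptions, not the study's pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def intersubject_correlation(timecourses):
    """Leave-one-out inter-subject correlation for one voxel.

    Correlate each participant's timecourse with the mean of the
    remaining participants, then average across participants.
    `timecourses` has shape (n_subjects, n_timepoints)."""
    n = timecourses.shape[0]
    rs = []
    for i in range(n):
        others = np.delete(timecourses, i, axis=0).mean(axis=0)
        rs.append(np.corrcoef(timecourses[i], others)[0, 1])
    return float(np.mean(rs))

# Shared stimulus-driven signal plus participant-specific noise:
# synchrony is high only when the shared component dominates.
t = np.linspace(0, 20, 300)
shared = np.sin(t)
synced = shared + 0.3 * rng.standard_normal((10, t.size))
unsynced = rng.standard_normal((10, t.size))

assert intersubject_correlation(synced) > intersubject_correlation(unsynced)
```

Because the noise is independent across participants, only stimulus-locked activity survives the across-subject averaging, which is what makes this measure a clean index of stimulus-driven processing.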