Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors.
Integrating Flexible Normalization into Midlevel Representations of Deep Convolutional Neural Networks
Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to midlevel representations of deep CNNs as a tractable way to study contextual normalization mechanisms in midlevel cortical areas. This approach captures nontrivial spatial dependencies among midlevel features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high-order features geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in midlevel cortical areas. We also expect this approach to be useful as part of the CNN tool kit, therefore going beyond more restrictive fixed forms of normalization.
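As a rough illustration of the idea described above, the following Python sketch blends a surround-normalized response with a center-only response according to a dependence probability. It is a simplified toy under stated assumptions, not the model from the paper: the function name `flexible_normalization`, the constant `sigma`, the use of mean squared surround activity as the normalization pool, and the externally supplied `p_dependent` are all hypothetical choices made for this example.

```python
import numpy as np


def flexible_normalization(center, surround, p_dependent, sigma=1.0):
    """Toy flexible divisive normalization (illustrative assumption only).

    center      : center feature response(s), shape (...)
    surround    : surround feature responses, shape (..., n_surround)
    p_dependent : assumed probability in [0, 1] that center and surround
                  are statistically dependent, shape (...)
    sigma       : assumed semi-saturation constant
    """
    center = np.asarray(center, dtype=float)
    surround = np.asarray(surround, dtype=float)
    p = np.clip(np.asarray(p_dependent, dtype=float), 0.0, 1.0)

    center_energy = center ** 2
    # Pool surround energy over the last axis (the surround positions).
    surround_energy = np.mean(surround ** 2, axis=-1)

    # Center-only normalization: the surround is ignored.
    r_independent = center / np.sqrt(sigma ** 2 + center_energy)
    # Center-plus-surround normalization: the surround joins the pool.
    r_dependent = center / np.sqrt(sigma ** 2 + center_energy + surround_energy)

    # The surround is recruited only to the degree that dependence is inferred.
    return p * r_dependent + (1.0 - p) * r_independent


if __name__ == "__main__":
    surround = [2.0, 2.0, 2.0, 2.0]
    for p in (0.0, 0.5, 1.0):
        print(p, flexible_normalization(2.0, surround, p))
```

With `p_dependent` near 0 the surround has no effect on the response, and with `p_dependent` near 1 the response is suppressed by the surround, which is the qualitative behavior the flexible scheme is meant to capture. How the dependence probability would be inferred from the feature statistics is outside the scope of this sketch.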
- Award ID(s): 1715475
- PAR ID: 10183369
- Date Published:
- Journal Name: Neural Computation
- Volume: 31
- Issue: 11
- ISSN: 0899-7667
- Page Range / eLocation ID: 2138 to 2176
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70–150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
- Orientation selectivity in primate visual cortex is organized into cortical columns. Since cortical columns are at a finer spatial scale than the sampling resolution of standard BOLD fMRI measurements, analysis approaches have been proposed to peer past these spatial resolution limitations. It was recently found that these methods are predominantly sensitive to stimulus vignetting, a form of selectivity arising from an interaction of the oriented stimulus with the aperture edge. Beyond vignetting, it is not clear whether orientation-selective neural responses are detectable in BOLD measurements. Here, we leverage a dataset of visual cortical responses measured using high-field 7T fMRI. Fitting these responses using image-computable models, we compensate for vignetting and nonetheless find reliable tuning for orientation. Results further reveal a coarse-scale map of orientation preference that may constitute the neural basis for known perceptual anisotropies. These findings settle a long-standing debate in human neuroscience, and provide insights into functional organization principles of visual cortex.
- How do life experiences impact cortical function? In people who are born blind, the “visual” cortices are recruited for nonvisual tasks such as Braille reading and sound localization (e.g., Collignon et al., 2011; Sadato et al., 1996). The mechanisms of this recruitment are not known. Do visual cortices have a latent capacity to respond to nonvisual information that is equal throughout the lifespan? Alternatively, is there a sensitive period of heightened plasticity that makes visual cortex repurposing possible during childhood? To gain insight into these questions, we leveraged naturalistic auditory stimuli to quantify and compare cross-modal responses in congenitally blind (CB, n=22), adult-onset blind (vision loss >18 years-of-age, AB, n=14) and sighted (n=22) individuals. Participants listened to auditory excerpts from movies; a spoken narrative; and matched meaningless auditory stimuli (i.e., shuffled sentences, backwards speech) during fMRI scanning. These rich naturalistic stimuli made it possible to simultaneously engage a broad range of cognitive domains. We correlated the voxel-wise timecourses of different participants within each group. For all groups, all stimulus conditions induced synchrony in auditory cortex, and for all groups only the narrative stimuli synchronized responses in higher-cognitive fronto-parietal and temporal regions. Inter-subject synchrony in visual cortices was high in the CB group for the movie and narrative stimuli but not for meaningless auditory controls. In contrast, visual cortex synchrony was equally low among AB and sighted blindfolded participants. Even many years of blindness in adulthood fail to enable responses to naturalistic auditory information in visual cortices of people who had sight as children. These findings suggest that cross-modal responses in visual cortex of people born blind reflect the plasticity of developing visual cortex during a sensitive period.
- Perception of visual motion is important for a range of ethological behaviors in mammals. In primates, specific visual cortical regions are specialized for processing of coherent visual motion. However, whether mouse visual cortex has a similar organization remains unclear, despite powerful genetic tools available for measuring population neural activity. Here, we use widefield and 2-photon calcium imaging of transgenic mice to measure mesoscale and cellular responses to coherent motion. Imaging of primary visual cortex (V1) and higher visual areas (HVAs) during presentation of natural movies and random dot kinematograms (RDKs) reveals varied responsiveness to coherent motion, with stronger responses in dorsal stream areas compared to ventral stream areas. Moreover, there is considerable anisotropy within visual areas, such that neurons representing the lower visual field are more responsive to coherent motion. These results indicate that processing of visual motion in mouse cortex is distributed heterogeneously both across and within visual areas.

