This content will become publicly available on December 5, 2024

Title: A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains
The human visual system uses two parallel pathways for spatial processing and object recognition. In contrast, computer vision systems tend to use a single feedforward pathway, rendering them less robust, adaptive, or efficient than human vision. To bridge this gap, we developed a dual-stream vision model inspired by the human eyes and brain. At the input level, the model samples two complementary visual patterns to mimic how the human eyes use magnocellular and parvocellular retinal ganglion cells to separate retinal inputs to the brain. At the backend, the model processes the separate input patterns through two branches of convolutional neural networks (CNN) to mimic how the human brain uses the dorsal and ventral cortical pathways for parallel visual processing. The first branch (WhereCNN) samples a global view to learn spatial attention and control eye movements. The second branch (WhatCNN) samples a local view to represent the object around the fixation. Over time, the two branches interact recurrently to build a scene representation from moving fixations. We compared this model with the human brains processing the same movie and evaluated their functional alignment by linear transformation. The WhereCNN and WhatCNN branches were found to differentially match the dorsal and ventral pathways of the visual cortex, respectively, primarily due to their different learning objectives, rather than their distinctions in retinal sampling or sensitivity to attention-driven eye movements. These model-based results lead us to speculate that the distinct responses and representations of the ventral and dorsal streams are more influenced by their distinct goals in visual attention and object recognition than by their specific bias or selectivity in retinal inputs. This dual-stream model takes a further step in brain-inspired computer vision, enabling parallel neural networks to actively explore and understand the visual surroundings.
Award ID(s):
2112773
NSF-PAR ID:
10525354
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S
Publisher / Repository:
Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Date Published:
Format(s):
Medium: X
Location:
New Orleans
Sponsoring Org:
National Science Foundation
More Like this
  1. Some animals including humans use stereoscopic vision which reconstructs spatial information about the environment from the disparity between images captured by eyes in two separate adjacent locations. Like other sensory information, such stereoscopic information is expected to influence attentional selection. We develop a biologically plausible model of binocular vision to study its effect on bottom-up visual attention, i.e., visual saliency. In our model, the scene is organized in terms of proto-objects on which attention acts, rather than on unbound sets of elementary features. We show that taking into account the stereoscopic information improves the performance of the model in the prediction of human eye movements with statistically significant differences. 
  2. High visual acuity is essential for many tasks, from recognizing distant friends to driving a car. While much is known about how the eye’s optics and anatomy contribute to spatial resolution, possible influences from eye movements are rarely considered. Yet humans incessantly move their eyes, and it has long been suggested that oculomotor activity enhances fine pattern vision. Here we examine the role of eye movements in the most common assessment of visual acuity, the Snellen eye chart. By precisely localizing gaze and actively controlling retinal stimulation, we show that fixational behavior improves acuity by more than 0.15 logMAR, at least 2 lines of the Snellen chart. This improvement is achieved by adapting both microsaccades and ocular drifts to precisely position the image on the retina and adjust its motion. These findings show that humans finely tune their fixational eye movements so that they greatly contribute to normal visual acuity.
  3. Despite their anatomical and functional distinctions, there is growing evidence that the dorsal and ventral visual pathways interact to support object recognition. However, the exact nature of these interactions remains poorly understood. Is the presence of identity-relevant object information in the dorsal pathway simply a byproduct of ventral input? Or, might the dorsal pathway be a source of input to the ventral pathway for object recognition? In the current study, we used high-density EEG (a technique with high temporal precision and spatial resolution sufficient to distinguish parietal and temporal lobes) to characterise the dynamics of dorsal and ventral pathways during object viewing. Using multivariate analyses, we found that category decoding in the dorsal pathway preceded that in the ventral pathway. Importantly, the dorsal pathway predicted the multivariate responses of the ventral pathway in a time-dependent manner, rather than the other way around. Together, these findings suggest that the dorsal pathway is a critical source of input to the ventral pathway for object recognition.
  4. Reading is a highly complex learned skill in which humans move their eyes three to four times every second in response to visual and cognitive processing. The consensus view is that the details of these rapid eye-movement decisions—which part of a word to target with a saccade—are determined solely by low-level oculomotor heuristics. But maximally efficient saccade targeting would be sensitive to ongoing word identification, sending the eyes farther into a word the farther its identification has already progressed. Here, using a covert text-shifting paradigm, we showed just such a statistical relationship between saccade targeting in reading and trial-to-trial variability in cognitive processing. This result suggests that, rather than relying purely on heuristics, the human brain has learned to optimize eye movements in reading even at the fine-grained level of character-position targeting, reflecting efficiency-based sensitivity to ongoing cognitive processing. 
  5. Neural processing of objects with action associations recruits dorsal visual regions more than the neural processing of objects without such associations. We hypothesized that because the dorsal and ventral visual pathways have differing proportions of magno- and parvocellular input, there should be behavioral differences in perceptual tasks between manipulable and nonmanipulable objects. This hypothesis was tested in college-age adults across five experiments (Ns = 26, 26, 30, 25, and 25) using a gap-detection task, suited to the spatial resolution of parvocellular processing, and an object-flicker-discrimination task, suited to the temporal resolution of magnocellular processing. Directly predicted from the cellular composition of each pathway, a strong nonmanipulable-object advantage was observed in gap detection, and a small manipulable-object advantage was observed in flicker discrimination. Additionally, these effects were modulated by reducing object recognition through inversion and by suppressing magnocellular processing using red light. These results establish perceptual differences between objects dependent on semantic knowledge.