Title: A Dual-Stream Neural Network Explains the Functional Segregation of Dorsal and Ventral Visual Pathways in Human Brains
Abstract: The human visual system uses two parallel pathways for spatial processing and object recognition. In contrast, computer vision systems tend to use a single feedforward pathway, rendering them less robust, adaptive, or efficient than human vision. To bridge this gap, we developed a dual-stream vision model inspired by the human eyes and brain. At the input level, the model samples two complementary visual patterns to mimic how the human eyes use magnocellular and parvocellular retinal ganglion cells to separate retinal inputs to the brain. At the backend, the model processes the separate input patterns through two branches of convolutional neural networks (CNN) to mimic how the human brain uses the dorsal and ventral cortical pathways for parallel visual processing. The first branch (WhereCNN) samples a global view to learn spatial attention and control eye movements. The second branch (WhatCNN) samples a local view to represent the object around the fixation. Over time, the two branches interact recurrently to build a scene representation from moving fixations. We compared this model with human brains processing the same movie and evaluated their functional alignment by linear transformation. The WhereCNN and WhatCNN branches were found to differentially match the dorsal and ventral pathways of the visual cortex, respectively, primarily due to their different learning objectives, rather than their distinctions in retinal sampling or sensitivity to attention-driven eye movements. These model-based results lead us to speculate that the distinct responses and representations of the ventral and dorsal streams are more influenced by their distinct goals in visual attention and object recognition than by their specific bias or selectivity in retinal inputs. This dual-stream model takes a further step in brain-inspired computer vision, enabling parallel neural networks to actively explore and understand the visual surroundings.
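The architecture outlined in the abstract can be summarized in a short sketch. The snippet below is a minimal, illustrative PyTorch version of the dual-stream idea: the WhereCNN and WhatCNN names come from the abstract, but the retinal-sampling function, layer sizes, recurrent update, and fixation readout are assumptions made for the example, not the authors' implementation.

```python
# Minimal, illustrative sketch of the dual-stream idea (not the authors' code).
# WhereCNN/WhatCNN follow the abstract; sampling, sizes, and readouts are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def retinal_sample(image, fixation, out_size=64, fov=1.0):
    """Sample a square view of relative width `fov` centered on a fixation
    given in normalized [-1, 1] image coordinates."""
    b, c = image.size(0), image.size(1)
    theta = torch.zeros(b, 2, 3, device=image.device)
    theta[:, 0, 0] = fov
    theta[:, 1, 1] = fov
    theta[:, :, 2] = fixation  # translation toward the fixation point
    grid = F.affine_grid(theta, (b, c, out_size, out_size), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)


class StreamCNN(nn.Module):
    """Small convolutional encoder; both streams use the same structure."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class DualStreamModel(nn.Module):
    """Two interacting streams: WhereCNN picks the next fixation from a global
    view; WhatCNN encodes the local view around the current fixation."""
    def __init__(self, num_classes=1000, feat_dim=128, hidden=256):
        super().__init__()
        self.where_cnn = StreamCNN(feat_dim)
        self.what_cnn = StreamCNN(feat_dim)
        self.rnn = nn.GRUCell(2 * feat_dim, hidden)  # recurrent scene state
        self.to_fixation = nn.Linear(hidden, 2)      # next (x, y) fixation
        self.to_class = nn.Linear(hidden, num_classes)

    def forward(self, image, n_glances=4):
        b = image.size(0)
        h = image.new_zeros(b, self.rnn.hidden_size)
        fixation = image.new_zeros(b, 2)             # start at the image center
        for _ in range(n_glances):
            global_view = retinal_sample(image, fixation, fov=1.0)   # coarse, wide
            local_view = retinal_sample(image, fixation, fov=0.25)   # fine, narrow
            feat = torch.cat([self.where_cnn(global_view),
                              self.what_cnn(local_view)], dim=1)
            h = self.rnn(feat, h)
            fixation = torch.tanh(self.to_fixation(h))               # move the eyes
        return self.to_class(h), fixation
```

In this sketch the two branches see complementary views of the same image and share a recurrent state, so the fixation chosen by the "where" stream determines what the "what" stream sees at the next glance; that shared state is one way to realize the recurrent interaction the abstract describes as building a scene representation from moving fixations.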
Award ID(s):
2112773
PAR ID:
10525354
Editor(s):
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S
Publisher / Repository:
Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Location:
New Orleans
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Humans actively observe their visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNN) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one determines the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention, and that its retinal sampling and recurrent processing enhance robustness against adversarial attacks. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models (see the usage sketch after this list). In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.
  2. Some animals, including humans, use stereoscopic vision, which reconstructs spatial information about the environment from the disparity between images captured by eyes at two separate, adjacent locations. Like other sensory information, such stereoscopic information is expected to influence attentional selection. We develop a biologically plausible model of binocular vision to study its effect on bottom-up visual attention, i.e., visual saliency. In our model, the scene is organized in terms of proto-objects on which attention acts, rather than on unbound sets of elementary features. We show that taking the stereoscopic information into account improves the model's prediction of human eye movements, with statistically significant differences.
  3. Abstract: Despite their anatomical and functional distinctions, there is growing evidence that the dorsal and ventral visual pathways interact to support object recognition. However, the exact nature of these interactions remains poorly understood. Is the presence of identity-relevant object information in the dorsal pathway simply a byproduct of ventral input? Or, might the dorsal pathway be a source of input to the ventral pathway for object recognition? In the current study, we used high-density EEG—a technique with high temporal precision and spatial resolution sufficient to distinguish parietal and temporal lobes—to characterise the dynamics of dorsal and ventral pathways during object viewing. Using multivariate analyses, we found that category decoding in the dorsal pathway preceded that in the ventral pathway. Importantly, the dorsal pathway predicted the multivariate responses of the ventral pathway in a time-dependent manner, rather than the other way around. Together, these findings suggest that the dorsal pathway is a critical source of input to the ventral pathway for object recognition.
  4. The principal eyes of jumping spiders (Salticidae) integrate a dual-lens system, a tiered retinal matrix with multiple photoreceptor classes, and muscular control of retinal movements to form high-resolution images, extract color information, and dynamically evaluate visual scenes. While much work has been done to characterize these more complex principal anterior eyes, little work has investigated the three other pairs of simpler secondary eyes: the anterior lateral eye pair and two posterior (lateral and median) pairs of eyes. We investigated the opsin protein component of visual pigments in the eyes of three species of salticid using transcriptomics and immunohistochemistry. Based on characterization and localization of a set of three conserved opsins (Rh1 - green sensitive, Rh2 - blue sensitive, and Rh3 - ultraviolet sensitive), we have identified potential photoreceptors for blue light detection in the eyes of two out of three species: Menemerus bivittatus (Chrysillini) and Habrocestum africanum (Hasarinii). Additionally, the photoreceptor diversity of the secondary eyes exhibits more variation than previous estimates, particularly for the small, posterior median eyes previously considered vestigial in some species. In all three species investigated, the lateral eyes were dominated by green-sensitive visual pigments (RH1 opsins), while the posterior median retinas were dominated by opsins forming short-wavelength-sensitive visual pigments (e.g., RH2 and/or RH3/RH4). There was also variation among secondary eye types and among species in the distribution of opsins in retinal photoreceptors, particularly for the putatively blue-sensitive visual pigment formed from RH2. Our findings suggest secondary eyes have the potential for color vision, with observed differences between species likely associated with different ecologies and visual tasks.
  5. Reading is a highly complex learned skill in which humans move their eyes three to four times every second in response to visual and cognitive processing. The consensus view is that the details of these rapid eye-movement decisions—which part of a word to target with a saccade—are determined solely by low-level oculomotor heuristics. But maximally efficient saccade targeting would be sensitive to ongoing word identification, sending the eyes farther into a word the farther its identification has already progressed. Here, using a covert text-shifting paradigm, we showed just such a statistical relationship between saccade targeting in reading and trial-to-trial variability in cognitive processing. This result suggests that, rather than relying purely on heuristics, the human brain has learned to optimize eye movements in reading even at the fine-grained level of character-position targeting, reflecting efficiency-based sensitivity to ongoing cognitive processing. 
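As a usage illustration of the "more glances" behavior described in item 1 above, the snippet below runs the hypothetical DualStreamModel from the earlier sketch with an increasing number of glances and stops once the prediction stabilizes. The stopping rule, random input, and class count are assumptions made for the example, not the evaluation protocol of the paper.

```python
# Hypothetical usage of the DualStreamModel sketch above: give the model more
# glances and keep the prediction once it stops changing. Illustrative only.
import torch

model = DualStreamModel(num_classes=10)
model.eval()
image = torch.rand(1, 3, 224, 224)  # stand-in for a real input image

with torch.no_grad():
    previous = None
    for n in range(1, 7):
        logits, _ = model(image, n_glances=n)
        prediction = logits.argmax(dim=1).item()
        print(f"after {n} glance(s): predicted class {prediction}")
        if prediction == previous:  # prediction stabilized; stop glancing
            break
        previous = prediction
```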