

Title: The influence of stereopsis on visual saliency in a proto-object based model of selective attention
Some animals, including humans, use stereoscopic vision, which reconstructs spatial information about the environment from the disparity between the images captured by the two eyes at separate, adjacent locations. Like other sensory information, such stereoscopic information is expected to influence attentional selection. We develop a biologically plausible model of binocular vision to study its effect on bottom-up visual attention, i.e., visual saliency. In our model, the scene is organized into proto-objects on which attention acts, rather than into unbound sets of elementary features. We show that taking stereoscopic information into account improves the model's prediction of human eye movements, and that the improvement is statistically significant.
Award ID(s): 2223725
PAR ID: 10473183
Author(s) / Creator(s):
Publisher / Repository: Elsevier
Date Published:
Journal Name: Vision Research
Volume: 212
ISSN: 0042-6989
Page Range / eLocation ID: 108304
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. Three-dimensional (3D) vision in augmented reality (AR) displays can enable a highly immersive and realistic viewer experience and has therefore attracted much attention. Most current approaches create 3D vision by projecting stereoscopic images to the two eyes using two separate projection systems, which are inevitably bulky for wearable devices. Here, we propose a compact stereo waveguide AR display system using a single piece of thin flat glass integrated with a polarization-multiplexed metagrating in-coupler and two diffractive grating out-couplers. Incident light of opposite circular polarization states, carrying the stereoscopic images, is first steered by the metagrating in-coupler in opposite propagation directions within the flat glass waveguide, then extracted by the diffractive grating out-couplers, and finally received by different eyes, forming 3D stereo vision. Experimentally, we fabricated a display prototype and demonstrated independent projection of two polarization-multiplexed stereoscopic images.
  2. The goal of this review is to bring together material from cognitive psychology and recent machine vision studies to identify plausible neural mechanisms for visual same-different discrimination and relational understanding. We highlight how developments in the study of artificial neural networks provide computational evidence implicating attention and working memory in ascertaining visual relations, including same-different relations. We review some recent attempts to incorporate these mechanisms into flexible models of visual reasoning. Particular attention is given to recent models jointly trained on visual and linguistic information. These recent systems are promising, but they still fall short of the biological standard in several ways, which we outline in a final section.
  3. Given the rich visual information available in each glance, humans can internally direct their visual attention to enhance goal-relevant information, a capacity often absent in standard vision models. Here we introduce cognitively and biologically inspired long-range modulatory pathways to enable "cognitive steering" in vision models. First, we show that models equipped with these feedback pathways naturally show improved image recognition, adversarial robustness, and increased brain alignment relative to baseline models. Further, these feedback projections from the final layer of the vision backbone provide a meaningful steering interface, where goals can be specified as vectors in the output space. We show that there are effective ways to steer the model that dramatically improve recognition of categories in composite images of multiple categories, succeeding where baseline feed-forward models without flexible steering fail. Moreover, our multiplicative modulatory motif prevents rampant hallucination of the top-down goal category, dissociating what the model is looking for from what it is looking at. Thus, these long-range modulatory pathways enable new behavioral capacities for goal-directed visual encoding, offering a flexible communication interface between cognitive and visual systems.
  4. Research on visual attention has uncovered significant anomalies, and some traditional methods may have inadvertently probed peripheral vision rather than attention. Vision science needs to rethink visual attention from the ground up. To facilitate this, for a year I banned the word "attention" in my lab. This constraint promoted a more precise discussion of attention-related phenomena, capacity limits, and mechanisms. The insights gained led me to challenge attributing to "attention" those phenomena that can be better explained by perceptual processes, are predictable by an ideal observer model, or otherwise may not require an additional mechanism. I enumerate a set of critical phenomena in need of explanation. Finally, I propose a unifying theory in which all perception results from performing a task, and tasks face a limit on complexity.
  5. Understanding spatial and visual information is essential for a navigation agent that follows natural language instructions. Current Transformer-based vision-and-language navigation (VLN) agents entangle orientation and vision information, which limits what they can learn from each information source. In this paper, we design a neural agent with explicit Orientation and Vision modules. These modules learn to ground the spatial information and landmark mentions in the instructions to the visual environment more effectively. To strengthen the agent's spatial reasoning and visual perception, we design specific pre-training tasks that feed and better utilize the corresponding modules in our final navigation model. We evaluate our approach on both the Room2Room (R2R) and Room4Room (R4R) datasets and achieve state-of-the-art results on both benchmarks.