Title: Visual Attention in Crisis
Abstract: Research on visual attention has uncovered significant anomalies, and some traditional methods may have inadvertently probed peripheral vision rather than attention. Vision science needs to rethink visual attention from the ground up. To facilitate this, for a year I banned the word “attention” in my lab. This constraint promoted more precise discussion of attention-related phenomena, capacity limits, and mechanisms. The insights gained lead me to challenge attributing to “attention” phenomena that can be better explained by perceptual processes, that are predictable by an ideal observer model, or that otherwise may not require an additional mechanism. I enumerate a set of critical phenomena in need of explanation. Finally, I propose a unifying theory in which all perception results from performing a task, and tasks face a limit on complexity.
Award ID(s):
1826757
PAR ID:
10558874
Author(s) / Creator(s):
Publisher / Repository:
Cambridge University Press
Date Published:
Journal Name:
Behavioral and Brain Sciences
ISSN:
0140-525X
Page Range / eLocation ID:
1 to 32
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. Abstract: Human beings subjectively experience a rich visual percept. However, when behavioral experiments probe the details of that percept, observers perform poorly, suggesting that vision is impoverished. What can explain this awareness puzzle? Is the rich percept a mere illusion? How does vision work as well as it does? This paper argues for two important pieces of the solution. First, peripheral vision encodes its inputs using a scheme that preserves a great deal of useful information, while losing the information necessary to perform certain tasks. The tasks rendered difficult by the peripheral encoding include many of those used to probe the details of visual experience. Second, many tasks used to probe attentional and working memory limits are, arguably, inherently difficult, and poor performance on these tasks may indicate limits on decision complexity. Two assumptions are critical to making sense of this hypothesis: (1) All visual perception, conscious or not, results from performing some visual task; and (2) all visual tasks face the same limit on decision complexity. Together, peripheral encoding plus decision complexity can explain a wide variety of phenomena, including vision’s marvelous successes, its quirky failures, and our rich subjective impression of the visual world.
2. Abstract: Humans actively observe the visual surroundings by focusing on salient objects and ignoring trivial details. However, computer vision models based on convolutional neural networks (CNNs) often analyze visual input all at once through a single feedforward pass. In this study, we designed a dual-stream vision model inspired by the human brain. This model features retina-like input layers and includes two streams: one that determines the next point of focus (the fixation) and another that interprets the visual input surrounding the fixation. Trained on image recognition, this model examines an image through a sequence of fixations, each time focusing on different parts, thereby progressively building a representation of the image. We evaluated this model against various benchmarks in terms of object recognition, gaze behavior, and adversarial robustness. Our findings suggest that the model can attend and gaze in ways similar to humans without being explicitly trained to mimic human attention, and that its retinal sampling and recurrent processing enhance robustness against adversarial attacks. In particular, the model can correct its perceptual errors by taking more glances, setting itself apart from all feedforward-only models. In conclusion, the interactions of retinal sampling, eye movement, and recurrent dynamics are important to human-like visual exploration and inference.
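To make the glance-interpret-refixate loop concrete, here is a minimal sketch of a dual-stream fixation model in PyTorch. All module names, the crop-based "retina," and the hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-stream recurrent fixation model (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH = 32          # size of the "retinal" crop around the current fixation
N_GLANCES = 4       # number of fixations per image
N_CLASSES = 10

class WhereStream(nn.Module):
    """Proposes the next fixation (x, y) in [-1, 1] image coordinates."""
    def __init__(self, hidden=128):
        super().__init__()
        self.fc = nn.Linear(hidden, 2)
    def forward(self, h):
        return torch.tanh(self.fc(h))

class WhatStream(nn.Module):
    """Encodes the current retinal patch and updates a recurrent state."""
    def __init__(self, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.GRUCell(32, hidden)
    def forward(self, patch, h):
        return self.rnn(self.encoder(patch), h)

def retinal_crop(images, fixation, size=PATCH):
    """Crop a patch centered on the fixation via an affine grid (a crude fovea)."""
    b, _, H, W = images.shape
    theta = images.new_zeros(b, 2, 3)
    theta[:, 0, 0] = size / W
    theta[:, 1, 1] = size / H
    theta[:, :, 2] = fixation          # (x, y) offsets in [-1, 1]
    grid = F.affine_grid(theta, (b, 3, size, size), align_corners=False)
    return F.grid_sample(images, grid, align_corners=False)

class GlanceModel(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.what = WhatStream(hidden)
        self.where = WhereStream(hidden)
        self.readout = nn.Linear(hidden, N_CLASSES)
        self.hidden = hidden
    def forward(self, images):
        b = images.size(0)
        h = images.new_zeros(b, self.hidden)
        fixation = images.new_zeros(b, 2)      # start at the image center
        for _ in range(N_GLANCES):
            patch = retinal_crop(images, fixation)
            h = self.what(patch, h)            # interpret the current glance
            fixation = self.where(h)           # decide where to look next
        return self.readout(h)

if __name__ == "__main__":
    logits = GlanceModel()(torch.randn(2, 3, 128, 128))
    print(logits.shape)   # torch.Size([2, 10])
```

A real implementation would likely use a multi-resolution foveated sample rather than a single crop and would train the fixation policy explicitly, but the loop above captures how a "where" stream and a "what" stream can interact across successive glances.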
3. For human children as well as machine learning systems, a key challenge in learning a word is linking the word to the visual phenomena it describes. We explore this aspect of word learning by using the performance of computer vision systems as a proxy for the difficulty of learning a word from visual cues. We show that the age at which children acquire different categories of words is correlated with the performance of visual classification and captioning systems, over and above the expected effects of word frequency. The performance of the computer vision systems is correlated with human judgments of the concreteness of words, which are in turn a predictor of children’s word learning, suggesting that these models are capturing the relationship between words and visual phenomena.
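The "over and above word frequency" claim amounts to a partial correlation. Below is a minimal sketch of that analysis on synthetic placeholder data; the variable names and simulated relationships are assumptions for illustration, not the study's measurements.

```python
# Hypothetical sketch: partial correlation between age of acquisition (AoA)
# and a vision model's per-word accuracy, controlling for log word frequency.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_words = 300
log_freq = rng.normal(size=n_words)                                  # log word frequency
vision_acc = 0.5 * log_freq + rng.normal(size=n_words)               # per-word model accuracy
aoa = -0.4 * log_freq - 0.3 * vision_acc + rng.normal(size=n_words)  # age of acquisition

def residualize(y, x):
    """Residuals of y after regressing out x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r, p = stats.pearsonr(residualize(aoa, log_freq), residualize(vision_acc, log_freq))
print(f"partial correlation (AoA ~ model accuracy | frequency): r={r:.2f}, p={p:.3g}")
```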
4. Abstract: Studies of voluntary visual spatial attention have used attention-directing cues, such as arrows, to induce or instruct observers to focus selective attention on relevant locations in visual space to detect or discriminate subsequent target stimuli. In everyday vision, however, voluntary attention is influenced by a host of factors, most of which are quite different from the laboratory paradigms that use attention-directing cues. These factors include priming, experience, reward, meaning, motivations, and high-level behavioral goals. Attention that is endogenously directed in the absence of external attention-directing cues has been referred to as “self-initiated attention” or, as in our prior work, as “willed attention,” where volunteers decide where to attend in response to a prompt to do so. Here, we used a novel paradigm that eliminated external influences (i.e., attention-directing cues and prompts) about where and/or when spatial attention should be directed. Using machine learning decoding methods, we showed that the well-known lateralization of EEG alpha power during spatial attention was also present during purely self-generated attention. By eliminating explicit cues or prompts that affect the allocation of voluntary attention, this work advances our understanding of the neural correlates of attentional control and provides steps toward the development of EEG-based brain–computer interfaces that tap into human intentions.
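As a rough illustration of decoding attended hemifield from lateralized alpha power, here is a minimal sketch on synthetic EEG-like data. The channel setup, the 8-12 Hz band, the lateralization index, and the logistic-regression decoder are assumptions for illustration, not the authors' pipeline.

```python
# Hypothetical sketch: decode attended side (left/right) from EEG alpha lateralization.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

FS = 250                        # sampling rate (Hz)
N_TRIALS, N_SAMPLES = 200, FS   # one second per trial
rng = np.random.default_rng(0)

# Synthetic trials: attending left (label 0) raises alpha over the left (ipsilateral)
# channel and lowers it over the right, and vice versa for attending right (label 1).
labels = rng.integers(0, 2, N_TRIALS)
t = np.arange(N_SAMPLES) / FS
alpha_wave = np.sin(2 * np.pi * 10 * t)                       # 10 Hz carrier
left_amp = np.where(labels == 0, 1.5, 0.5)[:, None]
right_amp = np.where(labels == 0, 0.5, 1.5)[:, None]
left_chan = rng.standard_normal((N_TRIALS, N_SAMPLES)) + left_amp * alpha_wave
right_chan = rng.standard_normal((N_TRIALS, N_SAMPLES)) + right_amp * alpha_wave

def alpha_power(x, fs=FS, band=(8.0, 12.0)):
    """Mean power in the alpha band after a zero-phase band-pass filter."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, x, axis=-1)
    return (filtered ** 2).mean(axis=-1)

# Lateralization index per trial: (right - left) / (right + left) alpha power.
p_left, p_right = alpha_power(left_chan), alpha_power(right_chan)
lat_index = (p_right - p_left) / (p_right + p_left)

# Decode the attended side from the single lateralization feature.
acc = cross_val_score(LogisticRegression(), lat_index.reshape(-1, 1), labels, cv=5)
print(f"cross-validated decoding accuracy: {acc.mean():.2f}")
```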
5. Across the lifespan, humans are biased to look first at what is easy to see, with a handful of well-documented visual saliences shaping our attention (e.g., Itti & Koch, 2001). These attentional biases may emerge from the contexts in which moment-to-moment attention occurs, where perceivers and their social partners actively shape bottom-up saliences, moving their bodies and objects to make targets of interest more salient. The goal of the present study was to determine the bottom-up saliences present in infant egocentric images and to provide evidence on the role that infants and their mature social partners play in highlighting targets of interest via these saliences. We examined 968 unique scenes in which an object had purposefully been placed in the infant’s egocentric view, drawn from videos created by one-year-old infants wearing a head camera during toy-play with a parent. To understand which saliences mattered in these scenes, we conducted a visual search task, asking participants (n = 156) to find objects in the egocentric images. To connect this to the behaviors of perceivers, we then characterized the saliences of objects placed by infants or parents compared to objects that were otherwise present in the scenes. Our results show that body-centric properties, such as increases in the centering and visual size of the object, as well as decreases in the number of competing objects immediately surrounding it, both predicted faster search time and distinguished placed and unplaced objects. The present results suggest that the bottom-up saliences that can be readily controlled by perceivers and their social partners may most strongly impact our attention. This finding has implications for the functional role of saliences in human vision, their origin, the social structure of perceptual environments, and how the relation between bottom-up and top-down control of attention in these environments may support infant learning.
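The reported relation between body-centric saliences and search time is, in essence, a regression of search time on a few object-level predictors. Below is a minimal sketch of such an analysis on synthetic placeholder data; the predictor names, effect sizes, and model choice are illustrative assumptions, not the study's actual analysis.

```python
# Hypothetical sketch: regress visual-search time on object centering, visual size,
# and the number of nearby competing objects (synthetic data only).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_scenes = 500
centering = rng.uniform(0, 1, n_scenes)            # how centered the object is in view
visual_size = rng.uniform(0.01, 0.3, n_scenes)     # proportion of the image occupied
n_competitors = rng.integers(0, 10, n_scenes)      # nearby competing objects

# Simulated search times: faster for central, large, uncluttered targets.
search_time = (3.0 - 1.5 * centering - 4.0 * visual_size
               + 0.2 * n_competitors + rng.normal(0, 0.3, n_scenes))

X = np.column_stack([centering, visual_size, n_competitors])
model = LinearRegression().fit(X, search_time)
print("coefficients (centering, size, competitors):", model.coef_)
print("R^2:", model.score(X, search_time))
```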