

Title: Demystifying visual awareness: Peripheral encoding plus limited decision complexity resolve the paradox of rich visual experience and curious perceptual failures
Abstract: Human beings subjectively experience a rich visual percept. However, when behavioral experiments probe the details of that percept, observers perform poorly, suggesting that vision is impoverished. What can explain this awareness puzzle? Is the rich percept a mere illusion? How does vision work as well as it does? This paper argues for two important pieces of the solution. First, peripheral vision encodes its inputs using a scheme that preserves a great deal of useful information, while losing the information necessary to perform certain tasks. The tasks rendered difficult by the peripheral encoding include many of those used to probe the details of visual experience. Second, many tasks used to probe attentional and working memory limits are, arguably, inherently difficult, and poor performance on these tasks may indicate limits on decision complexity. Two assumptions are critical to making sense of this hypothesis: (1) all visual perception, conscious or not, results from performing some visual task; and (2) all visual tasks face the same limit on decision complexity. Together, peripheral encoding plus decision complexity can explain a wide variety of phenomena, including vision’s marvelous successes, its quirky failures, and our rich subjective impression of the visual world.
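To make the first claim concrete, here is a minimal, illustrative sketch (my own, not the paper's model) of a pooling-style peripheral encoding: the image is summarized by local statistics computed over windows that grow with eccentricity, so a great deal of useful information survives while the precise arrangement of items within a peripheral pooling region is lost. The function name, pooling rule, and parameters below are placeholder assumptions.

```python
import numpy as np

def peripheral_encode(image, fixation, base=4, growth=0.4, stride=16):
    """Toy peripheral encoding: summarize an image by local statistics pooled
    over windows whose size grows with eccentricity (distance from fixation).
    Only the summaries are kept, not the pixels themselves."""
    h, w = image.shape
    fy, fx = fixation
    code = []
    for cy in range(0, h, stride):
        for cx in range(0, w, stride):
            ecc = np.hypot(cy - fy, cx - fx)          # eccentricity of this pooling region
            half = int(base + growth * ecc)           # larger pooling regions in the periphery
            patch = image[max(cy - half, 0):cy + half + 1,
                          max(cx - half, 0):cx + half + 1]
            code.append((cy, cx, patch.mean(), patch.std()))
    return code

# Two images that differ only in how items are arranged far from fixation can
# map to nearly the same code: tasks that probe such details become hard,
# while many other tasks remain well supported by the pooled statistics.
rng = np.random.default_rng(0)
img = rng.random((128, 128))
code = peripheral_encode(img, fixation=(64, 64))
```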
Award ID(s): 1826757
PAR ID: 10311058
Author(s) / Creator(s):
Date Published:
Journal Name: Attention, Perception, & Psychophysics
Volume: 82
Issue: 3
ISSN: 1943-3921
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Research on visual attention has uncovered significant anomalies, and some traditional methods may have inadvertently probed peripheral vision rather than attention. Vision science needs to rethink visual attention from the ground up. To facilitate this, for a year I banned the word “attention” in my lab. This constraint promoted a more precise discussion of attention-related phenomena, capacity limits, and mechanisms. The insights gained lead me to challenge attributing to “attention” those phenomena that can be better explained by perceptual processes, that are predictable by an ideal observer model, or that otherwise may not require an additional mechanism. I enumerate a set of critical phenomena in need of explanation. Finally, I propose a unifying theory in which all perception results from performing a task, and tasks face a limit on complexity.
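The point about ideal-observer predictions can be illustrated with a short simulation (my own sketch, not taken from the paper): a max-rule signal-detection model of visual search in which accuracy falls with set size simply because more noisy items compete with the target, with no attentional bottleneck anywhere in the model. The d' value, false-alarm level, and trial counts are arbitrary placeholders.

```python
import numpy as np
from scipy.stats import norm

d_prime = 2.0                      # target strength in internal-response units
n_trials = 20000
rng = np.random.default_rng(1)

for set_size in (1, 2, 4, 8, 16):
    # criterion set so that a noise-only display yields a false alarm on
    # 10% of trials, regardless of how many items are shown
    criterion = norm.ppf((1 - 0.10) ** (1 / set_size))
    # target-present displays: every item is noisy, one item carries the signal
    responses = rng.normal(0.0, 1.0, (n_trials, set_size))
    responses[:, 0] += d_prime
    # the observer says "present" whenever the largest response exceeds the criterion
    hit_rate = (responses.max(axis=1) > criterion).mean()
    print(f"set size {set_size:2d}: hit rate {hit_rate:.3f} at ~10% false alarms")
```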
  2. Along with textual content, visual features play an essential role in the semantics of visually rich documents. Information extraction (IE) tasks perform poorly on these documents if these visual cues are not taken into account. In this paper, we present Artemis, a visually aware, machine-learning-based IE method for heterogeneous visually rich documents. Artemis represents a visual span in a document by jointly encoding its visual and textual context for IE tasks. Our main contribution is two-fold. First, we develop a deep-learning model that identifies the local context boundary of a visual span with minimal human labeling. Second, we describe a deep neural network that encodes the multimodal context of a visual span into a fixed-length vector by taking its textual and layout-specific features into account. It then identifies the visual span(s) containing a named entity by applying an inference step to this learned representation. We evaluate Artemis on four heterogeneous datasets from different domains over a suite of information extraction tasks. Results show that it outperforms state-of-the-art text-based methods by up to 17 points in F1-score.
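As a rough illustration of the kind of joint text-plus-layout encoding this abstract describes (a hedged sketch, not the actual Artemis architecture), one can concatenate a textual embedding of a span with its normalized bounding-box coordinates and classify the resulting fixed-length vector; all dimensions, layer choices, and names below are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SpanEncoder(nn.Module):
    def __init__(self, vocab_size=30000, text_dim=128, layout_dim=4,
                 hidden_dim=256, n_entity_types=5):
        super().__init__()
        # mean-pooled word embeddings stand in for a real text encoder
        self.embed = nn.EmbeddingBag(vocab_size, text_dim, mode="mean")
        self.encoder = nn.Sequential(
            nn.Linear(text_dim + layout_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),   # fixed-length multimodal span vector
        )
        self.classifier = nn.Linear(hidden_dim, n_entity_types)

    def forward(self, token_ids, offsets, bboxes):
        text_vec = self.embed(token_ids, offsets)              # (batch, text_dim)
        span_vec = self.encoder(torch.cat([text_vec, bboxes], dim=-1))
        return self.classifier(span_vec)                        # entity-type logits

# toy batch: two spans, each with a few token ids and a normalized bounding box
model = SpanEncoder()
token_ids = torch.tensor([3, 17, 120, 9, 42])
offsets = torch.tensor([0, 3])                 # span 1 = tokens 0..2, span 2 = tokens 3..4
bboxes = torch.tensor([[0.1, 0.2, 0.4, 0.25], [0.5, 0.8, 0.9, 0.85]])
logits = model(token_ids, offsets, bboxes)
```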
  3. Peripheral vision is fundamental for many real-world tasks, including walking, driving, and aviation. Nonetheless, there has been no effort to connect these applied literatures to research on peripheral vision in basic vision science or sports science. To close this gap, we analyzed 60 relevant papers, chosen according to objective criteria. Applied research, with its real-world time constraints, complex stimuli, and performance measures, reveals new functions of peripheral vision. Peripheral vision is used to monitor the environment (e.g., road edges, traffic signs, or malfunctioning lights) in ways that differ from those studied in basic research. Applied research uncovers new actions that one can perform solely with peripheral vision (e.g., steering a car, climbing stairs). An important use of peripheral vision is comparing the position of one’s body or vehicle with objects in the world. In addition, many real-world tasks require multitasking, and the fact that peripheral vision provides degraded but useful information means that tradeoffs are common in deciding whether to use peripheral vision or move one’s eyes. These tradeoffs are strongly influenced by factors like expertise, age, distraction, emotional state, task importance, and what the observer already knows. These tradeoffs make it hard to infer from eye movements alone what information is gathered from peripheral vision and what tasks we can do without it. Finally, we recommend three ways in which basic, sport, and applied science can benefit each other’s methodology, furthering our understanding of peripheral vision more generally.
  4. The ability to correctly determine the position of objects in space is a fundamental task of the visual system. The perceived position of briefly presented static objects can be influenced by nearby moving contours, as demonstrated by various illusions collectively known as motion-induced position shifts. Here we use a stimulus that produces a particularly strong effect of motion on perceived position. We test whether several regions of interest (ROIs), at different stages of visual processing, encode the perceived rather than the retinotopically veridical position. Specifically, we collect functional MRI data while participants experience motion-induced position shifts and use a multivariate pattern analysis approach to compare the activation patterns evoked by illusory position shifts with those evoked by matched physical shifts. We find that the illusory perceived position is represented at the earliest stages of the visual processing stream, including primary visual cortex. Surprisingly, we find no evidence of percept-based encoding of position in visual areas beyond area V3. This result suggests that while higher-level visual areas are likely involved in position encoding, early visual cortex also plays an important role.
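A hedged sketch of the cross-decoding logic described above (not the paper's actual analysis pipeline): train a linear classifier on ROI voxel patterns evoked by physical position shifts and test it on patterns evoked by illusory shifts; above-chance transfer in an ROI would indicate that it carries perceived rather than retinotopic position. The data below are random placeholders standing in for per-trial response patterns.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials, n_voxels = 80, 200

# placeholder ROI response patterns: rows = trials, columns = voxels
physical_patterns = rng.normal(size=(n_trials, n_voxels))
physical_labels = rng.integers(0, 2, n_trials)        # 0 = shifted one way, 1 = the other
illusory_patterns = rng.normal(size=(n_trials, n_voxels))
illusory_labels = rng.integers(0, 2, n_trials)

# train on physical shifts, test on illusory (motion-induced) shifts
clf = LogisticRegression(max_iter=1000).fit(physical_patterns, physical_labels)
transfer_accuracy = clf.score(illusory_patterns, illusory_labels)
print(f"physical -> illusory decoding accuracy: {transfer_accuracy:.2f}")
```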
  5. Understanding spatial and visual information is essential for a navigation agent that follows natural language instructions. Current Transformer-based VLN agents entangle orientation and vision information, which limits the gain from learning each information source. In this paper, we design a neural agent with explicit Orientation and Vision modules. These modules learn to ground spatial information and landmark mentions in the instructions to the visual environment more effectively. To strengthen the agent’s spatial reasoning and visual perception, we design specific pre-training tasks to feed and better utilize the corresponding modules in our final navigation model. We evaluate our approach on both the Room2room (R2R) and Room4room (R4R) datasets and achieve state-of-the-art results on both benchmarks.
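As an illustration of the separation this abstract describes (a sketch under my own assumptions, not the paper's model), an agent step can encode candidate-view orientation and visual features in separate modules, fuse them, and score each candidate against the instruction representation; the sizes, fusion rule, and names below are placeholders.

```python
import torch
import torch.nn as nn

class OrientationVisionAgent(nn.Module):
    def __init__(self, instr_dim=256, vis_dim=512, hidden_dim=256):
        super().__init__()
        # orientation module: heading/elevation of each candidate view as (sin, cos) pairs
        self.orient = nn.Linear(4, hidden_dim)
        # vision module: visual features of each candidate view
        self.vision = nn.Linear(vis_dim, hidden_dim)
        # score each fused candidate against the instruction representation
        self.score = nn.Bilinear(instr_dim, hidden_dim, 1)

    def forward(self, instr, cand_angles, cand_feats):
        # instr: (batch, instr_dim); cand_*: (batch, n_candidates, ...)
        o = torch.tanh(self.orient(cand_angles))
        v = torch.tanh(self.vision(cand_feats))
        fused = o + v                                   # simple fusion of the two modules
        b, n, h = fused.shape
        instr_rep = instr.unsqueeze(1).expand(b, n, -1)
        logits = self.score(instr_rep.reshape(b * n, -1), fused.reshape(b * n, -1))
        return logits.view(b, n)                        # one score per candidate direction

# toy step: choose among 8 candidate viewpoints for a batch of 2 instructions
agent = OrientationVisionAgent()
instr = torch.randn(2, 256)
angles = torch.randn(2, 8, 4)        # sin/cos of heading and elevation per candidate
feats = torch.randn(2, 8, 512)
action_logits = agent(instr, angles, feats)
```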