Memory often fills in what is not there. A striking example of this is boundary extension, whereby observers mistakenly recall a view that extends beyond what was seen. However, not all visual memories extend in this way, which suggests that this process depends on specific scene properties. What factors determine when visual memories will include details that go beyond perceptual experience? Here, seven experiments (N = 1,100 adults) explored whether spatial scale—specifically, perceived viewing distance—drives boundary extension. We created fake miniatures by exploiting tilt shift, a photographic effect that selectively reduces perceived distance while preserving other scene properties (e.g., making a distant railway appear like a model train). Fake miniaturization increased boundary extension for otherwise identical scenes: Participants who performed a scene-memory task misremembered fake-miniaturized views as farther away than they actually were. This effect went beyond low-level image changes and generalized to a completely different distance manipulation. Thus, visual memory is modulated by the spatial scale at which the environment is viewed.
Spatial Scene Memories Are Biased Towards a Fixed Amount of Semantic Information
Abstract: Scene memory has known spatial biases. Boundary extension is a well-known bias whereby observers remember visual information beyond an image’s boundaries. While recent studies demonstrate that boundary contraction also reliably occurs based on intrinsic image properties, the specific properties that drive the effect are unknown. This study assesses the extent to which scene memory might have a fixed capacity for information. We assessed both visual and semantic information in a scene database using techniques from image processing and natural language processing, respectively. We then assessed how both types of information predicted memory errors for scene boundaries using a standard rapid serial visual presentation (RSVP) forced error paradigm. A linear regression model indicated that memories for scene boundaries were significantly predicted by semantic, but not visual, information and that this effect persisted when scene depth was considered. Boundary extension was observed for images with low semantic information, and contraction was observed for images with high semantic information. This suggests a cognitive process that normalizes the amount of semantic information held in memory.
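The regression analysis described in this abstract can be sketched in a few lines. Below is a minimal illustration with simulated data; the variable names, effect sizes, and simulated scores are all assumptions for illustration, not the study's data or code:

```python
# Hypothetical illustration of the regression described above: predicting
# boundary-memory error from semantic and visual information while
# controlling for scene depth. All values here are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_images = 100

# Stand-in per-image predictors (in the study these would come from
# NLP-based semantic scores, image statistics, and depth estimates).
semantic_info = rng.normal(size=n_images)
visual_info = rng.normal(size=n_images)
scene_depth = rng.normal(size=n_images)

# Boundary score: positive = extension, negative = contraction.
# Simulated so that only semantic information carries signal, mirroring
# the reported result (extension for low, contraction for high semantics).
boundary_score = -0.5 * semantic_info + rng.normal(scale=0.5, size=n_images)

X = sm.add_constant(np.column_stack([semantic_info, visual_info, scene_depth]))
model = sm.OLS(boundary_score, X).fit()
print(model.summary())  # the semantic coefficient should be reliably negative
```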
- Award ID(s): 1920896
- PAR ID: 10476470
- Publisher / Repository: MIT Press
- Date Published:
- Journal Name: Open Mind
- Volume: 7
- ISSN: 2470-2986
- Page Range / eLocation ID: 445 to 459
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Visual scenes are often remembered as if they were observed from a different viewpoint. Some scenes are remembered as farther than they appeared, and others as closer. These memory distortions—also known as boundary extension and contraction—are strikingly consistent for a given scene, but their cause remains unknown. We tested whether these distortions can be explained by an inferential process that adjusts scene memories toward high-probability views, using viewing depth as a test case. We first carried out a large-scale analysis of depth maps of natural indoor scenes to quantify the statistical probability of views in depth. We then assessed human observers’ memory for these scenes at various depths and found that viewpoint judgments were consistently biased toward the modal depth, even when just a few seconds elapsed between viewing and reporting. Thus, scenes closer than the modal depth showed a boundary-extension bias (remembered as farther away), and scenes farther than the modal depth showed a boundary-contraction bias (remembered as closer). By contrast, scenes at the modal depth did not elicit a consistent bias in either direction. This same pattern of results was observed in a follow-up experiment using tightly controlled stimuli from virtual environments. Together, these findings show that scene memories are biased toward statistically probable views, which may serve to increase the accuracy of noisy or incomplete scene representations.
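The modal-depth account lends itself to a compact sketch. The following is a hypothetical illustration (the toy depth corpus, histogram binning, and function names are assumptions) of estimating a modal viewing depth from a corpus of depth maps and predicting the direction of the memory bias:

```python
# Minimal sketch of the modal-depth idea above; all data are simulated.
import numpy as np

def modal_depth(depth_maps, bins=50):
    """Estimate the most probable mean viewing depth across a scene corpus."""
    mean_depths = [np.mean(d) for d in depth_maps]
    counts, edges = np.histogram(mean_depths, bins=bins)
    peak = np.argmax(counts)
    return 0.5 * (edges[peak] + edges[peak + 1])

def predicted_bias(scene_depth, mode):
    """Scenes nearer than the mode should extend; farther scenes contract."""
    if scene_depth < mode:
        return "boundary extension (remembered as farther away)"
    if scene_depth > mode:
        return "boundary contraction (remembered as closer)"
    return "no consistent bias"

# Toy corpus: random per-pixel depth maps in meters.
rng = np.random.default_rng(1)
corpus = [rng.gamma(shape=3.0, scale=1.2, size=(64, 64)) for _ in range(500)]
mode = modal_depth(corpus)
print(mode, predicted_bias(2.0, mode), predicted_bias(10.0, mode))
```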
While viewing a visual stimulus, we often cannot tell whether it is inherently memorable or forgettable. However, the memorability of a stimulus can be quantified and partially predicted by a collection of conceptual and perceptual factors. Higher-level properties that represent the “meaningfulness” of a visual stimulus to viewers best predict whether it will be remembered or forgotten across a population. Here, we hypothesize that the feelings evoked by an image, operationalized as the valence and arousal dimensions of affect, significantly contribute to the memorability of scene images. We ran two complementary experiments to investigate the influence of affect on scene memorability, in the process creating a new image set (VAMOS) of hundreds of natural scene images for which we obtained valence, arousal, and memorability scores. From our first experiment, we found memorability to be highly reliable for scene images that span a wide range of evoked arousal and valence. From our second experiment, we found that both valence and arousal are significant but weak predictors of image memorability. Scene images were most memorable if they were slightly negatively valenced and highly arousing. Images that were extremely positive or unarousing were most forgettable. Valence and arousal together accounted for less than 8% of the variance in image memorability. These findings suggest that evoked affect contributes to the overall memorability of a scene image but, like other singular predictors, does not fully explain it. Instead, memorability is best explained by an assemblage of visual features that combine, in perhaps unintuitive ways, to predict what is likely to stick in our memory.
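The "significant but weak" relationship can be illustrated with a toy regression. Everything below is simulated and only meant to mirror the reported effect size (under 8% of variance explained); the real VAMOS scores are behavioral measurements:

```python
# Hedged sketch of the variance-explained claim above, with simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 500
valence = rng.uniform(1, 9, size=n)   # 1 = negative, 9 = positive
arousal = rng.uniform(1, 9, size=n)   # 1 = calm, 9 = exciting

# Weak affective signal plus a large unexplained component: slightly
# negative valence and high arousal nudge memorability upward.
memorability = (0.015 * arousal - 0.01 * valence
                + rng.normal(scale=0.15, size=n) + 0.75)

X = np.column_stack([valence, arousal])
r_squared = LinearRegression().fit(X, memorability).score(X, memorability)
print(f"R^2 = {r_squared:.3f}")  # small (around 0.07), as in the abstract
```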
We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image. Unlike most existing techniques which rely on multiple images for 3D scene modeling, we seek to model the buildings in the scene from a single overhead image by jointly learning a modified signed distance function (SDF) from the building boundaries, a dense heightmap of the scene, and scene semantics. To jointly train for these tasks, we leverage pixel-wise semantic segmentation and normalized digital surface maps (nDSM) as supervision, in addition to labeled building outlines. At test time, buildings in the scene are automatically modeled in 3D using only an input overhead image. We demonstrate an increase in building modeling performance using a multi-feature network architecture that improves building outline detection by considering network features learned for the other jointly learned tasks. We also introduce a novel mechanism for robustly refining instance-specific building outlines using the learned modified SDF. We verify the effectiveness of our method on multiple large-scale satellite and aerial imagery datasets, where we obtain state-of-the-art performance in the 3D building reconstruction task.
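A minimal sketch of the joint-learning setup described above, assuming a shared feature backbone and three per-pixel heads; this is an illustrative PyTorch skeleton under assumed tensor shapes and loss choices, not the authors' architecture or loss weighting:

```python
# Toy multi-task heads and loss: one shared feature map, three outputs
# (modified SDF, heightmap, semantics), supervised jointly.
import torch
import torch.nn as nn

class MultiTaskHeads(nn.Module):
    def __init__(self, feat_ch=64, n_classes=5):
        super().__init__()
        self.sdf = nn.Conv2d(feat_ch, 1, kernel_size=1)           # modified SDF
        self.height = nn.Conv2d(feat_ch, 1, kernel_size=1)        # nDSM heightmap
        self.sem = nn.Conv2d(feat_ch, n_classes, kernel_size=1)   # semantics

    def forward(self, feats):
        return self.sdf(feats), self.height(feats), self.sem(feats)

def multitask_loss(preds, targets, weights=(1.0, 1.0, 1.0)):
    sdf_p, h_p, sem_p = preds
    sdf_t, h_t, sem_t = targets
    l_sdf = nn.functional.l1_loss(sdf_p, sdf_t)        # SDF regression
    l_h = nn.functional.l1_loss(h_p, h_t)              # height regression
    l_sem = nn.functional.cross_entropy(sem_p, sem_t)  # per-pixel labels
    return weights[0] * l_sdf + weights[1] * l_h + weights[2] * l_sem

# Toy forward/backward pass over shared backbone features.
feats = torch.randn(2, 64, 32, 32, requires_grad=True)
preds = MultiTaskHeads()(feats)
targets = (torch.randn(2, 1, 32, 32), torch.randn(2, 1, 32, 32),
           torch.randint(0, 5, (2, 32, 32)))
multitask_loss(preds, targets).backward()
```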
This paper introduces Semantic Parsing in Contextual Environments (SPICE), a task aimed at improving artificial agents’ contextual awareness by integrating multimodal inputs with prior contexts. Unlike traditional semantic parsing, SPICE provides a structured and interpretable framework for dynamically updating an agent’s knowledge with new information, reflecting the complexity of human communication. To support this task, the authors develop the VG-SPICE dataset, which challenges models to construct visual scene graphs from spoken conversational exchanges, emphasizing the integration of speech and visual data. They also present the Audio-Vision Dialogue Scene Parser (AViD-SP), a model specifically designed for VG-SPICE. Both the dataset and model are released publicly, with the goal of advancing multimodal information processing and integration.
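The core idea of dynamically updating a structured context can be caricatured in a few lines. The triple representation and update rule below are purely hypothetical illustrations, not the VG-SPICE schema or the AViD-SP model:

```python
# Purely hypothetical sketch: an agent's scene-graph context is
# incrementally revised as each parsed utterance arrives.
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

class SceneGraphContext:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def update(self, parsed: Set[Triple]) -> None:
        """Merge newly parsed information into the prior context."""
        self.triples |= parsed

ctx = SceneGraphContext()
ctx.update({("cup", "on", "table")})      # from a first utterance
ctx.update({("lamp", "left_of", "cup")})  # from a follow-up utterance
print(sorted(ctx.triples))
```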