Title: Creating ad hoc graphical representations of number
The ability to communicate about exact number is critical to many modern human practices spanning science, industry, and politics. Although some early numeral systems used 1-to-1 correspondence (e.g., ‘IIII’ to represent 4), most systems provide compact representations via more arbitrary conventions (e.g., ‘7’ and ‘VII’). When people are unable to rely on conventional numerals, however, what strategies do they initially use to communicate number? Across three experiments, participants used pictures to communicate about visual arrays of objects containing 1–16 items, either by producing freehand drawings or combining sets of visual tokens. We analyzed how the pictures they produced varied as a function of communicative need (Experiment 1), spatial regularities in the arrays (Experiment 2), and visual properties of tokens (Experiment 3). In Experiment 1, we found that participants often expressed number in the form of 1-to-1 representations, but sometimes also exploited the configuration of sets. In Experiment 2, this strategy of using configural cues was exaggerated when sets were especially large, and when the cues were predictably correlated with number. Finally, in Experiment 3, participants readily adopted salient numerical features of objects (e.g., four-leaf clover) and generally combined them in a cumulative-additive manner. Taken together, these findings corroborate historical evidence that humans exploit correlates of number in the external environment – such as shape, configural cues, or 1-to-1 correspondence – as the basis for innovating more abstract number representations.
Award ID(s):
2000827
PAR ID:
10498916
Author(s) / Creator(s):
Publisher / Repository:
Cognition
Date Published:
Journal Name:
Cognition
Volume:
242
Issue:
C
ISSN:
0010-0277
Page Range / eLocation ID:
105665
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: This paper reports a formative evaluation of auditory representations of cyber security threat indicators and cues, referred to as sonifications, to warn users about cyber threats. Most Internet browsers provide visual cues and textual warnings to help users identify when they are at risk. Although these alarming mechanisms are very effective in informing users, there are certain situations and circumstances where these alarming techniques are unsuccessful in drawing the user’s attention: (1) security warnings and features (e.g., blocking out malicious Websites) might overwhelm a typical Internet user and thus the users may overlook or ignore visual and textual warnings and, as a result, they might be targeted, (2) these visual cues are inaccessible to certain users such as those with visual impairments. This work is motivated by our previous work of the use of sonification of security warnings to users who are visually impaired. To investigate the usefulness of sonification in general security settings, this work uses real Websites instead of simulated Web applications with sighted participants. The study targets sonification for three different types of security threats: (1) phishing, (2) malware downloading, and (3) form filling. The results show that on average 58% of the participants were able to correctly remember what the sonification conveyed. Additionally, about 73% of the participants were able to correctly identify the threat that the sonification represented while performing tasks using real Websites. Furthermore, the paper introduces “CyberWarner”, a sonification sandbox that can be installed on the Google Chrome browser to enable auditory representations of certain security threats and cues that are designed based on several URL heuristics.
Article highlights: It is feasible to develop sonified cyber security threat indicators that users intuitively understand with minimal experience and training. Users are more cautious about malicious activities in general. However, when navigating real Websites, they are less informed. This might be due to the appearance of the navigating Websites or the overwhelming issues when performing tasks. Participants’ qualitative responses indicate that even when they did not remember what the sonification conveyed, the sonification was able to capture the user’s attention and take safe actions in response.
  2. Humans are able to recognize objects based on both local texture cues and the configuration of object parts, yet contemporary vision models primarily harvest local texture cues, yielding brittle, non-compositional features. Work on shape-vs-texture bias has pitted shape and texture representations in opposition, measuring shape relative to texture, ignoring the possibility that models (and humans) can simultaneously rely on both types of cues, and obscuring the absolute quality of both types of representation. We therefore recast shape evaluation as a matter of absolute configural competence, operationalized by the Configural Shape Score (CSS), which (i) measures the ability to recognize both images in Object-Anagram pairs that preserve local texture while permuting global part arrangement to depict different object categories. Across 86 convolutional, transformer, and hybrid models, CSS (ii) uncovers a broad spectrum of configural sensitivity with fully self-supervised and language-aligned transformers – exemplified by DINOv2, SigLIP2 and EVA-CLIP – occupying the top end of the CSS spectrum. Mechanistic probes reveal that (iii) high-CSS networks depend on long-range interactions: radius-controlled attention masks abolish performance showing a distinctive U-shaped integration profile, and representational-similarity analyses expose a mid-depth transition from local to global coding. A BagNet control, whose receptive fields straddle patch seams, remains at chance (iv), ruling out any “border-hacking” strategies. Finally, (v) we show that configural shape score also predicts other shape-dependent evals (e.g., foreground bias, spectral and noise robustness). Overall, we propose that the path toward truly robust, generalizable, and human-like vision systems may not lie in forcing an artificial choice between shape and texture, but rather in architectural and learning frameworks that seamlessly integrate both local-texture and global configural shape
  3. Self-supervised Vision Transformers (ViTs) like DINOv2 show strong holistic shape processing capabilities, a feature linked to computations in their intermediate layers. However, the specific mechanism by which these layers transform local patch information into a global, configural percept remains a black box. To dissect this process, we conduct fine-grained mechanistic analyses by disentangling patch representations into their constituent content and positional information. We find that high-performing models demonstrate a distinct multi-stage processing signature: they first preserve the spatial localization of image content through many layers while concurrently refining their positional representations. Computationally, we show that this is supported by a systematic "local-global handoff," where attention heads gradually shift to aggregating information using long-range interactions. In contrast, models with poor configural ability lose content-specific spatial information early and lack this critical positional refinement stage. This positional refinement is further stabilized by register tokens, which mitigate a common artifact in which ViTs repurpose low-information patch tokens into high-norm ‘outliers’ to store global information, causing them to lose their local positional grounding. By isolating these high-norm activations in register tokens, the model better preserves the visual grounding of each patch, which we show also leads to a direct improvement in holistic processing. Overall, our findings suggest that holistic vision in ViTs arises not just from long-range attention, but from a structured pipeline that carefully manages the interpl
  4. Attention and emotion are fundamental psychological systems. It is well established that emotion intensifies attention. Three experiments reported here (N = 235) demonstrated the reversed causal direction: Voluntary visual attention intensifies perceived emotion. In Experiment 1, participants repeatedly directed attention toward a target object during sequential search. Participants subsequently perceived their emotional reactions to target objects as more intense than their reactions to control objects. Experiments 2 and 3 used a spatial-cuing procedure to manipulate voluntary visual attention. Spatially cued attention increased perceived emotional intensity. Participants perceived spatially cued objects as more emotionally intense than noncued objects even when participants were asked to mentally rehearse the name of noncued objects. This suggests that the intensifying effect of attention is independent of more extensive mental rehearsal. Across experiments, attended objects were perceived as more visually distinctive, which statistically mediated the effects of attention on emotional intensity.
  5. Road tunnels are enclosed spaces that most occupants only experience while driving through them. In case of fire, however, occupants potentially need to evacuate on foot from a dangerous and unfamiliar environment. Clear and accurate guidance is important for an efficient and safe evacuation from tunnels. Common cues for evacuation guidance are signage and audio messages that attract occupants to move on appropriate egress routes and avoid unsafe routes. This paper investigates how different types of visual and auditory signals influence occupants’ exit choices in a simulated tunnel evacuation. Common guidance cues were presented to participants in a mobile Head Mounted Display, and they were asked to choose between two possible exit doors in a simulated road tunnel. Two attracting cues (“EXIT” signs, audio instructions), and two detracting cues (“DO NOT ENTER” signs; traffic cones placed in front of an exit) were studied in three virtual reality (VR) experiments. In each experiment, the presence and direction of the cues were manipulated, and data from 20 participants were collected. Experiment 1 explored the effects of attracting cues, Experiment 2 detracting cues, and Experiment 3 the combination of attracting and detracting cues. Across all studies, participants tended to follow the guidance provided when there was only one cue. When several competing and even contradictory cues were present, participants were most likely to rely on audio instructions, followed by traffic cones and “DO NOT ENTER” signs, whereas “EXIT” signs were often disregarded. We conclude that participants tend to follow temporary cues that could carry current information, as opposed to permanently installed signage. Some corresponding suggestions are put forward on evacuation system design and strategic planning in a tunnel fire.