Vision-Language Models (VLMs) are trained on vast amounts of human-captured data that reflect our understanding of the world. However, human perception of reality is not always faithful to the physical world; the resulting discrepancies are known as visual illusions. This raises a key question: do VLMs exhibit the same kinds of illusions as humans, or do they learn to represent reality faithfully? To investigate this question, we build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs. Our findings show that although overall alignment with human perception is low, larger models are closer to human perception and more susceptible to visual illusions. Our dataset and initial findings will promote a better understanding of visual illusions in humans and machines and provide a stepping stone for future computational models that better align humans and machines in perceiving and communicating about the shared visual world.
Visual Illusions in Radiology: Untrue Perceptions in Medical Images and Their Implications for Diagnostic Accuracy
Errors in radiologic interpretation are largely the result of failures of perception. This remains true despite the increasing use of computer-aided detection and diagnosis. We surveyed the literature on visual illusions that arise during the viewing of radiologic images. Misperception of anatomical structures is a potential cause of error that can lead to patient harm if disease is seen when none is present. However, visual illusions can also help enhance the ability of radiologists to detect and characterize abnormalities. Indeed, radiologists have learned to exploit certain perceptual biases, both in diagnostic findings and as training tools. We propose that further detailed study of radiologic illusions would help clarify the mechanisms underlying radiologic performance and provide additional heuristics for improving radiologist training and reducing medical error.
- Award ID(s): 1734887
- PAR ID: 10429385
- Date Published:
- Journal Name: Frontiers in Neuroscience
- Volume: 15
- ISSN: 1662-453X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More like this
- Many species of animals exhibit an intuitive sense of number, suggesting a fundamental neural mechanism for representing numerosity in a visual scene. Recent empirical studies demonstrate that early feedforward visual responses are sensitive to the numerosity of a dot array but substantially less so to continuous dimensions orthogonal to numerosity, such as the size and spacing of the dots. However, the mechanisms that extract numerosity are unknown. Here, we identified the core neurocomputational principles underlying these effects: (1) center-surround contrast filters, (2) operating at multiple spatial scales, and (3) divisive normalization across network units. In an untrained computational model, these principles eliminated sensitivity to size and spacing, making numerosity the main determinant of the neuronal response magnitude. Moreover, a model implementation of these principles explained both well-known and relatively novel illusions of numerosity perception across space and time. This supports the conclusion that the neural structures and feedforward processes that encode numerosity naturally produce visual illusions of numerosity. Taken together, these results identify a set of neurocomputational properties that give rise to the ubiquity of the number sense in the animal kingdom.
- This study examined the application of GPT-4 with vision (GPT-4V), a multimodal large language model with visual recognition capabilities, to detecting radiologic findings in a set of 100 chest radiographs. Its results suggest that GPT-4V is currently not ready for real-world diagnostic use in interpreting chest radiographs.
- Recent work suggests that some aspects of lung nodule detection ability may relate to object recognition ability. However, this work sampled only radiological novices. Here, we further investigate whether object recognition ability predicts lung nodule detection ability (as measured by the Vanderbilt Chest Radiograph Test, or VCRT), after controlling for experience and fluid intelligence, in a sample of radiologists and nonradiologists. We find that radiological experience accounts for approximately 50% of VCRT variance. After controlling for experience, fluid intelligence and object recognition ability account for an additional 15% of VCRT variance. These results suggest that while training is key in learning to detect nodules, among people with the same level of experience, those with higher fluid intelligence and object recognition ability perform better. The recently proposed construct of visual object recognition ability may add unique information beyond general cognitive skills in assessing aptitude for a career in radiology.
- Animals live in visually complex environments. As a result, visual systems have evolved mechanisms that simplify visual processing and allow animals to focus on the information that is most relevant to adaptive decision making. This review explores two key mechanisms that animals use to efficiently process visual information: categorization and specialization. Categorization occurs when an animal's perceptual system sorts continuously varying stimuli into a set of discrete categories. Specialization occurs when particular classes of stimuli are processed using distinct cognitive operations that are not used for other classes of stimuli. We also describe a nonadaptive consequence of simplifying heuristics: visual illusions, in which visual perception consistently misleads the viewer about the state of the external world or the objects within it. We take an explicitly comparative approach by exploring similarities and differences in visual cognition across human and nonhuman taxa. Considering areas of convergence and divergence across taxa provides insight into the evolution and function of visual systems and associated perceptual strategies.
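The three principles named in the numerosity abstract in this list (center-surround contrast filters, multiple spatial scales, divisive normalization) can be illustrated with a minimal Python sketch. It is not the authors' model: the difference-of-Gaussians filters, the 2:1 surround-to-center ratio, the scale values, the half-wave rectification, pooling across scales at each location, and the semi-saturation constant `sigma_n` are all assumptions chosen for illustration, as are the hypothetical helper names `dot_array`, `dog_responses`, `divisive_normalization`, and `total_response`.

```python
import numpy as np


def dot_array(n_dots, radius, size=128, seed=0):
    """Hypothetical stimulus helper: binary image with up to n_dots non-overlapping dots."""
    rng = np.random.default_rng(seed)
    img = np.zeros((size, size))
    yy, xx = np.mgrid[0:size, 0:size]
    centers = []
    for _ in range(10_000):  # rejection sampling; may place fewer dots if space runs out
        if len(centers) == n_dots:
            break
        cy, cx = rng.uniform(radius + 2, size - radius - 2, size=2)
        if all((cy - py) ** 2 + (cx - px) ** 2 > (2 * radius + 2) ** 2 for py, px in centers):
            centers.append((cy, cx))
            img[(yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2] = 1.0
    return img


def dog_responses(img, sigmas=(1.0, 2.0, 4.0, 8.0), surround_ratio=2.0):
    """(1) Center-surround (difference-of-Gaussians) filtering (2) at several scales.

    Filtering is done in the Fourier domain, so the image wraps around at its edges.
    Returns half-wave-rectified maps with shape (len(sigmas), H, W).
    """
    fy = np.fft.fftfreq(img.shape[0])[:, None]
    fx = np.fft.fftfreq(img.shape[1])[None, :]
    f2 = fx ** 2 + fy ** 2
    spectrum = np.fft.fft2(img)
    maps = []
    for s in sigmas:
        # Fourier transform of a unit-area Gaussian with standard deviation s (pixels)
        center = np.exp(-2 * np.pi ** 2 * s ** 2 * f2)
        surround = np.exp(-2 * np.pi ** 2 * (surround_ratio * s) ** 2 * f2)
        # center - surround has zero gain at DC, so uniform regions produce no response
        resp = np.real(np.fft.ifft2(spectrum * (center - surround)))
        maps.append(np.maximum(resp, 0.0))  # half-wave rectification (assumed)
    return np.stack(maps)


def divisive_normalization(maps, sigma_n=1.0):
    """(3) Divide each unit by sigma_n plus the summed activity of all scales at its location."""
    pool = maps.sum(axis=0, keepdims=True)
    return maps / (sigma_n + pool)


def total_response(img, sigmas=(1.0, 2.0, 4.0, 8.0), sigma_n=1.0):
    """Aggregate normalized activity: a stand-in for overall response magnitude."""
    return float(divisive_normalization(dog_responses(img, sigmas), sigma_n).sum())


if __name__ == "__main__":
    # Vary dot count and dot size independently and compare the aggregate responses.
    for n in (4, 8, 16):
        for r in (2, 4):
            print(f"n={n:2d} radius={r}: {total_response(dot_array(n, r)):.2f}")
```

Whether the aggregate response tracks numerosity more closely than dot size depends on these assumed parameters; the sketch only shows how the three stages compose, not the quantitative result reported in the abstract.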
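The variance figures in the lung nodule detection abstract in this list read naturally as incremental R² from a hierarchical regression; this is an interpretation, since the abstract does not name the procedure. Under that reading, with experience entered first and fluid intelligence plus object recognition ability added second (the "full" model), the reported approximate values combine as:

```latex
R^2_{\text{exp}} \approx 0.50, \qquad
\Delta R^2 = R^2_{\text{full}} - R^2_{\text{exp}} \approx 0.15, \qquad
R^2_{\text{full}} \approx 0.50 + 0.15 = 0.65
```

On this reading, roughly a third of VCRT variance (about 0.35) would remain unexplained by the three predictors together.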

