Title: Evolutionary Constraints on Human Object Perception
Abstract

Language and culture endow humans with access to conceptual information that far exceeds any which could be accessed by a non‐human animal. Yet, it is possible that, even without language or specific experiences, non‐human animals represent and infer some aspects of similarity relations between objects in the same way as humans. Here, we show that monkeys’ discrimination sensitivity when identifying images of animals is predicted by established measures of semantic similarity derived from human conceptual judgments. We used metrics from computer vision and computational neuroscience to show that monkeys’ and humans’ performance cannot be explained by low‐level visual similarity alone. The results demonstrate that at least some of the underlying structure of object representations in humans is shared with non‐human primates, at an abstract level that extends beyond low‐level visual similarity. Because the monkeys had no experience with the objects we tested, the results suggest that monkeys and humans share a primitive representation of object similarity that is independent of formal knowledge and cultural experience, and likely derived from common evolutionary constraints on object representation.
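To make the comparison logic concrete, the sketch below shows, under stated assumptions, how one might test whether semantic similarity predicts discrimination sensitivity over and above low-level visual similarity, using nested least-squares regressions over image pairs. All variable names and values are illustrative placeholders, not the study's data or analysis code.

```python
"""Illustrative sketch only: do semantic similarities add predictive power for
discrimination sensitivity (d') beyond low-level visual similarity?"""
import numpy as np

rng = np.random.default_rng(0)
n_pairs = 200  # hypothetical number of image pairs

# Placeholder predictors; in practice these would come from human similarity
# judgments and from computer-vision feature models, respectively.
semantic_sim = rng.uniform(0.0, 1.0, n_pairs)
visual_sim = rng.uniform(0.0, 1.0, n_pairs)
# Placeholder outcome: per-pair discrimination sensitivity.
dprime = 2.0 - 1.2 * semantic_sim - 0.3 * visual_sim + rng.normal(0.0, 0.2, n_pairs)

def r_squared(predictors, outcome):
    """Ordinary-least-squares R^2 with an intercept column."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ beta
    return 1.0 - residuals.var() / outcome.var()

r2_visual_only = r_squared(visual_sim[:, None], dprime)
r2_full = r_squared(np.column_stack([visual_sim, semantic_sim]), dprime)
print(f"R^2 visual only:       {r2_visual_only:.3f}")
print(f"R^2 visual + semantic: {r2_full:.3f} (gain = {r2_full - r2_visual_only:.3f})")
```

In this framing, a substantial gain in explained variance for the full model would mirror the paper's claim that low-level visual similarity alone cannot account for the behavior.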

 
PAR ID:
10245731
Publisher / Repository:
Wiley-Blackwell
Journal Name:
Cognitive Science
Volume:
41
Issue:
8
ISSN:
0364-0213
Page Range / eLocation ID:
p. 2126-2148
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Introduction

    How do multiple sources of information interact to form mental representations of object categories? It is commonly held that object categories reflect the integration of perceptual features and semantic/knowledge-based features. To explore the relative contributions of these two sources of information, we used functional magnetic resonance imaging (fMRI) to identify regions involved in the representation of object categories with shared visual and/or semantic features.

    Methods

    Participants (N = 20) viewed, in the MRI scanner, a series of objects that varied in their degree of visual and semantic overlap. We used a blocked adaptation design to identify sensitivity to visual and semantic features in a priori visual processing regions and, with an exploratory whole-brain analysis, in a distributed network of object processing regions.

    Results

    Somewhat surprisingly, within higher-order visual processing regions, specifically the lateral occipital cortex (LOC), we did not obtain any difference in neural adaptation for shared visual versus semantic category membership. More broadly, both visual and semantic information affected a distributed network of independently identified category-selective regions. Adaptation was seen in a whole-brain network of processing regions in response to both visual and semantic similarity; specifically, the angular gyrus (AnG) adapted to visual similarity, and the dorsomedial prefrontal cortex (DMPFC) adapted to both visual and semantic similarity.

    Conclusions

    Our findings suggest that perceptual features help organize mental categories throughout the object processing hierarchy. Most notably, visual similarity also influenced adaptation in nonvisual brain regions (i.e., the AnG and DMPFC). We conclude that category-relevant visual features are maintained in higher-order conceptual representations, and that visual information plays an important role in both the acquisition and the neural representation of conceptual object categories.
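    As a rough illustration of the blocked adaptation logic described in the Methods, the sketch below (hypothetical data, not the study's analysis pipeline) indexes adaptation as the drop in mean ROI response for blocks of objects that share visual or semantic features relative to unrelated blocks.

```python
"""Toy blocked-adaptation contrast with hypothetical per-block estimates for
one region of interest (not the study's analysis pipeline)."""
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical mean response estimates per block type (e.g., for LOC).
block_betas = {
    "unrelated": rng.normal(1.00, 0.05, 20),         # baseline blocks
    "visual_overlap": rng.normal(0.85, 0.05, 20),    # objects share visual features
    "semantic_overlap": rng.normal(0.90, 0.05, 20),  # objects share semantic features
}

def adaptation_index(condition):
    """Positive values indicate adaptation (a reduced response) vs. unrelated blocks."""
    return block_betas["unrelated"].mean() - block_betas[condition].mean()

for condition in ("visual_overlap", "semantic_overlap"):
    print(f"{condition}: adaptation index = {adaptation_index(condition):.3f}")
```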

     
  2. Isik, Leyla (Ed.)
    After years of experience, humans become experts at perceiving letters. Is this visual capacity attained by learning specialized letter features, or by reusing general visual features previously learned in service of object categorization? To explore this question, we first measured the perceptual similarity of letters in two behavioral tasks, visual search and letter categorization. Then, we trained deep convolutional neural networks on either 26-way letter categorization or 1000-way object categorization, as a way to operationalize possible specialized letter features and general object-based features, respectively. We found that the general object-based features more robustly correlated with the perceptual similarity of letters. We then operationalized additional forms of experience-dependent letter specialization by altering object-trained networks with varied forms of letter training; however, none of these forms of letter specialization improved the match to human behavior. Thus, our findings reveal that it is not necessary to appeal to specialized letter representations to account for perceptual similarity of letters. Instead, we argue that it is more likely that the perception of letters depends on domain-general visual features. 
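    A minimal sketch of the comparison logic, assuming an RSA-style analysis with placeholder data (the feature vectors here stand in for network activations to letter images, and the behavioral dissimilarities for visual-search or categorization measures; none of this is the authors' code):

```python
"""Sketch: correlate a network's pairwise letter dissimilarities with
behavioral letter dissimilarities (placeholder data throughout)."""
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_letters, n_features = 26, 512

# Placeholder activations for letter images; in the study these would come
# from CNNs trained on 26-way letter or 1000-way object categorization.
features = rng.normal(size=(n_letters, n_features))
model_dissim = pdist(features, metric="correlation")  # condensed 26 x 26 RDM

# Placeholder behavioral dissimilarities (e.g., derived from visual search).
behavior_dissim = model_dissim + rng.normal(0.0, 0.1, model_dissim.shape)

rho, p_value = spearmanr(model_dissim, behavior_dissim)
print(f"model-behavior Spearman rho = {rho:.3f} (p = {p_value:.3g})")
```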
  3. Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), symmetry in natural language goes beyond visuospatial properties: many words point to abstract concepts with symmetrical content (e.g., equal, marry). For example, if Mark marries Bill, then Bill marries Mark. In both cases (vision and language), symmetry may be formally characterized as invariance under transformation. Is this a coincidence, or is there some deeper psychological resemblance? Here we asked whether representations of symmetry correspond across language and vision. To do so, we developed a novel cross-modal matching paradigm. On each trial, participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., “negotiate” vs. “propose”). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event’s symmetry. A second study showed that this “language-vision correspondence” generalized to objects, and was weakened when the stimuli’s binary nature was made less apparent (i.e., for one object, rather than two inward-facing objects). A final study showed the same effect when nonsigners guessed English translations of signs from American Sign Language, which expresses many symmetrical concepts spatially. Taken together, our findings support the existence of an abstract representation of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface. 
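    The "invariance under transformation" characterization can be made concrete with a toy example (our illustration, not the study's stimuli or code): a 2-D pattern is mirror-symmetric exactly when it is unchanged by a left-right reflection.

```python
"""Toy check of mirror symmetry as invariance under left-right reflection."""
import numpy as np

def is_mirror_symmetric(pattern: np.ndarray) -> bool:
    """True if the pattern is unchanged by a left-right reflection."""
    return bool(np.array_equal(pattern, np.fliplr(pattern)))

symmetric_shape = np.array([[0, 1, 0, 1, 0],
                            [1, 1, 1, 1, 1],
                            [0, 1, 0, 1, 0]])
asymmetric_shape = np.array([[1, 1, 0, 0, 0],
                             [1, 1, 1, 0, 0],
                             [1, 1, 0, 0, 0]])
print(is_mirror_symmetric(symmetric_shape))   # True
print(is_mirror_symmetric(asymmetric_shape))  # False
```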
  4. Humans often use natural language instructions to control and interact with robots for task execution. This poses a significant challenge for robots, which must not only parse and understand human instructions but also achieve semantic understanding of an unknown environment and its constituent elements. To address this challenge, this study presents a vision-language model (VLM)-driven approach to scene understanding of an unknown environment to enable robotic object manipulation. Given language instructions, a pretrained vision-language model built on the open-sourced Llama2-chat (7B) language model backbone is adopted for image description and scene understanding, translating visual information into text descriptions of the scene. Next, a zero-shot approach to fine-grained visual grounding and object detection is developed to extract and localise objects of interest from the scene. Upon 3D reconstruction and pose estimation of the object, a code-writing large language model (LLM) is adopted to generate high-level control code that links language instructions with robot actions for downstream tasks. The performance of the developed approach is experimentally validated through table-top object manipulation by a robot.
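    A high-level sketch of such a pipeline is shown below; every function is a hypothetical placeholder standing in for the components named in the abstract (VLM scene description, zero-shot grounding, pose estimation, code-writing LLM), not the paper's released implementation or any specific library API.

```python
"""High-level pipeline sketch; every function below is a hypothetical
placeholder, not an API from the paper or from any specific library."""
from dataclasses import dataclass

@dataclass
class Pose:
    position: tuple       # (x, y, z) in the robot frame
    orientation: tuple    # quaternion (qx, qy, qz, qw)

def describe_scene(image) -> str:
    """Stand-in for the Llama2-chat-based VLM scene description step."""
    return "a red mug and a notebook on a wooden table"

def ground_object(image, scene_text: str, instruction: str) -> dict:
    """Stand-in for zero-shot visual grounding / object detection."""
    return {"label": "red mug", "bbox": (120, 80, 210, 190)}

def estimate_pose(image, detection: dict) -> Pose:
    """Stand-in for 3D reconstruction and object pose estimation."""
    return Pose(position=(0.42, -0.10, 0.05), orientation=(0.0, 0.0, 0.0, 1.0))

def generate_control_code(instruction: str, pose: Pose) -> str:
    """Stand-in for the code-writing LLM that maps instructions to robot actions."""
    return (f"robot.pick(position={pose.position})\n"
            f"robot.place(position=(0.20, 0.30, 0.05))")

def run_pipeline(image, instruction: str) -> str:
    scene_text = describe_scene(image)
    detection = ground_object(image, scene_text, instruction)
    pose = estimate_pose(image, detection)
    return generate_control_code(instruction, pose)

print(run_pipeline(image=None, instruction="put the red mug next to the notebook"))
```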
  5. Abstract
    Perception, representation, and memory of ensemble statistics have attracted growing interest. Studies have found that, at different abstraction levels, the brain represents similar items as unified percepts. We found that global ensemble perception is automatic and unconscious, affecting later perceptual judgments regarding individual member items. Implicit effects of set mean and range for low-level feature ensembles (size, orientation, brightness) were replicated for high-level category objects. This similarity suggests that analogous mechanisms underlie these extreme levels of abstraction. Here, we bridge the span between visual features and semantic object categories by applying the identical implicit-perception experimental paradigm to intermediate, novel visual-shape categories, constructing ensemble exemplars by introducing systematic variations of a central category base or ancestor. In five experiments with different item variability, we test automatic representation of ensemble category characteristics and its effect on a subsequent memory task. Results show that observers' representation of ensembles includes the group's central shape, category ancestor (progenitor), or group mean. Observers also easily reject memory of shapes belonging to different categories, i.e., originating from different ancestors. We conclude that complex categories, like simple visual form ensembles, are represented in terms of statistics that include a central object as well as category boundaries. We refer to the model proposed by Benna and Fusi (bioRxiv 624239, 2019), in which memory representation is compressed when related elements are represented by identifying their ancestor and each one's difference from it. We suggest that ensemble mean perception, like category prototype extraction, might reflect the employment, at different representation levels, of an essential, general representation mechanism.
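    The compression idea credited to Benna and Fusi can be illustrated with a toy sketch (our illustration, not their model): a set of related shapes is stored as one ancestor vector plus each exemplar's small difference from it.

```python
"""Toy ancestor-plus-differences representation of a shape category."""
import numpy as np

rng = np.random.default_rng(3)
ancestor = rng.normal(size=16)                        # central category shape
exemplars = ancestor + rng.normal(0.0, 0.1, (5, 16))  # systematic variations

# Compressed form: one ancestor vector plus small per-exemplar deltas.
deltas = exemplars - ancestor
reconstructed = ancestor + deltas

print(np.allclose(reconstructed, exemplars))  # True: exemplars are recoverable
print(f"mean |delta| = {np.abs(deltas).mean():.3f} vs "
      f"mean |exemplar| = {np.abs(exemplars).mean():.3f}")
```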