skip to main content


Title: Where word and world meet: Language and vision share an abstract representation of symmetry
Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), symmetry in natural language goes beyond visuospatial properties: many words point to abstract concepts with symmetrical content (e.g., equal, marry). For example, if Mark marries Bill, then Bill marries Mark. In both cases (vision and language), symmetry may be formally characterized as invariance under transformation. Is this a coincidence, or is there some deeper psychological resemblance? Here we asked whether representations of symmetry correspond across language and vision. To do so, we developed a novel cross-modal matching paradigm. On each trial, participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., “negotiate” vs. “propose”). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event’s symmetry. A second study showed that this “language-vision correspondence” generalized to objects, and was weakened when the stimuli’s binary nature was made less apparent (i.e., for one object, rather than two inward-facing objects). A final study showed the same effect when nonsigners guessed English translations of signs from American Sign Language, which expresses many symmetrical concepts spatially. Taken together, our findings support the existence of an abstract representation of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface.  more » « less
Award ID(s):
1941006
NSF-PAR ID:
10345443
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Journal of experimental psychology
ISSN:
0096-3445
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), symmetry in natural language goes beyond visuospatial properties: many words point to abstract concepts with symmetrical content (e.g., equal, marry). For example, if Mark marries Bill, then Bill marries Mark. In both cases (vision and language), symmetry may be formally characterized as invariance under transformation. Is this a coincidence, or is there some deeper psychological resemblance? Here we asked whether representations of symmetry correspond across language and vision. To do so, we developed a novel cross-modal matching paradigm. On each trial, participants observed a visual stimulus (either symmetrical or nonsymmetrical) and had to choose between a symmetrical and nonsymmetrical English predicate unrelated to the stimulus (e.g., “negotiate” vs. “propose”). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event’s symmetry. A second study showed that this “language-vision correspondence” generalized to objects and was weakened when the stimuli’s binary nature was made less apparent (i.e., for one object, rather than two inward-facing objects). A final study showed the same effect when nonsigners guessed English translations of signs from American Sign Language, which expresses many symmetrical concepts spatially. Taken together, our findings support the existence of an abstract representation of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface. 
    more » « less
  2. Fitch, Tecumseh ; Lamm, Claus ; Leder, Helmut ; Tessmar-Raible, Kristin (Ed.)
    Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), linguistic symmetry goes far beyond visuospatial properties: Many words refer to abstract, logically symmetrical concepts (e.g., equal, marry). This raises a question: Do representations of symmetry correspond across language and vision, and if so, how? To address this question, we used a cross-modal matching paradigm. On each trial, adult participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., "negotiate" vs. "propose"). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event's symmetry. A second study showed that this "matching" generalized to static objects, and was weakened when the stimuli's binary-relational nature was made less apparent (i.e., one object with a symmetrical contour, rather than two symmetrically configured objects). Taken together, our findings support the existence of an abstract relational concept of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface, and points towards a possible avenue for acquisition of word-to-world mappings for the seemingly inaccessible logical symmetry of linguistic terms. 
    more » « less
  3. Fitch, Tecumseh ; Lamm, Claus ; Leder, Helmut ; Tessmar-Raible, Kristin (Ed.)
    Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), linguistic symmetry goes far beyond visuospatial properties: Many words refer to abstract, logically symmetrical concepts (e.g., equal, marry). This raises a question: Do representations of symmetry correspond across language and vision, and if so, how? To address this question, we used a cross-modal matching paradigm. On each trial, adult participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., "negotiate" vs. "propose"). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event's symmetry. A second study showed that this "matching" generalized to static objects, and was weakened when the stimuli's binary-relational nature was made less apparent (i.e., one object with a symmetrical contour, rather than two symmetrically configured objects). Taken together, our findings support the existence of an abstract relational concept of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface, and points towards a possible avenue for acquisition of word-to-world mappings for the seemingly inaccessible logical symmetry of linguistic terms. 
    more » « less
  4. A major goal of grounded language learning research is to enable robots to connect language predicates to a robot’s physical interactive perception of the world. Coupling object exploratory behaviors such as grasping, lifting, and looking with multiple sensory modalities (e.g., audio, haptics, and vision) enables a robot to ground non-visual words like “heavy” as well as visual words like “red”. A major limitation of existing approaches to multi-modal language grounding is that a robot has to exhaustively explore training objects with a variety of actions when learning a new such language predicate. This paper proposes a method for guiding a robot’s behavioral exploration policy when learning a novel predicate based on known grounded predicates and the novel predicate’s linguistic relationship to them. We demonstrate our approach on two datasets in which a robot explored large sets of objects and was tasked with learning to recognize whether novel words applied to those objects. 
    more » « less
  5. In order for robots to operate effectively in homes and workplaces, they must be able to manipulate the articulated objects common within environments built for and by humans. Kinematic models provide a concise representation of these objects that enable deliberate, generalizable manipulation policies. However, existing approaches to learning these models rely upon visual observations of an object’s motion, and are subject to the effects of occlusions and feature sparsity. Natural language descriptions provide a flexible and efficient means by which humans can provide complementary information in a weakly supervised manner suitable for a variety of different interactions (e.g., demonstrations and remote manipulation). In this paper, we present a multimodal learning framework that incorporates both vision and language information acquired in situ to estimate the structure and parameters that de- fine kinematic models of articulated objects. The visual signal takes the form of an RGB-D image stream that opportunistically captures object motion in an unprepared scene. Accompanying natural language descriptions of the motion constitute the linguistic signal. We model linguistic information using a probabilistic graphical model that grounds natural language descriptions to their referent kinematic motion. By exploiting the complementary nature of the vision and language observations, our method infers correct kinematic models for various multiple-part objects on which the previous state-of-the- art, visual-only system fails. We evaluate our multimodal learning framework on a dataset comprised of a variety of household objects, and demonstrate a 23% improvement in model accuracy over the vision-only baseline. 
    more » « less