

Title: Where word and world meet: Language and vision share an abstract representation of symmetry
Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), symmetry in natural language goes beyond visuospatial properties: many words point to abstract concepts with symmetrical content (e.g., equal, marry). For example, if Mark marries Bill, then Bill marries Mark. In both cases (vision and language), symmetry may be formally characterized as invariance under transformation. Is this a coincidence, or is there some deeper psychological resemblance? Here we asked whether representations of symmetry correspond across language and vision. To do so, we developed a novel cross-modal matching paradigm. On each trial, participants observed a visual stimulus (either symmetrical or nonsymmetrical) and had to choose between a symmetrical and nonsymmetrical English predicate unrelated to the stimulus (e.g., “negotiate” vs. “propose”). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event’s symmetry. A second study showed that this “language-vision correspondence” generalized to objects and was weakened when the stimuli’s binary nature was made less apparent (i.e., for one object, rather than two inward-facing objects). A final study showed the same effect when nonsigners guessed English translations of signs from American Sign Language, which expresses many symmetrical concepts spatially. Taken together, our findings support the existence of an abstract representation of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface.
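For concreteness, the “invariance under transformation” characterization mentioned in the abstract can be written out explicitly. The two lines below are a minimal sketch of that shared notion (our notation, not the authors’):

    \forall x\,\forall y\;[\,R(x, y) \leftrightarrow R(y, x)\,]   % a symmetric predicate R, e.g., marry(Mark, Bill) <-> marry(Bill, Mark)
    T(S) = S                                                      % a visual form S invariant under a transformation T, e.g., mirror reflection of a butterfly

In both cases the content is unchanged when its parts are swapped or transformed, which is the modality-general sense of symmetry the cross-modal matching studies probe.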
Award ID(s):
2105228
NSF-PAR ID:
10338436
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of Experimental Psychology
ISSN:
0096-3445
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fitch, Tecumseh; Lamm, Claus; Leder, Helmut; Tessmar-Raible, Kristin (Eds.)
    Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), linguistic symmetry goes far beyond visuospatial properties: Many words refer to abstract, logically symmetrical concepts (e.g., equal, marry). This raises a question: Do representations of symmetry correspond across language and vision, and if so, how? To address this question, we used a cross-modal matching paradigm. On each trial, adult participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., "negotiate" vs. "propose"). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event's symmetry. A second study showed that this "matching" generalized to static objects, and was weakened when the stimuli's binary-relational nature was made less apparent (i.e., one object with a symmetrical contour, rather than two symmetrically configured objects). Taken together, our findings support the existence of an abstract relational concept of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface, and points towards a possible avenue for acquisition of word-to-world mappings for the seemingly inaccessible logical symmetry of linguistic terms. 
  2. In order for robots to operate effectively in homes and workplaces, they must be able to manipulate the articulated objects common within environments built for and by humans. Kinematic models provide a concise representation of these objects that enables deliberate, generalizable manipulation policies. However, existing approaches to learning these models rely upon visual observations of an object’s motion and are subject to the effects of occlusions and feature sparsity. Natural language descriptions provide a flexible and efficient means by which humans can supply complementary information in a weakly supervised manner suitable for a variety of different interactions (e.g., demonstrations and remote manipulation). In this paper, we present a multimodal learning framework that incorporates both vision and language information acquired in situ to estimate the structure and parameters that define kinematic models of articulated objects. The visual signal takes the form of an RGB-D image stream that opportunistically captures object motion in an unprepared scene. Accompanying natural language descriptions of the motion constitute the linguistic signal. We model linguistic information using a probabilistic graphical model that grounds natural language descriptions to their referent kinematic motion. By exploiting the complementary nature of the vision and language observations, our method infers correct kinematic models for various multiple-part objects on which the previous state-of-the-art, visual-only system fails. We evaluate our multimodal learning framework on a dataset comprising a variety of household objects and demonstrate a 23% improvement in model accuracy over the vision-only baseline.
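    The general idea summarized in this abstract lends itself to a toy illustration. The sketch below is a heavily simplified, hypothetical rendering of fusing a vision-only belief over candidate kinematic model types with evidence extracted from a natural language description; the model types, keyword lists, numbers, and function names are our own assumptions and are not taken from the paper.

import math  # not strictly needed; kept for clarity if extending to log-space fusion

MODEL_TYPES = ["revolute", "prismatic", "rigid"]

# Keywords we *assume* a language-grounding step might associate with each
# kinematic model type (e.g., "swing"/"hinge" -> revolute, "slide" -> prismatic).
KEYWORDS = {
    "revolute": {"rotate", "rotates", "swing", "swings", "hinge", "opens"},
    "prismatic": {"slide", "slides", "pull", "pulls", "drawer", "extends"},
    "rigid": {"fixed", "rigid", "attached", "stuck"},
}

def language_likelihood(description: str, smoothing: float = 0.1) -> dict:
    """Score each model type by how many of its keywords appear in the description."""
    tokens = set(description.lower().split())
    scores = {m: smoothing + len(tokens & KEYWORDS[m]) for m in MODEL_TYPES}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}

def fuse(vision_posterior: dict, description: str) -> dict:
    """Combine vision and language evidence, assuming conditional independence."""
    lang = language_likelihood(description)
    joint = {m: vision_posterior[m] * lang[m] for m in MODEL_TYPES}
    total = sum(joint.values())
    return {m: p / total for m, p in joint.items()}

if __name__ == "__main__":
    # A vision-only estimate left nearly uniform by occlusion or sparse features:
    vision_posterior = {"revolute": 0.34, "prismatic": 0.36, "rigid": 0.30}
    description = "the cabinet door swings open on a hinge"
    fused = fuse(vision_posterior, description)
    print(max(fused, key=fused.get), fused)  # language evidence tips the estimate toward "revolute"

    In the paper itself the linguistic signal is handled with a probabilistic graphical model grounded to the observed motion rather than keyword matching; the sketch only conveys why complementary language evidence can resolve cases where the visual signal alone is ambiguous.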