skip to main content


Title: Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions
A major goal of grounded language learning research is to enable robots to connect language predicates to a robot’s physical interactive perception of the world. Coupling object exploratory behaviors such as grasping, lifting, and looking with multiple sensory modalities (e.g., audio, haptics, and vision) enables a robot to ground non-visual words like “heavy” as well as visual words like “red”. A major limitation of existing approaches to multi-modal language grounding is that a robot has to exhaustively explore training objects with a variety of actions when learning a new such language predicate. This paper proposes a method for guiding a robot’s behavioral exploration policy when learning a novel predicate based on known grounded predicates and the novel predicate’s linguistic relationship to them. We demonstrate our approach on two datasets in which a robot explored large sets of objects and was tasked with learning to recognize whether novel words applied to those objects.  more » « less
Award ID(s):
1637736
NSF-PAR ID:
10060555
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 32nd AAAI Conference on Arti cial Intelligence (AAAI)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), symmetry in natural language goes beyond visuospatial properties: many words point to abstract concepts with symmetrical content (e.g., equal, marry). For example, if Mark marries Bill, then Bill marries Mark. In both cases (vision and language), symmetry may be formally characterized as invariance under transformation. Is this a coincidence, or is there some deeper psychological resemblance? Here we asked whether representations of symmetry correspond across language and vision. To do so, we developed a novel cross-modal matching paradigm. On each trial, participants observed a visual stimulus (either symmetrical or nonsymmetrical) and had to choose between a symmetrical and nonsymmetrical English predicate unrelated to the stimulus (e.g., “negotiate” vs. “propose”). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event’s symmetry. A second study showed that this “language-vision correspondence” generalized to objects and was weakened when the stimuli’s binary nature was made less apparent (i.e., for one object, rather than two inward-facing objects). A final study showed the same effect when nonsigners guessed English translations of signs from American Sign Language, which expresses many symmetrical concepts spatially. Taken together, our findings support the existence of an abstract representation of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface. 
    more » « less
  2. Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), symmetry in natural language goes beyond visuospatial properties: many words point to abstract concepts with symmetrical content (e.g., equal, marry). For example, if Mark marries Bill, then Bill marries Mark. In both cases (vision and language), symmetry may be formally characterized as invariance under transformation. Is this a coincidence, or is there some deeper psychological resemblance? Here we asked whether representations of symmetry correspond across language and vision. To do so, we developed a novel cross-modal matching paradigm. On each trial, participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., “negotiate” vs. “propose”). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event’s symmetry. A second study showed that this “language-vision correspondence” generalized to objects, and was weakened when the stimuli’s binary nature was made less apparent (i.e., for one object, rather than two inward-facing objects). A final study showed the same effect when nonsigners guessed English translations of signs from American Sign Language, which expresses many symmetrical concepts spatially. Taken together, our findings support the existence of an abstract representation of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface. 
    more » « less
  3. null (Ed.)
    Learning the meaning of grounded language---language that references a robot’s physical environment and perceptual data---is an important and increasingly widely studied problem in robotics and human-robot interaction. However, with a few exceptions, research in robotics has focused on learning groundings for a single natural language pertaining to rich perceptual data. We present experiments on taking an existing natural language grounding system designed for English and applying it to a novel multilingual corpus of descriptions of objects paired with RGB-D perceptual data. We demonstrate that this specific approach transfers well to different languages, but also present possible design constraints to consider for grounded language learning systems intended for robots that will function in a variety of linguistic settings. 
    more » « less
  4. Fitch, Tecumseh ; Lamm, Claus ; Leder, Helmut ; Tessmar-Raible, Kristin (Ed.)
    Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), linguistic symmetry goes far beyond visuospatial properties: Many words refer to abstract, logically symmetrical concepts (e.g., equal, marry). This raises a question: Do representations of symmetry correspond across language and vision, and if so, how? To address this question, we used a cross-modal matching paradigm. On each trial, adult participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., "negotiate" vs. "propose"). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event's symmetry. A second study showed that this "matching" generalized to static objects, and was weakened when the stimuli's binary-relational nature was made less apparent (i.e., one object with a symmetrical contour, rather than two symmetrically configured objects). Taken together, our findings support the existence of an abstract relational concept of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface, and points towards a possible avenue for acquisition of word-to-world mappings for the seemingly inaccessible logical symmetry of linguistic terms. 
    more » « less
  5. Fitch, Tecumseh ; Lamm, Claus ; Leder, Helmut ; Tessmar-Raible, Kristin (Ed.)
    Symmetry is ubiquitous in nature, in logic and mathematics, and in perception, language, and thought. Although humans are exquisitely sensitive to visual symmetry (e.g., of a butterfly), linguistic symmetry goes far beyond visuospatial properties: Many words refer to abstract, logically symmetrical concepts (e.g., equal, marry). This raises a question: Do representations of symmetry correspond across language and vision, and if so, how? To address this question, we used a cross-modal matching paradigm. On each trial, adult participants observed a visual stimulus (either symmetrical or non-symmetrical) and had to choose between a symmetrical and non-symmetrical English predicate unrelated to the stimulus (e.g., "negotiate" vs. "propose"). In a first study with visual events (symmetrical collision or asymmetrical launch), participants reliably chose the predicate matching the event's symmetry. A second study showed that this "matching" generalized to static objects, and was weakened when the stimuli's binary-relational nature was made less apparent (i.e., one object with a symmetrical contour, rather than two symmetrically configured objects). Taken together, our findings support the existence of an abstract relational concept of symmetry which humans access via both perceptual and linguistic means. More broadly, this work sheds light on the rich, structured nature of the language-cognition interface, and points towards a possible avenue for acquisition of word-to-world mappings for the seemingly inaccessible logical symmetry of linguistic terms. 
    more » « less