Title: Raising the Roof: Situating Verbs in Symbolic and Embodied Language Processing
Abstract: Recent investigations on how people derive meaning from language have focused on task‐dependent shifts between two cognitive systems. The symbolic (amodal) system represents meaning as the statistical relationships between words. The embodied (modal) system represents meaning through neurocognitive simulation of perceptual or sensorimotor systems associated with a word's referent. A primary finding of literature in this field is that the embodied system is only dominant when a task necessitates it, but in certain paradigms, this has only been demonstrated using nouns and adjectives. The purpose of this paper is to study whether similar effects hold with verbs. Experiment 1 evaluated a novel task in which participants rated a selection of verbs on their implied vertical movement. Ratings correlated well with distributional semantic models, establishing convergent validity, though some variance was unexplained by language statistics alone. Experiment 2 replicated previous noun‐based location‐cue congruency experimental paradigms with verbs and showed that the ratings obtained in Experiment 1 predicted reaction times more strongly than language statistics. Experiment 3 modified the location‐cue paradigm by adding movement to create an animated, temporally decoupled, movement‐verb judgment task designed to examine the relative influence of symbolic and embodied processing for verbs. Results were generally consistent with linguistic shortcut hypotheses of symbolic‐embodied integrated language processing; location‐cue congruence elicited processing facilitation in some conditions, and perceptual information accounted for reaction times and accuracy better than language statistics alone. These studies demonstrate novel ways in which embodied and linguistic information can be examined while using verbs as stimuli.
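To make the language-statistics baseline concrete, the sketch below shows one common way a distributional-semantic predictor of implied vertical movement could be computed: the difference in cosine similarity between a verb's embedding and upward versus downward anchor words. The tiny hand-specified vectors, the verb list, and the anchor words are illustrative assumptions, not the models or stimuli used in the paper.

```python
import numpy as np

# Toy 3-d vectors standing in for real distributional embeddings
# (e.g., word2vec or GloVe); all values here are illustrative only.
EMB = {
    "soar":   np.array([0.9, 0.1, 0.2]),
    "plunge": np.array([-0.8, 0.2, 0.1]),
    "raise":  np.array([0.7, 0.3, 0.0]),
    "drop":   np.array([-0.6, 0.4, 0.2]),
    "up":     np.array([1.0, 0.0, 0.0]),
    "down":   np.array([-1.0, 0.0, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verticality(verb):
    """Similarity to the upward anchor minus similarity to the downward anchor."""
    return cosine(EMB[verb], EMB["up"]) - cosine(EMB[verb], EMB["down"])

for v in ["soar", "plunge", "raise", "drop"]:
    print(f"{v:7s} {verticality(v):+.2f}")
```

Scores of this kind could then be correlated with human verticality ratings; any variance in the ratings left unexplained by such a predictor is the kind of residual the abstract attributes to information beyond language statistics.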
Kery, Caroline; Pillai, Nisha; Matuszek, Cynthia; Ferraro, Francis
(IEEE International Conference on Robot and Human Interactive Communication)
Learning the meaning of grounded language---language that references a robot’s physical environment and perceptual data---is an important and increasingly widely studied problem in robotics and human-robot interaction. However, with a few exceptions, research in robotics has focused on learning groundings for a single natural language pertaining to rich perceptual data. We present experiments on taking an existing natural language grounding system designed for English and applying it to a novel multilingual corpus of descriptions of objects paired with RGB-D perceptual data. We demonstrate that this specific approach transfers well to different languages, but also present possible design constraints to consider for grounded language learning systems intended for robots that will function in a variety of linguistic settings.
Shi, Huanhuan; He, Angela Xiaoxue; Song, Hyun-Joo; Jin, Kyong-Sun; Arunachalam, Sudha
(Language Learning and Development)
To learn new words, particularly verbs, child learners have been shown to benefit from the linguistic contexts in which the words appear. However, cross-linguistic differences affect how this process unfolds. One previous study found that children’s abilities to learn a new verb differed across Korean and English as a function of the sentence in which the verb occurred. The authors hypothesized that the properties of word order and argument drop, which vary systematically in these two languages, were driving the differences. In the current study, we pursued this finding to ask if the difference persists later in development, or if children acquiring different languages come to appear more similar as their linguistic knowledge and learning capacities increase. Preschool-aged monolingual English learners (N = 80) and monolingual Korean learners (N = 64) were presented with novel verbs in contexts that varied in word order and argument drop and accompanying visual stimuli. We assessed their learning by measuring accuracy in a forced-choice pointing task, and we measured eye gaze during the learning phase as an indicator of the processes by which they mapped the novel verbs to meaning. Unlike previous studies which identified differences between English and Korean learning 2-year-olds in a similar task, our results revealed similarities between the two language groups with these older preschoolers. We interpret our results as evidence that over the course of early childhood, children become adept at learning from a large variety of contexts, such that differences between learners of different languages are attenuated.
Brodbeck, Christian; Bhattasali, Shohini; Cruz Heredia, Aura AL; Resnik, Philip; Simon, Jonathan Z; Lau, Ellen
(eLife)
Speech processing is highly incremental. It is widely accepted that human listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how a predictive context is integrated with the bottom-up sensory input: Classic psycholinguistic paradigms suggest a two-stage process, in which acoustic input initially leads to local, context-independent representations, which are then quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single coherent, unified interpretation of the input, which fully integrates available information across representational hierarchies, and thus uses contextual constraints to modulate even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of local and unified predictive models. Results provide evidence that listeners employ both types of models in parallel. Two local context models uniquely predict some part of early neural responses, one based on sublexical phoneme sequences, and one based on the phonemes in the current word alone; at the same time, even early responses to phonemes also reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in nonidentical parts of the superior temporal lobes bilaterally, with the right hemisphere showing a relative preference for more local models. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previous disparate findings. Parallel models might make the perceptual system more robust, facilitate processing of unexpected inputs, and serve a function in language acquisition.
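As a concrete illustration of the two "local context" models this study contrasts, the toy sketch below computes phoneme surprisal from (a) sublexical phoneme-sequence statistics pooled over a lexicon and (b) a word-internal cohort restricted to the phonemes heard so far in the current word. The miniature lexicon, transcriptions, and uniform word probabilities are illustrative assumptions; the study itself fit such models to continuous narrative speech and used them as predictors of MEG responses.

```python
import math
from collections import defaultdict

# Tiny toy lexicon with phoneme transcriptions (hypothetical).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "can": ["k", "ae", "n"],
    "cap": ["k", "ae", "p"],
    "man": ["m", "ae", "n"],
    "dog": ["d", "ao", "g"],
}

# Local model 1: sublexical phoneme bigram statistics pooled across all words.
bigram_counts = defaultdict(lambda: defaultdict(int))
for phones in LEXICON.values():
    prev = "#"  # word-boundary symbol
    for p in phones:
        bigram_counts[prev][p] += 1
        prev = p

def bigram_surprisal(prev, phone):
    total = sum(bigram_counts[prev].values())
    return -math.log2(bigram_counts[prev][phone] / total)

# Local model 2: cohort model using only the phonemes of the current word so far,
# with a uniform prior over lexicon entries.
def cohort_surprisal(prefix, phone):
    cohort = [w for w, ph in LEXICON.items() if ph[:len(prefix)] == prefix]
    matches = [w for w in cohort if LEXICON[w][len(prefix)] == phone]
    return -math.log2(len(matches) / len(cohort))

# Surprisal of hearing /n/ after /k ae/ under each local model.
print("sublexical bigram:", round(bigram_surprisal("ae", "n"), 2))  # pools over all words
print("word-internal cohort:", round(cohort_surprisal(["k", "ae"], "n"), 2))
```

A unified model, by contrast, would additionally condition these phoneme predictions on sentence-level context (for example, via a language model over upcoming words), which is what distinguishes it from both local models above.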
Hoshino, Noriko; Beatty-Martínez, Anne L.; Navarro-Torres, Christian A.; Kroll, Judith F.
(Frontiers in Communication)
The present study examined the role of script in bilingual speech planning by comparing the performance of same and different-script bilinguals. Spanish-English bilinguals (Experiment 1) and Japanese-English bilinguals (Experiment 2) performed a picture-word interference task in which they were asked to name a picture of an object in English, their second language, while ignoring a visual distractor word in Spanish or Japanese, their first language. Results replicated the general pattern seen in previous bilingual picture-word interference studies for the same-script, Spanish-English bilinguals but not for the different-script, Japanese-English bilinguals. Both groups showed translation facilitation, whereas only Spanish-English bilinguals demonstrated semantic interference, phonological facilitation, and phono-translation facilitation. These results suggest that when the script of the language not in use is present in the task, bilinguals appear to exploit the perceptual difference as a language cue to direct lexical access to the intended language earlier in the process of speech planning.
In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy. The agent can be used for showing navigation routes, delivering objects to people, and relocating objects from one location to another. We use dialog clarification questions both to understand commands and to generate additional parsing training data. The agent employs opportunistic active learning to select questions about how words relate to objects, improving its understanding of perceptual concepts. We evaluated this agent on Amazon Mechanical Turk. After training on data induced from conversations, the agent reduced the number of dialog questions it asked while receiving higher usability ratings. Additionally, we demonstrated the agent on a robotic platform, where it learned new perceptual concepts on the fly while completing a real-world task.
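The question-selection step described above can be made concrete with a minimal uncertainty-sampling sketch: a perceptual-concept classifier is fit to objects the agent has already been told about, and the agent asks about the object it is least sure of. The features, the toy "red" labeling rule, and the choice of logistic regression with uncertainty sampling are assumptions for illustration only; the paper describes its opportunistic active-learning strategy only at the level summarized above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical perceptual features (e.g., mean color channels plus a
# depth-derived size) for objects the agent was already told are "red" (1) or not (0).
X_labeled = rng.normal(size=(20, 4))
y_labeled = (X_labeled[:, 0] > 0).astype(int)  # toy stand-in for user-provided labels

# Objects in the current scene the agent has not asked about yet.
X_unlabeled = rng.normal(size=(10, 4))

# Fit a simple perceptual-concept classifier for "red".
clf = LogisticRegression().fit(X_labeled, y_labeled)
probs = clf.predict_proba(X_unlabeled)[:, 1]

# Uncertainty sampling: ask about the object whose prediction is closest to chance.
query_idx = int(np.argmin(np.abs(probs - 0.5)))
print(f"Ask the user whether object {query_idx} is 'red' "
      f"(current confidence {probs[query_idx]:.2f})")
```

In an opportunistic setting, a query like this would only be asked when the relevant object happens to be present during an ongoing task, rather than in a dedicated training session.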
Hollander, John, and Olney, Andrew. "Raising the Roof: Situating Verbs in Symbolic and Embodied Language Processing." Cognitive Science 48.4. Web. doi:10.1111/cogs.13442.
@article{osti_10502445,
title = {Raising the Roof: Situating Verbs in Symbolic and Embodied Language Processing},
url = {https://par.nsf.gov/biblio/10502445},
DOI = {10.1111/cogs.13442},
abstractNote = {Abstract Recent investigations on how people derive meaning from language have focused on task‐dependent shifts between two cognitive systems. The symbolic (amodal) system represents meaning as the statistical relationships between words. The embodied (modal) system represents meaning through neurocognitive simulation of perceptual or sensorimotor systems associated with a word's referent. A primary finding of literature in this field is that the embodied system is only dominant when a task necessitates it, but in certain paradigms, this has only been demonstrated using nouns and adjectives. The purpose of this paper is to study whether similar effects hold with verbs. Experiment 1 evaluated a novel task in which participants rated a selection of verbs on their implied vertical movement. Ratings correlated well with distributional semantic models, establishing convergent validity, though some variance was unexplained by language statistics alone. Experiment 2 replicated previous noun‐based location‐cue congruency experimental paradigms with verbs and showed that the ratings obtained in Experiment 1 predicted reaction times more strongly than language statistics. Experiment 3 modified the location‐cue paradigm by adding movement to create an animated, temporally decoupled, movement‐verb judgment task designed to examine the relative influence of symbolic and embodied processing for verbs. Results were generally consistent with linguistic shortcut hypotheses of symbolic‐embodied integrated language processing; location‐cue congruence elicited processing facilitation in some conditions, and perceptual information accounted for reaction times and accuracy better than language statistics alone. These studies demonstrate novel ways in which embodied and linguistic information can be examined while using verbs as stimuli.},
journal = {Cognitive Science},
volume = {48},
number = {4},
publisher = {Wiley-Blackwell},
author = {Hollander, John and Olney, Andrew},
}