A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning

Kebe, Gaoussou Y.; Higgins, Padraig; Jenkins, Patrick; Darvish, Kasra; Sachdeva, Rishabh; Barron, Ryan; Winder, John; Engel, Don; Raff, Edward; Ferraro, Francis; Matuszek, Cynthia

Citation Details

Grounded language acquisition is a major area of research combining aspects of natural language processing, computer vision, and signal processing, compounded by domain issues requiring sample efficiency and other deployment constraints. In this work, we present a multimodal dataset of RGB+depth objects with spoken as well as textual descriptions. We analyze the differences between the two types of descriptive language and our experiments demonstrate that the different modalities affect learning. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, depth, text, speech, and transcription interact, as well as how differences in the vernacular of these modalities impact results. more »

Award ID(s):: 2024878 1813223 1637937 1920079 1940931

PAR ID:: 10382787

Author(s) / Creator(s):: Kebe, Gaoussou Y.; Higgins, Padraig; Jenkins, Patrick; Darvish, Kasra; Sachdeva, Rishabh; Barron, Ryan; Winder, John; Engel, Don; Raff, Edward; Ferraro, Francis; Matuszek, Cynthia

Date Published:: 2021-12-01

Journal Name:: Advances in neural information processing systems

ISSN:: 1049-5258

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this