NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Multimodal Contextualized Semantic Parsing from Speech

Voas, Jordan; Mooney, Raymond; Harwath, David (June 2024, https://doi.org/10.48550/arXiv.2406.06438)

This paper introduces Semantic Parsing in Contextual Environments (SPICE), a task aimed at improving artificial agents’ contextual awareness by integrating multimodal inputs with prior contexts. Unlike traditional semantic parsing, SPICE provides a structured and interpretable framework for dynamically updating an agent’s knowledge with new information, reflecting the complexity of human communication. To support this task, the authors develop the VG-SPICE dataset, which challenges models to construct visual scene graphs from spoken conversational exchanges, emphasizing the integration of speech and visual data. They also present the Audio-Vision Dialogue Scene Parser (AViD-SP), a model specifically designed for VG-SPICE. Both the dataset and model are released publicly, with the goal of advancing multimodal information processing and integration.
Full Text Available
Natural Language Can Help Bridge the Sim2Real Gap

Yu, Albert; Foote, Adeline; Mooney, Raymond; Martín-Martín, Roberto (June 2024, Robotics, Science and Systems (RSS))

The main challenge in learning image-conditioned robotic policies is acquiring a visual representation conducive to low-level control. Due to the high dimensionality of the image space, learning a good visual representation requires a considerable amount of visual data. However, when learning in the real world, data is expensive. Sim2Real is a promising paradigm for overcoming data scarcity in the real-world target domain by using a simulator to collect large amounts of cheap data closely related to the target task. However, it is difficult to transfer an image-conditioned policy from sim to real when the domains are very visually dissimilar. To bridge the sim2real visual gap, we propose using natural language descriptions of images as a unifying signal across domains that captures the underlying task-relevant semantics. Our key insight is that if two image observations from different domains are labeled with similar language, the policy should predict similar action distributions for both images. We demonstrate that training the image encoder to predict the language description or the distance between descriptions of a sim or real image serves as a useful, data-efficient pretraining step that helps learn a domain-invariant image representation. We can then use this image encoder as the backbone of an imitation learning (IL) policy trained simultaneously on a large amount of simulated and a handful of real demonstrations. Our approach outperforms widely used prior sim2real methods and strong vision-language pretraining baselines like CLIP and R3M by 25 to 40 percent. See additional videos and materials at our project website.
Full Text Available
What is the Best Automated Metric for Text to Motion Generation?

https://doi.org/10.1145/3610548.3618185

Voas, Jordan; Wang, Yili; Huang, Qixing; Mooney, Raymond (December 2023, ACM SIGGRAPH Asia)

Full Text Available
Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks

Yu, Albert; Mooney, Raymond (May 2023, International Conference on Learning Representations (ICLR))

Demonstrations and natural language instructions are two common ways to specify and teach robots novel tasks. However, for many complex tasks, a demonstration or language instruction alone contains ambiguities, preventing tasks from being specified clearly. In such cases, a combination of both a demonstration and an instruction more concisely and effectively conveys the task to the robot than either modality alone. To instantiate this problem setting, we train a single multi-task policy on a few hundred challenging robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task Conditioning), a method for conditioning a robotic policy on task embeddings composed of two components: a visual demonstration and a language instruction. By allowing these two modalities to mutually disambiguate and clarify each other during novel task specification, DeL-TaCo (1) substantially decreases the teacher effort needed to specify a new task and (2) achieves better generalization performance on novel objects and instructions over previous task-conditioning methods. To our knowledge, this is the first work to show that simultaneously conditioning a multi-task robotic manipulation policy on both demonstration and language embeddings improves sample efficiency and generalization over conditioning on either modality alone. See additional materials at https://sites.google.com/view/del-taco-learning
Full Text Available
Learning Deep Semantics for Test Completion

https://doi.org/10.1109/ICSE48619.2023.00178

Nie, Pengyu; Banerjee, Rahul; Li, Junyi Jessy; Mooney, Raymond J.; Gligoric, Milos (May 2023, International Conference on Software Engineering)

Full Text Available
Text-to-SQL Error Correction with Language Models of Code

Chen, Ziru; Chen, Shijie; White, Michael; Mooney, Raymond; Payani, Ali; Srinivasa, Jayanth; Su, Yu; Sun, Huan (May 2023, ACL)

Full Text Available
Using Developer Discussions to Guide Fixing Bugs in Software

Panthaplackel, Sheena; Gligoric, Milos; Li, Junyi Jessy; Mooney, Raymond (December 2022, Findings of the Association for Computational Linguistics: EMNLP 2022)

Automatically fixing software bugs is a challenging task. While recent work showed that natural language context is useful in guiding bug-fixing models, the approach required prompting developers to provide this context, which was simulated through commit messages written after the bug-fixing code changes were made. We instead propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for any additional information from developers. For this, we augment standard bug-fixing datasets with bug report discussions. Using these newly compiled datasets, we demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
Full Text Available
Impact of Evaluation Methodologies on Code Summarization

Nie, Pengyu; Zhang, Jiyang; Mooney, Raymond; Li, Junyi; Gligoric, Milos (January 2022, Association for Computational Linguistics)

Full Text Available
Using Commonsense Knowledge to Answer Why-Questions

https://doi.org/10.18653/v1/2022.emnlp-main.79

Lal, Yash Kumar; Tandon, Niket; Aggarwal, Tanvi; Liu, Horace; Chambers, Nathanael; Mooney, Raymond; Balasubramanian, Niranjan (January 2022, Empirical Methods in Natural Language Processing)

Full Text Available
TellMeWhy: A Dataset for Answering Why-Questions in Narratives

Lal, Yash Kumar; Chambers, Nathanael; Mooney, Raymond; Balasubramanian, Niranjan (August 2021, Findings of the Association for Computational Linguistics)
Full Text Available

