NSF PAR Search | NSF Public Access Repository

Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition

https://doi.org/10.18653/v1/2023.acl-long.863

Bao, Yuwei; Lattimer, Barrett; Chai, Joyce (July 2023, The 61st Annual Meeting of the Association for Computational Linguistics)

Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables computational models to compare the similarities and differences of various attributes, and to learn to filter out and extract the common information for each shared linguistic label. We frame the acquisition of words not only as an information filtration process, but also as representation-symbol mapping. This procedure involves neither a fixed vocabulary size nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments have shown the potential of this approach for efficient continual learning of grounded words.
Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

Bao, Yuwei; Yu, Keunwoo Peter; Zhang, Yichi; Storks, Shane; Bar-Yossef, Itamar; De La Iglesia, Alexander; Su, Megan; Zheng, Xiaolin; Chai, Joyce (November 2023, Findings of Empirical Methods in Natural Language Processing)

Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely, accurate decisions on when and what to say. To address this issue, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor. We further proposed two tasks: User and Environment Understanding, and Instructor Decision Making. We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can demonstrate fair performance in some cases with no task-specific training, but fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will provide a stepping stone for future work on situated task guidance.
Learning to Mediate Disparities Towards Pragmatic Communication

Bao, Yuwei; Ghosh, Sayan; Chai, Joyce (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL))

Human communication is a collaborative process. Speakers, on top of conveying their own intent, adjust the content and language expressions by taking the listeners into account, including their knowledge background, personalities, and physical capabilities. Towards building AI agents with similar abilities in language communication, we propose Pragmatic Rational Speaker (PRS), a framework extending Rational Speech Act (RSA). The PRS attempts to learn the speaker-listener disparity and adjust the speech accordingly, by adding a lightweight disparity adjustment layer into working memory on top of the speaker’s long-term memory system. By fixing the long-term memory, the PRS only needs to update its working memory to learn and adapt to different types of listeners. To validate our framework, we create a dataset that simulates different types of speaker-listener disparities in the context of referential games. Our empirical results demonstrate that the PRS is able to shift its output towards the language that listeners are able to understand and significantly improve the collaborative task outcome.
DANLI: Deliberative Agent for Following Natural Language Instructions

Zhang, Yichi; Yang, Jianing; Pan, Jiayi; Storks, Shane; Devraj, Nikhil; Ma, Ziqiao; Yu, Keunwoo Peter; Bao, Yuwei; Chai, Joyce (January 2022, EMNLP)
