Enabling efficient communication in artificial agents brings us closer to machines that can cooperate with each other and with human partners. Hand-engineered approaches have substantial limitations, leading to increased interest in methods for communication to emerge autonomously between artificial agents. Most of the research in the field explores unsituated communication in one-step referential tasks. The tasks are not temporally interactive and lack time pressures typically present in natural communication and language learning. In these settings, agents can successfully learn what to communicate but not when or whether to communicate. Here, we extend the literature by assessing emergence of communication between reinforcement learning agents in a temporally interactive, cooperative task of navigating a gridworld environment. We show that, through multi-step interactions, agents develop just-in-time messaging protocols that enable them to successfully solve the task. With memory—which provides flexibility around message timing—agent pairs converge to a look-ahead communication protocol, finding an optimal solution to the task more quickly than without memory. Lastly, we explore situated communication, enabling the acting agent to choose when and whether to communicate. With the opportunity cost of forgoing an action to communicate, the acting agent learns to solicit information sparingly, in line with the Gricean Maxim of quantity. Our results point towards the importance of studying language emergence through situated communication in multi-step interactions.
more »
« less
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. While recent work, e.g., Reflexion, has demonstrated how such agents can also self-improve by adding a textual memory of ''hints'' learned from prior experience, such improvements have been limited both in size and scope. In contrast, our goal is a language agent that can robustly improve performance over time, including when both the task and environment are varied. Our approach is to have the agent learn a textual representation of how the world works (rather than just isolated hints), expressed as a memory of causal abstractions, to guide future decision-making. In experiments, we find CLIN is able to continually improve on repeated trials on the same task and environment, outperforming state-of-the-art reflective language agents like Reflexion by 23 points in ScienceWorld and 1.4 points in ALFWorld benchmarks. CLIN can also transfer its learning to new environments and tasks, enhancing performance by 21 points in ScienceWorld and 11 points in ALFWorld
more »
« less
- Award ID(s):
- 1928474
- PAR ID:
- 10563519
- Publisher / Repository:
- Proceedings of the Conference on Language Modeling (COLM)
- Date Published:
- Subject(s) / Keyword(s):
- LLMs Agents
- Format(s):
- Medium: X
- Location:
- Philadelphia, PA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Interactive learning environments facilitate learning by providing hints to fill the gaps in the understanding of a concept. Studies suggest that hints are not used optimally by learners. Either they are used unnecessarily or not used at all. It has been shown that learning outcomes can be improved by providing hints when needed. An effective hinttaking prediction model can be used by a learning environment to make adaptive decisions on whether to withhold or provide hints. Past work on student behavior modeling has focused extensively on the task of modeling a learner’s state of knowledge over time, referred to as knowledge tracing. The other aspects of a learner’s behavior such as tendency to use hints has garnered limited attention. Past knowledge tracing models either ignore the questions where a hint was taken or label hints taken as an incorrect response. We propose a multi-task memory-augmented deep learning model to jointly predict the hint-taking and the knowledge tracing task. The model incorporates the effect of past responses as well as hints taken on both the tasks. We apply the model on two datasets – ASSISTments 2009-10 skill builder dataset and Junyi Academy Math Practicing Log. The results show that deep learning models efficiently leverage the sequential information present in a learner’s responses. The proposed model significantly out-performs the past work on hint prediction by at least 12% points. Moreover, we demonstrate that jointly modeling the two tasks improves performance consistently across the tasks and the datasets, albeit by a small amount.more » « less
-
null (Ed.)Communication between human and mobile agents is getting increasingly important as such agents are widely deployed in our daily lives. Vision-and-Dialogue Navigation is one of the tasks that evaluate the agent’s ability to interact with humans for assistance and navigate based on natural language responses. In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model which is built upon Vision-Language transformers. However, despite achieving competitive performance, we find that the agent in the NDH task is not evaluated appropriately by the primary metric – Goal Progress. By analyzing the performance mismatch between Goal Progress and other metrics (e.g., normalized Dynamic Time Warping) from our state-of-the-art model, we show that NDH’s sub-path based task setup (i.e., navigating partial trajectory based on its correspondent subset of the full dialogue) does not provide the agent with enough supervision signal towards the goal region. Therefore, we propose a new task setup called NDH-Full which takes the full dialogue and the whole navigation path as one instance. We present a strong baseline model and show initial results on this new task. We further describe several approaches that we try, in order to improve the model performance (based on curriculum learning, pre-training, and data-augmentation), suggesting potential useful training methods on this new NDH-Full task.more » « less
-
Recent work has shown how to prompt large language models with explanations to obtain strong performance on textual reasoning tasks, i.e., the chain-of-thought paradigm. However, subtly different explanations can yield widely varying downstream task accuracy. Explanations that have not been “tuned” for a task, such as off-the-shelf explanations written by non-experts, may lead to mediocre performance. This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion. We first generate sets of candidate explanations for each example in the prompt using a leave-one-out scheme, then find an effective combination of these explanations with a two-stage framework. We first evaluate explanations for each in-context example in isolation according to two proxy metrics, log likelihood and accuracy on new examples. Then, we search over combinations of explanations to find one that yields high performance against a silver-labeled development set. Across four textual reasoning tasks spanning question answering, mathematical reasoning, and natural language inference, results show that our proxy metrics correlate with ground truth accuracy and our overall method can effectively improve prompts over crowdworker annotations and naive search strategies.more » « less
-
Field trips are widely recognized as an essential educational component to connect classrooms with the real world. When students don’t have access to real field trips, virtual ones have been developed by educators and researchers. Pedagogical agents have been applied to serve as a tour guide and educational tool that facilitate students learning in a virtual learning environment. Such agents are computer software generated and controlled entities that replicate or emulate humans. Previous studies have found that adding anthropomorphic traits to pedagogical agents in learning environments has significantly improved students’ learning experience; however, this area has yet been explored in the context of a virtual construction field trip. In this study, a virtual field trip to a complex mechanical room was developed using 360-degree panoramas and a pedagogical agent was employed to lead the tour. This study focuses on one single anthropomorphic trait - deictic gestures, which are pointing gestures used to refer to specific objects – and explores how such trait affects students’ quantitative learning outcomes and feedbacks on four aspects of the agent, including facilitating learning, credibility, human-like, and engaging. It was found that deictic gestures can improve students’ learning performance and attitudes on multiple aspects of the agent.more » « less
An official website of the United States government

