Enabling efficient communication in artificial agents brings us closer to machines that can cooperate with each other and with human partners. Hand-engineered approaches have substantial limitations, leading to increased interest in methods for communication to emerge autonomously between artificial agents. Most of the research in the field explores unsituated communication in one-step referential tasks. The tasks are not temporally interactive and lack time pressures typically present in natural communication and language learning. In these settings, agents can successfully learn what to communicate but not when or whether to communicate. Here, we extend the literature by assessing emergence of communication between reinforcement learning agents in a temporally interactive, cooperative task of navigating a gridworld environment. We show that, through multi-step interactions, agents develop just-in-time messaging protocols that enable them to successfully solve the task. With memory—which provides flexibility around message timing—agent pairs converge to a look-ahead communication protocol, finding an optimal solution to the task more quickly than without memory. Lastly, we explore situated communication, enabling the acting agent to choose when and whether to communicate. With the opportunity cost of forgoing an action to communicate, the acting agent learns to solicit information sparingly, in line with the Gricean Maxim of quantity. Our results point towards the importance of studying language emergence through situated communication in multi-step interactions.
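To make the situated setting concrete, below is a minimal sketch, under assumed names and reward values, of a gridworld step function in which the acting agent can spend a turn on a "solicit" action instead of moving, so requesting a message carries an opportunity cost (a forgone move plus the per-step penalty). The class, action set, and rewards are illustrative assumptions, not the paper's actual environment.

```python
# Minimal, hypothetical sketch of situated communication with an opportunity cost.
import random

class CommsGridWorld:
    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=5):
        self.size = size
        self.goal = (random.randrange(size), random.randrange(size))
        self.pos = (0, 0)

    def step(self, action, speaker_message=None):
        """`action` is one of MOVES' keys or "solicit"."""
        if action == "solicit":
            # Receive a message from the speaker agent but do not move this turn.
            obs = {"pos": self.pos, "message": speaker_message}
        else:
            dx, dy = self.MOVES[action]
            x = min(max(self.pos[0] + dx, 0), self.size - 1)
            y = min(max(self.pos[1] + dy, 0), self.size - 1)
            self.pos = (x, y)
            obs = {"pos": self.pos, "message": None}
        done = self.pos == self.goal
        reward = 1.0 if done else -0.05  # per-step cost creates time pressure
        return obs, reward, done
```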
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. While recent work, e.g., Reflexion, has demonstrated how such agents can also self-improve by adding a textual memory of "hints" learned from prior experience, such improvements have been limited in both size and scope. In contrast, our goal is a language agent that can robustly improve performance over time, including when both the task and the environment are varied. Our approach is to have the agent learn a textual representation of how the world works (rather than just isolated hints), expressed as a memory of causal abstractions, to guide future decision-making. In experiments, we find that CLIN is able to continually improve over repeated trials on the same task and environment, outperforming state-of-the-art reflective language agents like Reflexion by 23 points in ScienceWorld and 1.4 points in ALFWorld. CLIN can also transfer its learning to new environments and tasks, enhancing performance by 21 points in ScienceWorld and 11 points in ALFWorld.
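As a rough illustration of the trial loop this abstract describes, here is a hypothetical sketch in which the agent distills causal statements from each trial into a textual memory that conditions the next trial's prompt. The `llm` and `env` callables and the prompt format are stand-in assumptions, not the actual CLIN implementation or the ScienceWorld API.

```python
# Hypothetical CLIN-style loop: learn a textual memory of causal abstractions.
def run_trials(llm, env, task, num_trials=3):
    memory = []  # textual causal abstractions carried across trials
    for _ in range(num_trials):
        env.reset(task)
        trace, done = [], False
        while not done:
            action = llm(format_prompt(task, memory, trace))  # choose next action
            obs, done = env.step(action)
            trace.append((action, obs))
        # Reflect on the trace and update the causal memory for the next trial.
        memory = llm("Summarize, as causal statements, what was necessary for "
                     "progress and what did not contribute:\n" + str(trace)).splitlines()
    return memory

def format_prompt(task, memory, trace):
    return (f"Task: {task}\nLearned so far:\n" + "\n".join(memory) +
            "\nHistory:\n" + "\n".join(f"{a} -> {o}" for a, o in trace) +
            "\nNext action:")
```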
- Award ID(s):
- 1928474
- PAR ID:
- 10563519
- Publisher / Repository:
- Proceedings of the Conference on Language Modeling (COLM)
- Date Published:
- Subject(s) / Keyword(s):
- LLMs Agents
- Format(s):
- Medium: X
- Location:
- Philadelphia, PA
- Sponsoring Org:
- National Science Foundation
More Like this
- Interactive learning environments facilitate learning by providing hints to fill gaps in the understanding of a concept. Studies suggest that hints are not used optimally by learners: either they are used unnecessarily or not used at all. It has been shown that learning outcomes can be improved by providing hints when needed. An effective hint-taking prediction model can be used by a learning environment to make adaptive decisions on whether to withhold or provide hints. Past work on student behavior modeling has focused extensively on the task of modeling a learner's state of knowledge over time, referred to as knowledge tracing. Other aspects of a learner's behavior, such as the tendency to use hints, have garnered limited attention. Past knowledge tracing models either ignore the questions where a hint was taken or label hints taken as incorrect responses. We propose a multi-task memory-augmented deep learning model to jointly address the hint-taking prediction and knowledge tracing tasks. The model incorporates the effect of past responses as well as hints taken on both tasks. We apply the model to two datasets: the ASSISTments 2009-10 skill builder dataset and the Junyi Academy Math Practicing Log. The results show that deep learning models efficiently leverage the sequential information present in a learner's responses. The proposed model significantly outperforms past work on hint prediction by at least 12 percentage points. Moreover, we demonstrate that jointly modeling the two tasks improves performance consistently across the tasks and the datasets, albeit by a small amount. (A minimal sketch of such a joint model appears after this list.)
- Communication between humans and mobile agents is becoming increasingly important as such agents are widely deployed in our daily lives. Vision-and-Dialogue Navigation is one of the tasks that evaluate an agent's ability to interact with humans for assistance and to navigate based on natural-language responses. In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model built upon vision-language transformers. However, despite achieving competitive performance, we find that the agent in the NDH task is not evaluated appropriately by the primary metric, Goal Progress. By analyzing the performance mismatch between Goal Progress and other metrics (e.g., normalized Dynamic Time Warping) from our state-of-the-art model, we show that NDH's sub-path-based task setup (i.e., navigating a partial trajectory based on the corresponding subset of the full dialogue) does not provide the agent with enough supervision signal towards the goal region. Therefore, we propose a new task setup called NDH-Full, which takes the full dialogue and the whole navigation path as one instance. We present a strong baseline model and show initial results on this new task. We further describe several approaches we tried to improve model performance (based on curriculum learning, pre-training, and data augmentation), suggesting potentially useful training methods for this new NDH-Full task. (A simplified sketch of a DTW-based path metric appears after this list.)
- The rapid advancement of large language model (LLM) agents has raised new concerns regarding their safety and security, which cannot be addressed by traditional textual-harm-focused LLM guardrails. We propose GuardAgent, the first guardrail agent to protect other agents by checking whether the agents' actions satisfy safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then converts this plan into guardrail code for execution. In both steps, an LLM is utilized as the reasoning component, supplemented by in-context demonstrations retrieved from a memory module storing information from previous tasks. GuardAgent can understand different safety guard requests and provide reliable code-based guardrails with high flexibility and low operational overhead. In addition, we propose two novel benchmarks: EICU-AC, to assess access control for healthcare agents, and Mind2Web-SC, to evaluate safety regulations for web agents. We show that GuardAgent effectively moderates violation actions for the two types of agents on these benchmarks, with guardrail accuracies of over 98% and 83%, respectively. (A sketch of this plan-then-code guardrail flow appears after this list.)
- Language plays a large role in our lives and influences many mental processes. But does every mental process require language? This dissertation investigates how language experience influences the development of thematic roles and pragmatic knowledge, specifically looking at deaf homesigners, who have limited to no exposure to spoken or signed language and innovate their own homesign language systems in order to communicate with the people around them. I address methodological questions such as: Will these novel tasks work with homesigners? as well as theoretical questions such as: Is language required to develop concepts of agents and patients? and Can pragmatic knowledge exist without exposure to typical discourse? I used novel tasks (i.e., referential communication pragmatics tasks and an eye-tracking paradigm) to investigate homesigners' pragmatic knowledge and event representation. I found that homesigners often use pragmatic knowledge and produce the necessary relevant information (e.g., modifiers with nouns, or agents and patients with actions). Regarding event representation, homesigners did not appear to use systematic conventionalized strategies (e.g., word order, use of space) to distinguish between agents and patients, although I did observe some preliminary strategies. I also did not find evidence that homesigners used nonlinguistic agent-patient concepts on the eye-tracking task. The findings of this dissertation suggest that basic pragmatic knowledge may not require full access to language, but concepts of agent and patient may require more language to fully develop than previously expected. In the absence of early language exposure, lifelong communicative experience may help homesigners to develop pragmatic skills, which might then guide later linguistic structure formation.
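For the hint-taking item above, the following is a hypothetical sketch of a joint model in which one recurrent encoder over a learner's interaction history feeds two prediction heads, one for correctness (knowledge tracing) and one for hint-taking. The layer sizes and input encoding are illustrative assumptions, not the paper's memory-augmented architecture.

```python
# Hypothetical joint hint-taking / knowledge-tracing model with shared encoder.
import torch
import torch.nn as nn

class JointHintKTModel(nn.Module):
    def __init__(self, num_skills, hidden=64):
        super().__init__()
        # Each step encodes (skill, correct?, hint used?) as a one-hot-style vector.
        self.input_dim = num_skills * 4
        self.rnn = nn.LSTM(self.input_dim, hidden, batch_first=True)
        self.correct_head = nn.Linear(hidden, num_skills)  # knowledge tracing
        self.hint_head = nn.Linear(hidden, num_skills)     # hint-taking prediction

    def forward(self, x):
        h, _ = self.rnn(x)                       # (batch, seq, hidden)
        p_correct = torch.sigmoid(self.correct_head(h))
        p_hint = torch.sigmoid(self.hint_head(h))
        return p_correct, p_hint

# Training would sum binary cross-entropy losses from both heads, so hint usage
# and correctness inform a shared representation.
```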
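For the vision-and-dialogue navigation item, here is a simplified sketch of a DTW-based path-fidelity score in the spirit of the normalized Dynamic Time Warping metric mentioned there: unlike Goal Progress, it rewards following the whole reference path rather than merely ending near the goal. The exact normalization used by nDTW may differ, so treat this as illustrative only.

```python
# Simplified DTW-based path-fidelity score (illustrative, not the exact nDTW).
import math

def dtw(pred_path, ref_path):
    n, m = len(pred_path), len(ref_path)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(pred_path[i - 1], ref_path[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def path_fidelity(pred_path, ref_path, success_threshold=3.0):
    # Map accumulated deviation into (0, 1]; higher means closer path-following.
    return math.exp(-dtw(pred_path, ref_path) / (len(ref_path) * success_threshold))
```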
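Finally, for the GuardAgent item, this is a hypothetical sketch of the plan-then-code guardrail flow: an LLM converts a safety guard request into a checking plan, then into executable code that is run against a proposed agent action. The `llm` callable and `memory.retrieve` are assumed stand-ins, not the actual GuardAgent prompts, memory module, or code format.

```python
# Hypothetical plan-then-code guardrail check for a proposed agent action.
def guard_check(llm, memory, guard_request, agent_action):
    demos = memory.retrieve(guard_request)          # in-context demonstrations
    plan = llm(f"Guard request: {guard_request}\n"
               f"Examples:\n{demos}\n"
               "Write a step-by-step checking plan:")
    code = llm(f"Plan:\n{plan}\n"
               "Write a Python function `is_allowed(action)` implementing it:")
    scope = {}
    exec(code, scope)                               # run the generated guardrail code
    allowed = scope["is_allowed"](agent_action)
    return allowed, plan, code
```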