A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans
As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we instead aim to design a representation of the learning agent that elicits more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy. This work entails a user study in which participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants showing how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent's action execution speed can be modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive-speed agent dominates different fixed-speed agents on several measures of performance. Additionally, we investigate the impact of instructions on user performance and user preference across training conditions.
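To make the feedback model concrete, below is a minimal Python sketch of how discrete reward/punishment signals can be treated as probabilistic evidence about the trainer's target policy, and how an agent might slow its action execution where that evidence is still uncertain. The consistency parameter C, the entropy-based uncertainty measure, and the speed values are illustrative assumptions, not the paper's actual design.

```python
import math
from collections import defaultdict

# Sketch of a policy-shaping-style feedback model: each discrete feedback
# signal (+1 reward, -1 punishment) is treated as evidence about whether an
# action is optimal under the trainer's target policy. C is the assumed
# probability that feedback is consistent with that policy (illustrative).
C = 0.9

# delta[(state, action)] = (# rewards) - (# punishments) observed so far
delta = defaultdict(int)

def record_feedback(state, action, feedback):
    """feedback: +1 for reward, -1 for punishment."""
    delta[(state, action)] += feedback

def prob_action_optimal(state, action):
    """P(action is optimal | observed feedback), Advise-style binomial model."""
    d = delta[(state, action)]
    return C ** d / (C ** d + (1 - C) ** d)

def feedback_uncertainty(state, actions):
    """Entropy of the normalized feedback policy over a state's actions."""
    probs = [prob_action_optimal(state, a) for a in actions]
    total = sum(probs)
    return -sum((p / total) * math.log(p / total + 1e-12) for p in probs)

def choose_execution_speed(state, actions, slow=2.0, fast=0.5, threshold=1.0):
    """Adaptive speed: act slowly (seconds per action) in high-uncertainty
    states to invite explicit feedback, and quickly where feedback is already
    decisive. Durations and threshold are placeholder assumptions."""
    return slow if feedback_uncertainty(state, actions) > threshold else fast
```

In this sketch, states with little or conflicting feedback have near-uniform action probabilities and hence high entropy, so the agent slows down there to elicit more explicit trainer feedback.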
- Award ID(s): 1643411
- PAR ID: 10074104
- Date Published:
- Journal Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems
- ISSN: 1548-8403
- Page Range / eLocation ID: 957-965
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The efficiency with which a learner processes external feedback has implications for both learning speed and performance. A growing body of literature suggests that the feedback-related negativity (FRN) event-related potential (ERP) and the fronto-central positivity (FCP) ERP reflect the extent to which feedback is used by a learner to improve performance. To determine whether the FRN and FCP predict learning speed, 82 participants aged 7:6 - 11:0 learned the non-word names of 20 novel objects in a two-choice feedback-based declarative learning task. Participants continued the task until reaching the learning criterion of 2 consecutive training blocks with accuracy greater than 90%, or until 10 blocks were completed. Learning speed was determined by the total number of incorrect responses before reaching the learning criterion. Using linear regression models, the FRN amplitude in response to positive feedback was found to be a significant predictor of learning speed when controlling for age. The FCP amplitude in response to negative feedback was significantly negatively associated with learning speed, meaning that large FCP amplitudes in response to negative feedback predicted faster learning. An interaction between FCP and age suggested that for older children in this sample, smaller FCP amplitude in response to positive feedback was associated with increased speed, while for younger children, larger FCP amplitude predicted faster learning. These results suggest that feedback-related ERP components are associated with learning speed and can reflect developmental changes in feedback-based learning.
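As a rough illustration of the analysis described in that abstract, the sketch below computes a learning-speed measure (errors accumulated before two consecutive blocks exceed 90% accuracy, capped at 10 blocks) and fits an ordinary-least-squares model of learning speed on ERP amplitude while controlling for age. The exact error-counting rule and the use of plain least squares are assumptions for illustration, not the study's reported pipeline.

```python
import numpy as np

def learning_speed(block_accuracies, block_errors, criterion=0.90, max_blocks=10):
    """Total incorrect responses before reaching the learning criterion:
    two consecutive training blocks with accuracy above `criterion`,
    or all errors if the criterion is never met within `max_blocks`.
    (The precise counting rule is an assumption.)"""
    errors = 0
    for i, (acc, err) in enumerate(zip(block_accuracies[:max_blocks],
                                       block_errors[:max_blocks])):
        if i >= 1 and block_accuracies[i - 1] > criterion and acc > criterion:
            return errors
        errors += err
    return errors

def fit_speed_model(erp_amplitude, age, speed):
    """Ordinary-least-squares fit: speed ~ intercept + ERP amplitude + age."""
    X = np.column_stack([np.ones(len(speed)), erp_amplitude, age])
    coefs, *_ = np.linalg.lstsq(X, np.asarray(speed, dtype=float), rcond=None)
    return coefs  # [intercept, ERP coefficient, age coefficient]
```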
This study introduces AutoCLC, an AI-powered system designed to assess and provide feedback on closed-loop communication (CLC) in professional learning environments. CLC, where a sender's Call-Out statement is acknowledged by the receiver's Check-Back statement, is a critical safety protocol in high-reliability domains, including emergency medicine resuscitation teams. Existing methods for evaluating CLC lack quantifiable metrics and depend heavily on human observation. AutoCLC addresses these limitations by leveraging natural language processing and large language models to analyze audio recordings from Advanced Cardiovascular Life Support (ACLS) simulation training. The system identifies CLC instances, measures their frequency and rate per minute, and categorizes communications as effective, incomplete, or missed. Technical evaluations demonstrate AutoCLC achieves 78.9% precision for identifying Call-Outs and 74.3% for Check-Backs, with a performance gap of only 5% compared to human annotations. A user study involving 11 cardiac arrest instructors across three training sites supported the need for automated CLC assessment. Instructors found AutoCLC reports valuable for quantifying CLC frequency and quality, as well as for providing actionable, example-based feedback. Participants rated AutoCLC highly, with a System Usability Scale score of 76.4%, reflecting above-average usability. This work represents a significant step toward developing scalable, data-driven feedback systems that enhance individual skills and team performance in high-reliability settings.
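The sketch below illustrates the kind of post-processing such a system could perform once Call-Out and Check-Back utterances have been detected: pairing them within a time window, labeling each exchange as effective, incomplete, or missed, and reporting the rate per minute. The window length and labeling rules are illustrative assumptions, not AutoCLC's published criteria.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    t: float        # time in seconds from the start of the recording
    kind: str       # "call_out" or "check_back"
    complete: bool  # whether the utterance fully restates the instruction

def score_clc(utterances, session_minutes, window=10.0):
    """Pair Call-Outs with Check-Backs and summarize closed-loop communication.
    A Call-Out is 'effective' if a complete Check-Back follows within `window`
    seconds, 'incomplete' if only a partial one does, and 'missed' otherwise.
    (Illustrative rules; not the system's exact criteria.)"""
    call_outs = [u for u in utterances if u.kind == "call_out"]
    check_backs = [u for u in utterances if u.kind == "check_back"]
    counts = {"effective": 0, "incomplete": 0, "missed": 0}
    for co in call_outs:
        replies = [cb for cb in check_backs if 0 <= cb.t - co.t <= window]
        if any(cb.complete for cb in replies):
            counts["effective"] += 1
        elif replies:
            counts["incomplete"] += 1
        else:
            counts["missed"] += 1
    counts["rate_per_minute"] = len(call_outs) / session_minutes
    return counts
```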
Multi-agent large language models promise flexible, modular architectures for delivering personalized educational content. Drawing on a pilot randomized controlled trial with middle school students (n = 23), we introduce a two-agent GPT-4 framework in which a Profiler agent infers learner-specific preferences and a Rewrite agent dynamically adapts science passages via an explicit message-passing protocol. We implement structured system and user prompts as inter-agent communication schemas to enable real-time content adaptation. The results of an ordinal logistic regression analysis hinted that students may be more likely to prefer texts aligned with their profile, demonstrating the feasibility of multi-agent system-driven personalization and highlighting the need for additional work to build upon this pilot study. Beyond empirical validation, we present a modular multi-agent architecture detailing agent roles, communication interfaces, and scalability considerations. We discuss design best practices, ethical safeguards, and pathways for extending this framework to collaborative agent networks, such as feedback-analysis agents, in K-12 settings. These results advance both our theoretical and applied understanding of multi-agent LLM systems for personalized learning.
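A minimal sketch of the two-agent message-passing pattern described above: a Profiler agent returns a structured learner profile that is passed to a Rewrite agent, which adapts the passage. The generic `llm` callable, prompt wording, and profile schema here are stand-ins for whatever model interface and schemas the study actually used.

```python
import json

def profiler_agent(llm, learner_responses):
    """Infer a structured learner profile from prior responses (assumed schema)."""
    prompt = (
        "You are a Profiler agent. From the learner responses below, return JSON "
        'with keys "reading_level", "interests", and "preferred_examples".\n'
        + json.dumps(learner_responses)
    )
    return json.loads(llm(prompt))

def rewrite_agent(llm, passage, profile):
    """Adapt a science passage to the profile received from the Profiler."""
    prompt = (
        "You are a Rewrite agent. Adapt the passage for this learner profile, "
        "preserving all scientific content.\n"
        f"Profile: {json.dumps(profile)}\nPassage: {passage}"
    )
    return llm(prompt)

def personalize(llm, learner_responses, passage):
    """Explicit message passing: Profiler output becomes the Rewrite agent's input."""
    profile = profiler_agent(llm, learner_responses)
    return rewrite_agent(llm, passage, profile)
```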
Interactive reinforcement learning (IRL) agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, such a signal can be difficult to generalize from on its own. In this work, we explore how natural language advice can be used to provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known IRL technique. Policy shaping usually employs a human feedback policy to help an agent learn how to achieve its goal. In our case, we replace this human feedback policy with a policy generated from natural language advice. We aim to inspect whether the generated natural language reasoning helps a deep RL agent decide its actions successfully in a given environment. Our model therefore consists of three networks: an experience-driven network, an advice generator, and an advice-driven network. While the experience-driven RL agent chooses its actions based on the environmental reward, the advice-driven network uses the advice generator's feedback for each new state to select actions that assist the RL agent, yielding better policy shaping.
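The policy-shaping combination step implied above can be sketched as follows: the experience-driven and advice-driven action distributions are multiplied and renormalized before an action is chosen. The paper describes three neural networks; the tabular version below only illustrates the shaping rule itself, not the authors' architecture.

```python
import numpy as np

def shape_policy(experience_probs, advice_probs):
    """Combine the experience-driven and advice-driven action distributions
    by elementwise multiplication and renormalization (policy shaping)."""
    combined = np.asarray(experience_probs, dtype=float) * np.asarray(advice_probs, dtype=float)
    if combined.sum() == 0:  # the two policies disagree completely; fall back
        combined = np.asarray(experience_probs, dtype=float)
    return combined / combined.sum()

# Example: the RL agent slightly prefers action 0, but the advice-driven
# policy (built from generated natural-language advice) strongly favors action 2.
experience = [0.4, 0.3, 0.3]
advice = [0.1, 0.1, 0.8]
print(shape_policy(experience, advice))  # the advice shifts the final choice toward action 2
```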