Title: A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans
As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that elicits more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy. This work entails a user study in which participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent's action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive-speed agent dominates different fixed-speed agents on several measures of performance. Additionally, we investigate the impact of instructions on user performance and user preference across training conditions.
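The two mechanisms the abstract describes, a probabilistic model of human feedback and uncertainty-driven modulation of execution speed, can be illustrated with a minimal sketch. The class below is illustrative only, in the spirit of probabilistic feedback models such as Advise: the `consistency` parameter, the count-based update, and the entropy-to-speed mapping are assumptions made for this example, not the paper's implementation.

```python
import numpy as np

class FeedbackSpeedModel:
    """Illustrative sketch: human reward/punishment as probabilistic evidence
    about the trainer's target action, plus uncertainty-driven execution speed."""

    def __init__(self, n_states, n_actions, consistency=0.9):
        # consistency: assumed probability that a feedback signal agrees
        # with the trainer's target policy (an assumption of this sketch).
        self.consistency = consistency
        # delta[s, a]: (# positive) - (# negative) feedback signals for (s, a)
        self.delta = np.zeros((n_states, n_actions))

    def update(self, state, action, feedback):
        # feedback: +1 for reward, -1 for punishment
        self.delta[state, action] += feedback

    def action_posterior(self, state):
        # Belief that each action is the trainer's target action, growing with
        # net positive feedback: proportional to (C / (1 - C)) ** delta[s, a].
        c = self.consistency
        logits = self.delta[state] * (np.log(c) - np.log(1.0 - c))
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    def execution_speed(self, state, fast=1.0, slow=0.25):
        # Slow down where the belief over the intended action is high-entropy,
        # inviting more explicit feedback in uncertain regions of the state space.
        p = self.action_posterior(state)
        entropy = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
        return slow + (fast - slow) * (1.0 - entropy)

# Toy usage with made-up sizes: one positive signal sharpens the belief for that
# state, so the agent speeds up there and stays slow in unvisited states.
model = FeedbackSpeedModel(n_states=25, n_actions=4)
model.update(state=3, action=1, feedback=+1)
print(model.action_posterior(3), model.execution_speed(3), model.execution_speed(7))
```

The design intuition mirrored here is that execution speed acts as a communication channel: slowing down in states where the feedback model is uncertain gives the trainer more opportunity to deliver explicit reward or punishment exactly where it is most informative.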
Award ID(s): 1643413
PAR ID: 10026423
Author(s) / Creator(s):
Date Published:
Journal Name: AAMAS
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The efficiency with which a learner processes external feedback has implications for both learning speed and performance. A growing body of literature suggests that the feedback-related negativity (FRN) event-related potential (ERP) and the fronto-central positivity (FCP) ERP reflect the extent to which feedback is used by a learner to improve performance. To determine whether the FRN and FCP predict learning speed, 82 participants aged 7;6–11;0 (years;months) learned the non-word names of 20 novel objects in a two-choice feedback-based declarative learning task. Participants continued the task until reaching the learning criterion of two consecutive training blocks with accuracy greater than 90%, or until 10 blocks were completed. Learning speed was measured as the total number of incorrect responses before reaching the learning criterion. Using linear regression models, the FRN amplitude in response to positive feedback was found to be a significant predictor of learning speed when controlling for age. The FCP amplitude in response to negative feedback was significantly negatively associated with learning speed, meaning that larger FCP amplitudes in response to negative feedback predicted faster learning. An interaction between FCP and age suggested that for older children in this sample, smaller FCP amplitude in response to positive feedback was associated with faster learning, while for younger children, larger FCP amplitude predicted faster learning. These results suggest that feedback-related ERP components are associated with learning speed and can reflect developmental changes in feedback-based learning.
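For readers who want to see the shape of the analysis, here is a minimal sketch of the kind of regression models described above. The file and column names are hypothetical (`errors` for the total incorrect responses before criterion, `frn_pos` and `fcp_neg` for ERP amplitudes, `age` in months); this is not the study's analysis code.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names; 'errors' is the learning-speed measure
# (total incorrect responses before reaching criterion), so lower = faster learning.
df = pd.read_csv("erp_learning.csv")

# FRN amplitude to positive feedback as a predictor of learning speed, controlling for age.
frn_model = smf.ols("errors ~ frn_pos + age", data=df).fit()

# FCP amplitude with an FCP-by-age interaction, mirroring the reported age-dependent effect.
fcp_model = smf.ols("errors ~ fcp_neg * age", data=df).fit()

print(frn_model.summary())
print(fcp_model.summary())
```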
  2. Motivational agents are virtual agents that seek to motivate users by providing feedback and guidance. Prior work has shown how certain factors of an agent, such as the type of feedback given or the agent's appearance, can influence user motivation when completing tasks. However, it is not known how nonverbal mirroring affects an agent's ability to motivate users. Specifically, would an agent that mirrors be more motivating than an agent that does not? Would an agent trained on real human behaviors be better? We conducted a within-subjects study asking 30 participants to play a "find-the-hidden-object" game while interacting with a motivational agent that provided hints and feedback on the user's performance. We created three agents: a Control agent that did not respond to the user's movements, a simple Mimic agent that mirrored the user's movements on a delay, and a Complex agent that used a machine-learned behavior model. We asked participants to complete a questionnaire rating their levels of motivation and their perceptions of the agent and its feedback. Our results showed that the Mimic agent was more motivating than the Control agent and more helpful than the Complex agent. We also found that mimicking behavior can feel weird or creepy once participants become aware of it; it is therefore important to consider the detection of mimicry when designing virtual agents.
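As a rough illustration of the delay-based mirroring the Mimic agent used, the sketch below replays a tracked pose after a fixed lag and reflects it left-to-right; the frame rate, delay length, and pose representation are assumptions for the example, not details from the study.

```python
from collections import deque

class MimicAgent:
    """Illustrative delayed-mirroring agent (not the study's implementation)."""

    def __init__(self, delay_frames=90, mirror_x=True):
        # e.g. 90 frames is roughly a 3-second delay at 30 fps (assumed values)
        self.buffer = deque(maxlen=delay_frames)
        self.mirror_x = mirror_x

    def step(self, user_pose):
        # user_pose: dict of joint name -> (x, y) in a normalized, centered frame
        self.buffer.append(dict(user_pose))
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough history yet; caller holds a neutral pose
        delayed = self.buffer[0]  # oldest buffered pose, i.e. the lagged one
        if not self.mirror_x:
            return delayed
        return {joint: (-x, y) for joint, (x, y) in delayed.items()}
```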
  3. Sensory feedback is critical in fine motor control, learning, and adaptation. However, robotic prosthetic limbs currently lack the feedback segment of the communication loop between user and device. Sensory substitution feedback can close this gap, but sometimes this improvement only persists when users cannot see their prosthesis, suggesting the provided feedback is redundant with vision. Thus, given the choice, users rely on vision over artificial feedback. To effectively augment vision, sensory feedback must provide information that vision cannot provide or provides poorly. Although vision is known to be less precise at estimating speed than position, no work has compared the speed precision of biomimetic arm movements. In this study, we investigated the uncertainty of visual speed estimates as defined by different virtual arm movements. We found that uncertainty was greatest for visual estimates of joint speeds, compared to absolute rotational or linear endpoint speeds. Furthermore, this uncertainty increased when the joint reference frame speed varied over time, potentially caused by an overestimation of joint speed. Finally, we demonstrate a joint-based sensory substitution feedback paradigm capable of significantly reducing joint speed uncertainty when paired with vision. Ultimately, this work may lead to improved prosthesis control and capacity for motor learning.
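To make the joint-speed versus endpoint-speed distinction concrete, here is a small planar two-link arm example; the link lengths and joint angles are made up for illustration and are not values from the study.

```python
import numpy as np

def endpoint_speed(q, qdot, l1=0.3, l2=0.25):
    """Endpoint linear speed (m/s) of a planar two-link arm.

    q    -- [shoulder, elbow] joint angles in radians
    qdot -- joint angular velocities in rad/s
    Link lengths are illustrative defaults, not values from the study.
    """
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    # Jacobian mapping joint velocities to endpoint (x, y) velocity
    J = np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                  [ l1 * c1 + l2 * c12,  l2 * c12]])
    return np.linalg.norm(J @ np.asarray(qdot))

# Different joint-speed patterns can produce comparable endpoint speeds,
# one intuition for why joint speed is plausibly harder to judge by eye.
print(endpoint_speed([0.5, 1.0], [1.0, 0.0]))  # shoulder-only movement
print(endpoint_speed([0.5, 1.0], [0.0, 2.0]))  # elbow-only movement
```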
  4. Interactive reinforcement learning (IRL) agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, such a signal can be difficult to generalize on its own. In this work, we explore how natural language advice can provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known IRL technique. Policy shaping usually employs a human feedback policy to help an agent learn how to achieve its goal; in our case, we replace this human feedback policy with a policy generated from natural language advice. We aim to examine whether the generated natural language reasoning can support a deep RL agent in deciding its actions successfully in any given environment. Our model therefore has three networks: an experience-driven network, an advice generator, and an advice-driven network. While the experience-driven RL agent chooses its actions under the influence of the environmental reward, the advice-driven network selects actions for each new state using feedback generated by the advice generator, assisting the RL agent through better policy shaping.
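A minimal sketch of the policy-shaping combination step this abstract builds on is shown below, using the multiplicative combination commonly associated with policy shaping. The two input distributions are placeholders, and the paper's three-network architecture (experience-driven, advice generator, advice-driven) is not reproduced here.

```python
import numpy as np

def shaped_policy(experience_probs, advice_probs, eps=1e-8):
    """Combine an experience-driven policy with an advice-driven policy by
    elementwise multiplication and renormalization (policy-shaping style)."""
    combined = np.asarray(experience_probs) * np.asarray(advice_probs) + eps
    return combined / combined.sum()

# Placeholder distributions over three actions (illustrative values only):
experience_probs = np.array([0.5, 0.3, 0.2])   # e.g. softmax over the RL agent's Q-values
advice_probs     = np.array([0.1, 0.7, 0.2])   # e.g. from an advice-conditioned network

probs = shaped_policy(experience_probs, advice_probs)
action = np.random.choice(len(probs), p=probs)
print(probs, action)
```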