Title: A Need for Speed: Adapting Agent Action Speed to Improve Task Learning from Non-Expert Humans
As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer's target policy. This work entails a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent's action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures of performance. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions.
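The idea of slowing the agent down where it is uncertain can be illustrated with a minimal sketch. This is not the paper's implementation: the use of normalized entropy as the uncertainty measure, the linear mapping, and the names `normalized_entropy`, `action_delay`, `fast`, and `slow` are all assumptions made for illustration.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a discrete distribution, scaled to [0, 1].

    Assumption: uncertainty is measured as the entropy of the agent's
    current action probabilities; the abstract does not specify a measure.
    """
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def action_delay(probs, fast=0.25, slow=2.0):
    """Seconds to wait before executing the next action (hypothetical values).

    Confident states use the short delay; uncertain states use a longer
    delay, giving the trainer more time to deliver explicit feedback.
    """
    u = normalized_entropy(probs)
    return fast + u * (slow - fast)


# A confident state executes quickly; a uniform (maximally uncertain) one slowly.
print(action_delay([1.0, 0.0, 0.0, 0.0]))
print(action_delay([0.25, 0.25, 0.25, 0.25]))
```

A linear mapping is only one possible choice; any monotone function from uncertainty to delay would express the same intent.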
Award ID(s):
1643411
NSF-PAR ID:
10074104
Journal Name:
Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems
ISSN:
1548-8403
Page Range / eLocation ID:
957-965
Sponsoring Org:
National Science Foundation
More Like this
  1. As robots become pervasive in human environments, it is important to enable users to effectively convey new skills without programming. Most existing work on Interactive Reinforcement Learning focuses on interpreting and incorporating non-expert human feedback to speed up learning; we aim to design a better representation of the learning agent that is able to elicit more natural and effective communication between the human trainer and the learner, while treating human feedback as discrete communication that depends probabilistically on the trainer’s target policy. This work entails a user study where participants train a virtual agent to accomplish tasks by giving reward and/or punishment in a variety of simulated environments. We present results from 60 participants to show how a learner can ground natural language commands and adapt its action execution speed to learn more efficiently from human trainers. The agent’s action execution speed can be successfully modulated to encourage more explicit feedback from a human trainer in areas of the state space where there is high uncertainty. Our results show that our novel adaptive speed agent dominates different fixed speed agents on several measures of performance. Additionally, we investigate the impact of instructions on user performance and user preference in training conditions. 
  2. Abstract Background

    Providing adaptive scaffolds to help learners develop effective self‐regulated learning (SRL) behaviours has been an important goal for intelligent learning environments. Adaptive scaffolding is especially important in open‐ended learning environments (OELE), where novice learners often face difficulties in completing their learning tasks.

    Objectives

    This paper presents a systematic framework for adaptive scaffolding in Betty's Brain, a learning‐by‐teaching OELE for middle school science, where students construct a causal model to teach a virtual agent, generically named Betty. We evaluate the adaptive scaffolding framework and discuss its implications on the development of more effective scaffolds for SRL in OELEs.

    Methods

    We detect key cognitive/metacognitive inflection points, that is, moments where students' behaviours and performance change during learning, often suggesting an inability to apply effective learning strategies. At inflection points, Mr. Davis (a mentor agent in Betty's Brain) or Betty (the teachable agent) provides context‐specific conversational feedback, focusing on strategies to help the student become a more productive learner, or encouragement to support positive emotions. We conduct a classroom study with 98 middle schoolers to analyse the impact of adaptive scaffolds on students' learning behaviours and performance. We analyse how students with differential pre‐to‐post learning outcomes receive and use the scaffolds to support their subsequent learning process in Betty's Brain.

    Results and Conclusions

    Adaptive scaffolding produced mixed results, with some scaffolds (viz., strategic hints that supported debugging and assessment of causal models) being generally more useful to students than others (viz., encouragement prompts). Additionally, there were differences in how students with high versus low learning outcomes responded to some hints, as suggested by the differences in their learning behaviours and performance in the intervals after scaffolding. Overall, our findings suggest how adaptive scaffolding in OELEs like Betty's Brain can be further improved to better support SRL behaviours and narrow the learning outcomes gap between high and low performing students.

    Implications

    This paper contributes to our understanding of adaptive scaffolding in OELEs and of its impact. The results of our study indicate that successful scaffolding has to combine context‐sensitive inflection points with conversational feedback that is tailored to the students' current proficiency levels and needs. Also, our conceptual framework can be used to design adaptive scaffolds that help students develop and apply SRL behaviours in other computer‐based learning environments.

  3. The efficiency with which a learner processes external feedback has implications for both learning speed and performance. A growing body of literature suggests that the feedback-related negativity (FRN) event-related potential (ERP) and the fronto-central positivity (FCP) ERP reflect the extent to which feedback is used by a learner to improve performance. To determine whether the FRN and FCP predict learning speed, 82 participants aged 7:6 - 11:0 learned the non-word names of 20 novel objects in a two-choice feedback-based declarative learning task. Participants continued the task until reaching the learning criterion of 2 consecutive training blocks with accuracy greater than 90%, or until 10 blocks were completed. Learning speed was determined by the total number of incorrect responses before reaching the learning criterion. Using linear regression models, the FRN amplitude in response to positive feedback was found to be a significant predictor of learning speed when controlling for age. The FCP amplitude in response to negative feedback was significantly negatively associated with learning speed, meaning that large FCP amplitudes in response to negative feedback predicted faster learning. An interaction between FCP and age suggested that for older children in this sample, smaller FCP amplitude in response to positive feedback was associated with increased speed, while for younger children, larger FCP amplitude predicted faster learning. These results suggest that the feedback related ERP components are associated with learning speed, and can reflect developmental changes in feedback-based learning. 
  4. Touch as a modality in social communication has been getting more attention with recent developments in wearable technology and an increase in awareness of how limited physical contact can lead to touch starvation and feelings of depression. Although several mediated touch methods have been developed for conveying emotional support, the transfer of emotion through mediated touch has not been widely studied. This work addresses this need by exploring emotional communication through a novel wearable haptic system. The system records physical touch patterns through an array of force sensors, processes the recordings using novel gesture-based algorithms to create actuator control signals, and generates mediated social touch through an array of voice coil actuators. We conducted a human subject study (N = 20) to understand the perception and emotional components of this mediated social touch for common social touch gestures, including poking, patting, massaging, squeezing, and stroking. Our results show that the speed of the virtual gesture significantly alters the participants' ratings of valence, arousal, realism, and comfort of these gestures, with increased speed producing negative emotions and decreased realism. The findings from the study will allow us to better recognize generic patterns from human mediated touch perception and determine how mediated social touch can be used to convey emotion. Our system design, signal processing methods, and results can provide guidance in future mediated social touch design.
  5. In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we are also faced with two practical challenges due to communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of $\tilde{O}(T^{1-\alpha/2} + \sqrt{\gamma_T T})$, where $\alpha \in (0,1)$ is the user-sampling parameter one can tune. Moreover, DPBE can significantly reduce both communication cost and computation complexity in distributed kernelized bandits, compared to some variants of the state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
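The stopping rule in the feedback-based declarative learning study summarized above (2 consecutive training blocks with accuracy greater than 90%, or 10 blocks completed) can be sketched as follows. The function name `blocks_until_stop` and its parameters are hypothetical, and the sketch assumes per-block accuracies are available as fractions between 0 and 1.

```python
def blocks_until_stop(block_accuracies, threshold=0.90, needed=2, max_blocks=10):
    """Return (blocks_completed, criterion_reached).

    Training stops once `needed` consecutive blocks each have accuracy
    strictly greater than `threshold`, or after `max_blocks` blocks.
    """
    streak = 0
    considered = block_accuracies[:max_blocks]
    for block_number, accuracy in enumerate(considered, start=1):
        # A block at or below the threshold resets the consecutive count.
        streak = streak + 1 if accuracy > threshold else 0
        if streak >= needed:
            return block_number, True
    return len(considered), False


# Two consecutive blocks above 90% end training at block 3.
print(blocks_until_stop([0.50, 0.95, 0.92]))
```

Because the criterion is "greater than 90%", a block at exactly 0.90 does not count toward the streak in this sketch.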