Title: Why We Should Build Robots That Both Teach and Learn
In this paper, we argue in favor of creating robots that both teach and learn. We propose a methodology for building robots that can learn a skill from an expert, perform the skill independently or collaboratively with the expert, and then teach the same skill to a novice. This requires combining insights from learning from demonstration, human-robot collaboration, and intelligent tutoring systems to develop knowledge representations that can be shared across all three components. As a case study for our methodology, we developed a glockenspiel-playing robot. The robot begins as a novice, learns how to play musical harmonies from an expert, collaborates with the expert to complete harmonies, and then teaches the harmonies to novice users. This methodology enables new evaluation metrics that provide a thorough understanding of how well the robot has learned, and it allows a robot to act as an efficient facilitator for teaching across temporal and geographic separation.
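To make the idea of one knowledge representation shared by the learning, collaboration, and tutoring components more concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a harmony is a sequence of note names with one note per beat; the class `SharedSkill`, its methods, and the majority-vote learning rule are hypothetical and are not the representation used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical shared representation: a harmony is an ordered tuple of note
# names, one per beat. Learning, collaboration, and teaching all use it.
Harmony = Tuple[str, ...]


@dataclass
class SharedSkill:
    harmony: Harmony = ()

    def learn_from_demos(self, demos: List[Harmony]) -> None:
        """Learning from demonstration: at each beat, keep the note the
        expert played most often across demonstrations."""
        length = min(len(d) for d in demos)
        self.harmony = tuple(
            Counter(d[i] for d in demos).most_common(1)[0][0] for i in range(length)
        )

    def collaborate(self, partner: List[Optional[str]]) -> List[str]:
        """Collaboration: play the learned note on every beat the partner
        leaves empty (None); keep the partner's note otherwise."""
        return [p if p is not None else h for p, h in zip(partner, self.harmony)]

    def teach(self, attempt: Harmony) -> List[int]:
        """Tutoring: return the beats where a novice's attempt is wrong or
        missing, as targets for feedback."""
        n = len(self.harmony)
        padded = list(attempt[:n]) + [None] * (n - len(attempt))
        return [i for i, (a, h) in enumerate(zip(padded, self.harmony)) if a != h]


if __name__ == "__main__":
    skill = SharedSkill()
    skill.learn_from_demos([("C", "E", "G"), ("C", "E", "G"), ("C", "F", "G")])
    print(skill.harmony)                        # ('C', 'E', 'G')
    print(skill.collaborate(["C", None, "G"]))  # ['C', 'E', 'G']
    print(skill.teach(("C", "F")))              # [1, 2]
```

The point of the design is that the structure learned from the expert is the same one consulted when filling in a partner's missing notes and when grading a novice, so how well the robot learned can be checked through how well it collaborates and teaches.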
Award ID(s): 1813651
NSF-PAR ID: 10284318
Journal Name: HRI '21: Proceedings of the 2021 ACM/IEEE International Conference on Human-Robot Interaction
Page Range / eLocation ID: 187 to 196
Sponsoring Org: National Science Foundation
More Like This
  1. With growing access to versatile robotics, it is beneficial for end users to be able to teach robots tasks without needing to code a control policy. One possibility is to teach the robot through successful task executions. However, near-optimal demonstrations of a task can be difficult to provide, and even successful demonstrations can fail to capture task aspects that are key to robust skill replication. Here, we propose a learning from demonstration (LfD) approach that enables learning of robust task definitions without the need for near-optimal demonstrations. We present a novel algorithmic framework for learning task specifications based on the ergodic metric, a measure of the information content in motion. Moreover, we make use of negative demonstrations (demonstrations of what not to do) and show that they can help compensate for imperfect demonstrations, reduce the number of demonstrations needed, and highlight crucial task elements, improving robot performance. In a proof-of-concept example of cart-pole inversion, we show that negative demonstrations alone can be sufficient to successfully learn and recreate a skill. Through a human-subject study with 24 participants, we show that consistently more information about a task can be captured from combined positive and negative (posneg) demonstrations than from the same number of positive-only demonstrations. Finally, we demonstrate our learning approach on simulated target-reaching and table-cleaning tasks with a 7-DoF Franka arm. Our results point towards a future with robust, data-efficient LfD for novice users.
  2. The introduction of collaborative robots (cobots) into the workplace has presented both opportunities and challenges for those seeking to utilize their functionality. Prior research has shown that, despite the capabilities cobots afford, there is a disconnect between those capabilities and the applications in which cobots are currently deployed, partly due to a lack of effective cobot-focused instruction in the field. Experts who work successfully within this collaborative domain could offer insight into the considerations and processes they use to take fuller advantage of cobot capabilities. Based on an analysis of expert insights in the collaborative interaction design space, we developed a set of Expert Frames and integrated them into a new training and programming system that can teach novice operators to think, program, and troubleshoot the way experts do. We present our system and case studies demonstrating how Expert Frames enable novice users to analyze and learn from complex cobot application scenarios.
  3. Human–exoskeleton interactions have the potential to bring about changes in human behavior for physical rehabilitation or skill augmentation. Despite significant advances in the design and control of these robots, their application to human training remains limited. The key obstacles to designing such training paradigms are predicting the effects of human–exoskeleton interaction and selecting the interaction control used to shape human behavior. In this article, we present a method to elucidate behavioral changes in the human–exoskeleton system and to identify expert behaviors correlated with a task goal. Specifically, we observe the joint coordinations of the robot, also referred to as kinematic coordination behaviors, that emerge from human–exoskeleton interaction during learning. We demonstrate the use of kinematic coordination behaviors in two task domains through a set of three human-subject studies. We find that participants (1) learn novel tasks within the exoskeleton environment, (2) show similar coordination across their own successful movements, (3) learn to leverage these coordination behaviors to maximize their own success, and (4) tend to converge to similar coordinations for a given task strategy across participants. At a high level, we identify task-specific joint coordinations that different experts use for a given task goal. These coordinations can be quantified by observing experts, and a novice's similarity to them can serve as a measure of learning over the course of training. The observed expert coordinations may further be used to design adaptive robot interactions aimed at teaching a participant the expert behaviors.
  4. Introduction

    Remote military operations require rapid response times for effective relief and critical care. Yet military theaters are austere environments, where communication links are unreliable and subject to physical and virtual attacks and to degradation at unpredictable times. Immediate medical care at these austere locations requires semi-autonomous teleoperated systems, which allow medical procedures to be completed even over interrupted networks while isolating medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require extensive programming or massive libraries of surgical skill demonstrations from which machine learning algorithms can learn effective policies. Although such datasets are achievable for simple tasks, providing a large number of demonstrations for surgical maneuvers is not practical. This article presents a learning-from-demonstration method that uses knowledge from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, combined with rewards driven by expert knowledge, produces more generalizable policies that tolerate dynamic changes in the environment. A multimodal representation of interaction enables learning complex, contact-rich surgical maneuvers. The effectiveness of the approach is shown on the cricothyroidotomy task, a standard critical care procedure for opening the airway. We also provide a method for segmenting the teleoperator's demonstration into subtasks and classifying the subtasks using sequence modeling.

    Materials and Methods

    A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform, SuperBaxter, fitted with modified surgical grippers. Two learning models were then developed to process the dataset: one for automatically segmenting the task demonstrations into a sequence of surgemes, and one for classifying each segment as a labeled surgeme. Finally, a multimodal off-policy RL method with rewards learned from demonstrations was developed to learn surgeme execution from these demonstrations.

    Results

    The task segmentation model has an accuracy of 98.2%. Using the proposed interaction features, the surgeme classification model achieved a classification accuracy of 96.25% averaged across all surgemes, compared with 87.08% without these features and 85.4% with a support vector machine classifier. Finally, the robot achieved a task success rate of 93.5%, compared with baselines of behavioral cloning (78.3%) and twin-delayed deep deterministic policy gradient (TD3) with shaped rewards (82.6%).

    Conclusions

    Results indicate that the proposed interaction features for segmenting and classifying surgical tasks improve classification accuracy. The proposed method for learning surgemes from demonstrations outperforms popular skill-learning methods. The effectiveness of the proposed approach demonstrates its potential for future remote telemedicine on battlefields.
  5. Robot-mediated therapy is an emerging field of research seeking to improve therapy for children with Autism Spectrum Disorder (ASD). Current approaches to autonomous robot-mediated therapy often focus on having a robot teach a single skill to children with ASD and are not personalized to each individual. More recently, Learning from Demonstration (LfD) approaches have been explored for teaching socially assistive robots to deliver personalized interventions after deployment, but these approaches require large numbers of demonstrations and use learning models that cannot be easily interpreted. In this work, we present an LfD system that learns to deliver autism therapies in a data-efficient manner using inherently interpretable learning models. The LfD system learns a behavioral model of the task with minimal supervision via hierarchical clustering and then learns an interpretable policy that determines when to execute the learned behaviors. The system can learn from less than an hour of demonstrations, and for each of its predictions it can identify the demonstrated instances that contributed to its decision. The system performs well under unsupervised conditions and performs even better with a low-effort human correction process enabled by the interpretable model.
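Item 1 above builds its task specifications on the ergodic metric. As a minimal sketch of that metric alone, the Python below uses the standard Fourier-coefficient form in one dimension: ε = Σ_k Λ_k (c_k − φ_k)², where c_k is the time average of the normalized cosine basis function F_k along the trajectory, φ_k is the matching coefficient of the target density, and Λ_k = (1 + k²)^(−s) with s = (n + 1)/2 = 1. The function names, the midpoint-grid integration, and the number of coefficients are our assumptions; how the cited work derives a target distribution from positive and negative demonstrations is not shown.

```python
import numpy as np


def cosine_basis(k, x, L):
    """Normalized 1-D cosine basis function F_k on [0, L] (unit L2 norm)."""
    h_k = np.sqrt(L) if k == 0 else np.sqrt(L / 2.0)
    return np.cos(k * np.pi * np.asarray(x, dtype=float) / L) / h_k


def ergodic_metric(traj, target_pdf, L=1.0, n_coeffs=10, grid=1000):
    """Ergodic metric between a uniformly time-sampled 1-D trajectory and a
    target density on [0, L]: sum over k of Lambda_k * (c_k - phi_k)^2."""
    traj = np.asarray(traj, dtype=float)
    dx = L / grid
    xs = (np.arange(grid) + 0.5) * dx      # midpoint grid for integration
    p = target_pdf(xs)
    p = p / (p.sum() * dx)                 # make the target integrate to 1
    s = (1 + 1) / 2.0                      # s = (n + 1) / 2 with n = 1 dimension
    total = 0.0
    for k in range(n_coeffs):
        c_k = cosine_basis(k, traj, L).mean()             # time average along the trajectory
        phi_k = (p * cosine_basis(k, xs, L)).sum() * dx   # spatial coefficient of the target
        lam_k = (1.0 + k ** 2) ** (-s)                    # weights low frequencies more heavily
        total += lam_k * (c_k - phi_k) ** 2
    return float(total)


if __name__ == "__main__":
    uniform = lambda x: np.ones_like(x)
    covering = (np.arange(2000) + 0.5) / 2000   # spends equal time everywhere in [0, 1]
    stuck = np.full(2000, 0.1)                  # never leaves x = 0.1
    print(ergodic_metric(covering, uniform))    # essentially zero
    print(ergodic_metric(stuck, uniform))       # much larger
```

A trajectory that spends time in each region in proportion to the target density scores near zero, which is what makes the metric usable as a task specification: lower is better.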
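Item 3 above proposes measuring a novice's learning by their similarity to expert joint coordinations. One common way to approximate such coordinations, assumed here and not taken from the article, is to treat the top principal components of a joint-angle recording as coordination modes and to compare two sets of modes by the cosines of the principal angles between the subspaces they span. The function names and the default number of modes are hypothetical.

```python
import numpy as np


def coordination_modes(joint_angles, n_modes=2):
    """Top principal directions of a (T, J) joint-angle recording, as a
    (J, n_modes) matrix with orthonormal columns."""
    X = np.asarray(joint_angles, dtype=float)
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_modes].T


def coordination_similarity(novice, expert, n_modes=2):
    """Mean cosine of the principal angles between the novice's and the
    expert's coordination subspaces (1 = same, 0 = orthogonal)."""
    a = coordination_modes(novice, n_modes)
    b = coordination_modes(expert, n_modes)
    # With orthonormal columns, the singular values of a.T @ b are the
    # cosines of the principal angles between the two subspaces.
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(2, 4))                  # two latent synergies over 4 joints
    expert = rng.normal(size=(500, 2)) @ W
    novice = rng.normal(size=(500, 2)) @ W       # same synergies, different motions
    print(coordination_similarity(novice, expert))
```

Tracking this score over training sessions would give one possible learning curve: it rises toward 1 as the novice's dominant joint coordinations align with the expert's.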
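Item 5 above describes a two-stage pipeline: hierarchical clustering to discover behaviors from demonstrations, then an interpretable policy that can point to the demonstrations behind each prediction. The sketch below illustrates that pattern under our own assumptions: average-linkage agglomerative clustering over hypothetical action feature vectors, and a k-nearest-neighbor policy standing in for the authors' interpretable model, which the abstract does not specify. The names `average_linkage_clusters` and `NearestDemoPolicy` are hypothetical.

```python
import numpy as np


def average_linkage_clusters(X, n_clusters):
    """Agglomerative clustering with average linkage: start with one cluster
    per demonstrated action and repeatedly merge the closest pair."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


class NearestDemoPolicy:
    """k-nearest-neighbor policy over demonstrated states; each prediction
    also returns the indices of the demonstrations that produced it."""

    def __init__(self, states, behaviors):
        self.states = np.asarray(states, dtype=float)
        self.behaviors = np.asarray(behaviors)

    def predict(self, state, k=3):
        d = np.linalg.norm(self.states - np.asarray(state, dtype=float), axis=1)
        support = np.argsort(d)[:k]
        values, counts = np.unique(self.behaviors[support], return_counts=True)
        return values[np.argmax(counts)], support.tolist()


if __name__ == "__main__":
    actions = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
    behaviors = average_linkage_clusters(actions, n_clusters=2)
    policy = NearestDemoPolicy([[0.0], [0.2], [1.0], [1.2]], behaviors)
    behavior, support = policy.predict([0.05], k=2)
    print(behavior, support)  # the chosen behavior and the demos that justify it
```

Returning the supporting demonstration indices is what makes a correction step cheap in this sketch: a human can inspect exactly which demonstrations drove a wrong prediction and relabel or remove them.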