

This content will become publicly available on May 31, 2024

Title: On Trust-aware Assistance-seeking in Human-Supervised Autonomy
Using the context of human-supervised object collection tasks, we explore policies for a robot to seek assistance from a human supervisor and avoid loss of human trust in the robot. We consider a human-robot interaction scenario in which a mobile manipulator chooses to collect objects either autonomously or with human assistance, while the human supervisor monitors the robot’s operation, assists when asked, or intervenes on perceiving that the robot may not accomplish its goal. We design an optimal assistance-seeking policy for the robot using a Partially Observable Markov Decision Process (POMDP) in which human trust is a hidden state and the objective is to maximize collaborative performance. We conduct two sets of human-robot interaction experiments. The data from the first set are used to estimate the POMDP parameters, from which we compute an optimal assistance-seeking policy that is deployed in the second set. For most participants, the estimated POMDP reveals that humans are more likely to intervene when their trust is low and the robot is performing a high-complexity task, and that the robot asking for assistance in high-complexity tasks can increase human trust in the robot. Our experimental results show that the proposed trust-aware policy yields superior performance compared with an optimal trust-agnostic policy.
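To make the idea of trust as a hidden POMDP state concrete, the following is a minimal sketch of a Bayesian belief update over a two-level trust state. It is not the model estimated in the paper: the two trust levels, the action names, the transition matrices, and the intervention probabilities are all hypothetical values chosen for illustration, and the dependence on task complexity described in the abstract is omitted for brevity.

```python
import numpy as np

# All names and numbers below are illustrative assumptions,
# not parameters reported in the paper.
TRUST_STATES = ("low", "high")

# Hypothetical transition model:
# TRANSITION[action][s, s_next] = P(next trust = s_next | trust = s, action)
TRANSITION = {
    "autonomous": np.array([[0.8, 0.2],
                            [0.3, 0.7]]),
    "ask_assistance": np.array([[0.6, 0.4],
                                [0.1, 0.9]]),
}

# Hypothetical observation model: P(human intervenes | next trust level)
P_INTERVENE = np.array([0.6, 0.1])


def update_belief(belief, action, intervened):
    """Return the posterior belief over trust after one action and observation."""
    # Prediction step: propagate the belief through the trust dynamics.
    predicted = belief @ TRANSITION[action]
    # Correction step: weight each trust level by the observation likelihood.
    likelihood = P_INTERVENE if intervened else 1.0 - P_INTERVENE
    unnormalized = predicted * likelihood
    return unnormalized / unnormalized.sum()


if __name__ == "__main__":
    prior = np.array([0.5, 0.5])
    posterior = update_belief(prior, "autonomous", intervened=True)
    print(dict(zip(TRUST_STATES, posterior)))
```

Under these assumed numbers, observing an intervention shifts the belief toward low trust. A trust-aware policy of the kind described above would select actions as a function of this belief rather than of the task alone.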
Award ID(s):
2024649
NSF-PAR ID:
10445563
Journal Name:
American Control Conference
Page Range / eLocation ID:
3901 to 3906
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy. A known problem with this “off-policy” approach is that the robot’s errors compound when drifting away from the supervisor’s demonstrations. On-policy techniques alleviate this by iteratively collecting corrective actions for the current robot policy. However, these techniques can be tedious for human supervisors, add a significant computational burden, and may visit dangerous states during training. We propose an off-policy approach that injects noise into the supervisor’s policy while demonstrating. This forces the supervisor to demonstrate how to recover from errors. We propose a new algorithm, DART (Disturbances for Augmenting Robot Trajectories), that collects demonstrations with injected noise, and optimizes the noise level to approximate the error of the robot’s trained policy during data collection. We compare DART with DAgger and Behavior Cloning in two domains: in simulation with an algorithmic supervisor on the MuJoCo tasks (Walker, Humanoid, Hopper, Half-Cheetah) and in physical experiments with human supervisors training a Toyota HSR robot to perform grasping in clutter. For high-dimensional tasks like Humanoid, DART can be up to 3x faster in computation time and only decreases the supervisor’s cumulative reward by 5% during training, whereas DAgger executes policies that have 80% less cumulative reward than the supervisor. On the grasping in clutter task, DART obtains on average a 62% performance increase over Behavior Cloning.
  2. Recent work has considered personalized route planning based on user profiles, but none of it accounts for human trust. We argue that human trust is an important factor to consider when planning routes for automated vehicles. This article presents a trust-based route-planning approach for automated vehicles. We formalize the human-vehicle interaction as a partially observable Markov decision process (POMDP) and model trust as a partially observable state variable of the POMDP, representing the human’s hidden mental state. We build data-driven models of human trust dynamics and takeover decisions, which are incorporated in the POMDP framework, using data collected from an online user study with 100 participants on the Amazon Mechanical Turk platform. We compute optimal routes for automated vehicles by solving for optimal policies in the POMDP and evaluate the resulting routes via human subject experiments with 22 participants on a driving simulator. The experimental results show that participants taking the trust-based route generally reported more positive responses in the after-driving survey than those taking the baseline (trust-free) route. In addition, we analyze the trade-offs between multiple planning objectives (e.g., trust, distance, energy consumption) via multi-objective optimization of the POMDP. We also identify a set of open issues and implications for real-world deployment of the proposed approach in automated vehicles.
  3. This paper presents a framework to learn the reward function underlying high-level sequential tasks from demonstrations. The purpose of reward learning, in the context of learning from demonstration (LfD), is to generate policies that mimic the demonstrator’s policies, thereby enabling imitation learning. We focus on a human-robot interaction (HRI) domain where the goal is to learn and model structured interactions between a human and a robot. Such interactions can be modeled as a partially observable Markov decision process (POMDP) where the partial observability is caused by uncertainties associated with the ways humans respond to different stimuli. The key challenge in finding a good policy in such a POMDP is determining the reward function that was observed by the demonstrator. Existing inverse reinforcement learning (IRL) methods for POMDPs are computationally very expensive and the problem is not well understood. In comparison, IRL algorithms for Markov decision processes (MDPs) are well defined and computationally efficient. We propose an approach to reward function learning for high-level sequential tasks from human demonstrations where the core idea is to reduce the underlying POMDP to an MDP and apply any efficient MDP-IRL algorithm. Our extensive experiments suggest that the reward function learned this way generates POMDP policies that mimic the policies of the demonstrator well.
  4. Healthy human locomotion with good gait symmetry depends on rhythmic coordination of the left and right legs, which can be impaired by neurological disorders such as stroke and spinal cord injury. Powered exoskeletons are promising devices for improving the locomotion functions, such as gait symmetry, of people with impairments. However, given the higher uncertainties and the time-varying nature of human-robot interaction, providing personalized robotic assistance from exoskeletons to achieve the best gait symmetry is challenging, especially for people with neurological disorders. In this paper, we propose a hierarchical control framework for a bilateral hip exoskeleton to provide adaptive optimal hip joint assistance with the control objective of imposing the desired gait symmetry during walking. The hierarchical framework comprises three control levels: the high-level control, which tunes three control parameters using a policy iteration reinforcement learning approach; the middle-level control, which defines the desired assistive torque profile using a delayed output feedback control method; and the low-level control, which achieves good torque trajectory tracking performance. To evaluate the feasibility of the proposed control framework, five healthy young participants are recruited for treadmill walking experiments, in which an artificial gait asymmetry imitating post-stroke hemiparesis is imposed and only the ‘paretic’ hip joint is controlled with the proposed framework. The pilot experimental studies demonstrate that the hierarchical control framework for the hip exoskeleton successfully (asymmetry index from 8.8% to −0.5%) and efficiently (in less than 4 minutes) achieved the desired gait symmetry by providing adaptive optimal assistance on the ‘paretic’ hip joint.
  5. Selecting appropriate tutoring help actions that account for both a student’s content mastery and engagement level is essential for effective human tutors, indicating the critical need for these skills in autonomous tutors. In this work, we formulate the robot-student tutoring help action selection problem as the Assistive Tutor partially observable Markov decision process (AT-POMDP). We designed the AT-POMDP and derived its parameters based on data from a prior robot-student tutoring study. The policy that results from solving the AT-POMDP allows a robot tutor to decide upon the optimal tutoring help action to give a student, while maintaining a belief of the student’s mastery of the material and engagement with the task. This approach is validated through a between-subjects field study, which involved 4th grade students (n = 28) interacting with a social robot solving long division problems over five sessions. Students who received help from a robot using the AT-POMDP policy demonstrated significantly greater learning gains than students who received help from a robot with a fixed help action selection policy. Our results demonstrate that this robust computational framework can be used effectively to deliver diverse and personalized tutoring support over time for students.