Title: Mutual reinforcement learning with robot trainers
The researchers in this study have developed a novel approach using mutual reinforcement learning (MRL), in which the robot and the human both act as empathetic individuals who serve as reinforcement learning agents for each other, working toward a particular task through continuous communication and feedback. This shared model not only has a collective impact but also improves human cognition and helps build a successful human-robot relationship. In our current work, we compared our learned reinforcement model with baseline non-reinforcement and random approaches in a robotics domain to identify the significance and impact of MRL. MRL contributed to improved skill transfer, and the robot was able to successfully predict which reinforcement behaviors would be most valuable to its human partners.
Award ID(s):
1659645
PAR ID:
10173021
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 2019 ACM Conference on Human-Robot Interaction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
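Illustrative sketch (not from the paper): the abstract above compares a learned reinforcement model against a random baseline but gives no algorithmic detail. The toy example below assumes a simple epsilon-greedy bandit as a stand-in for the robot's side of the interaction only. The behavior names, the success probabilities, and the functions simulate_human_feedback and run_policy are all hypothetical, chosen purely to show how a learned choice of reinforcement behavior can outperform random choice.

```python
import random

# Hypothetical reinforcement behaviors the robot could offer (assumption, not from the paper).
BEHAVIORS = ["praise", "hint", "demonstration"]

def simulate_human_feedback(behavior, rng):
    """Return 1.0 if the simulated human benefits from the behavior, else 0.0.

    The probabilities are invented for illustration only.
    """
    success_prob = {"praise": 0.2, "hint": 0.5, "demonstration": 0.8}
    return 1.0 if rng.random() < success_prob[behavior] else 0.0

def run_policy(behaviors, steps, epsilon, rng):
    """Epsilon-greedy bandit; epsilon=1.0 reduces to the random baseline."""
    counts = {b: 0 for b in behaviors}
    values = {b: 0.0 for b in behaviors}  # running mean reward per behavior
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(behaviors)  # explore
        else:
            choice = max(behaviors, key=lambda b: values[b])  # exploit
        reward = simulate_human_feedback(choice, rng)
        counts[choice] += 1
        # Incremental update of the running mean.
        values[choice] += (reward - values[choice]) / counts[choice]
        total += reward
    return total / steps, values

if __name__ == "__main__":
    learned_avg, learned_values = run_policy(BEHAVIORS, 5000, 0.1, random.Random(0))
    random_avg, _ = run_policy(BEHAVIORS, 5000, 1.0, random.Random(1))
    print("learned policy average reward:", round(learned_avg, 3))
    print("random baseline average reward:", round(random_avg, 3))
    print("behavior estimated most valuable:", max(learned_values, key=learned_values.get))
```

In this sketch only the robot learns; a mutual setting as described in the abstract would also model the human adapting to the robot's feedback, which is left out here.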
More Like this
  1. Objective: This study aims to improve workers’ postures and thus reduce the risk of musculoskeletal disorders in human-robot collaboration by developing a novel model-free reinforcement learning method. Background: Human-robot collaboration has been a flourishing work configuration in recent years. Yet, it could lead to work-related musculoskeletal disorders if the collaborative tasks result in awkward postures for workers. Methods: The proposed approach follows two steps: first, a 3D human skeleton reconstruction method was adopted to calculate workers’ continuous awkward posture (CAP) score; second, an online gradient-based reinforcement learning algorithm was designed to dynamically improve workers’ CAP score by adjusting the positions and orientations of the robot end effector. Results: In an empirical experiment, the proposed approach significantly improved the CAP scores of the participants during a human-robot collaboration task when compared with scenarios where the robot and participants worked together at a fixed position or at the individual elbow height. The questionnaire outcomes also showed that participants preferred the working posture resulting from the proposed approach. Conclusion: The proposed model-free reinforcement learning method can learn the optimal worker postures without the need for specific biomechanical models. The data-driven nature of this method allows it to adapt and provide personalized optimal work postures. Application: The proposed method can be applied to improve occupational safety in robot-implemented factories. Specifically, the personalized robot working positions and orientations can proactively reduce exposure to awkward postures that increase the risk of musculoskeletal disorders. The algorithm can also reactively protect workers by reducing the workload in specific joints.
  2. In this work, we propose a method to generate reduced-order model reference trajectories for general classes of highly dynamic maneuvers for bipedal robots for use in sim-to-real reinforcement learning. Our approach is to utilize a single rigid-body model (SRBM) to optimize libraries of trajectories offline to be used as expert references that guide learning by regularizing behaviors when incorporated in the reward function of a learned policy. This method translates the model's dynamically rich rotational and translational behavior to a full-order robot model and successfully transfers to real hardware. The SRBM's simplicity allows for fast iteration and refinement of behaviors, while the robustness of learning-based controllers allows for highly dynamic motions to be transferred to hardware. Within this work we introduce a set of transferability constraints that amend the SRBM dynamics to actual bipedal robot hardware, our framework for creating optimal trajectories for a variety of highly dynamic maneuvers, and our approach to integrating reference trajectories for a high-speed running reinforcement learning policy. We validate our methods on the bipedal robot Cassie, on which we successfully demonstrated highly dynamic grounded running gaits up to 3.0 m/s.
  3. Pedestrian regulation can prevent crowd accidents and improve crowd safety in densely populated areas. Recent studies use mobile robots to regulate pedestrian flows for desired collective motion through the effect of passive human-robot interaction (HRI). This paper formulates a robot motion planning problem for the optimization of two merging pedestrian flows moving through a bottleneck exit. To address the challenge of feature representation of complex human motion dynamics under the effect of HRI, we propose using a deep neural network to model the mapping from the image input of pedestrian environments to the output of robot motion decisions. The robot motion planner is trained end-to-end using a deep reinforcement learning algorithm, which avoids hand-crafted feature detection and extraction, thus improving the learning capability for complex dynamic problems. Our proposed approach is validated in simulated experiments, and its performance is evaluated. The results demonstrate that the robot is able to find optimal motion decisions that maximize the pedestrian outflow in different flow conditions, and the pedestrian-accumulated outflow increases significantly compared to cases without robot regulation and with random robot motion. 
  4. Robot arms should be able to learn new tasks. One framework here is reinforcement learning, where the robot is given a reward function that encodes the task, and the robot autonomously learns actions to maximize its reward. Existing approaches to reinforcement learning often frame this problem as a Markov decision process, and learn a policy (or a hierarchy of policies) to complete the task. These policies reason over hundreds of fine-grained actions that the robot arm needs to take: e.g., moving slightly to the right or rotating the end-effector a few degrees. But the manipulation tasks that we want robots to perform can often be broken down into a small number of high-level motions: e.g., reaching an object or turning a handle. In this paper we therefore propose a waypoint-based approach for model-free reinforcement learning. Instead of learning a low-level policy, the robot now learns a trajectory of waypoints, and then interpolates between those waypoints using existing controllers. Our key novelty is framing this waypoint-based setting as a sequence of multi-armed bandits: each bandit problem corresponds to one waypoint along the robot’s motion. We theoretically show that an ideal solution to this reformulation has lower regret bounds than standard frameworks. We also introduce an approximate posterior sampling solution that builds the robot’s motion one waypoint at a time. Results across benchmark simulations and two real-world experiments suggest that this proposed approach learns new tasks more quickly than state-of-the-art baselines. See our website here: https://collab.me.vt.edu/rl-waypoints/ 
  5. In public spaces shared with humans, ensuring multi-robot systems navigate without collisions while respecting social norms is challenging, particularly with limited communication. Although current robot social navigation techniques leverage advances in reinforcement learning and deep learning, they frequently overlook robot dynamics in simulations, leading to a simulation-to-reality gap. In this paper, we bridge this gap by presenting a new multi-robot social navigation environment crafted using Dec-POSMDP and multi-agent reinforcement learning. Furthermore, we introduce SAMARL: a novel benchmark for cooperative multi-robot social navigation. SAMARL employs a unique spatial-temporal transformer combined with multi-agent reinforcement learning. This approach effectively captures the complex interactions between robots and humans, thus promoting cooperative tendencies in multi-robot systems. Our extensive experiments reveal that SAMARL outperforms existing baseline and ablation models in our designed environment. Demo videos for this work can be found at: https://sites.google.com/view/samarl 