Title: Mutual reinforcement learning with robot trainers
This study develops a novel mutual reinforcement learning (MRL) approach in which the robot and the human act as empathetic individuals, each serving as a reinforcement learning agent for the other while accomplishing a task through continuous communication and feedback. This shared model not only has a collective impact but also improves human cognition and helps build a successful human-robot relationship. We compared the learned reinforcement model against non-reinforcement and random baselines in a robotics domain to quantify the significance and impact of MRL. MRL contributed to improved skill transfer, and the robot was able to successfully predict which reinforcement behaviors would be most valuable to its human partners.
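As a concrete illustration of the MRL loop described above, the sketch below shows a robot-side Q-learning agent that treats human feedback as its reward while learning which of its own reinforcement behaviors its human partner values most. This is a minimal sketch under assumed names and hyperparameters (MRLAgent, the behavior set, the feedback function), not the authors' implementation.

```python
# Minimal sketch of a mutual reinforcement learning (MRL) loop: the robot
# treats human feedback as its reward signal while learning which of its
# reinforcement behaviors the human finds most valuable. All names and
# hyperparameters here are illustrative assumptions.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed learning hyperparameters

class MRLAgent:
    """Q-learning agent whose reward comes from its partner's feedback."""
    def __init__(self, actions):
        self.actions = actions
        self.q = defaultdict(float)          # (state, action) -> value

    def act(self, state):
        if random.random() < EPSILON:        # explore occasionally
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + GAMMA * best_next - self.q[(state, action)]
        self.q[(state, action)] += ALPHA * td_error

# The robot's "actions" are reinforcement behaviors that the human scores,
# so each partner shapes the other's policy over repeated interaction.
robot = MRLAgent(actions=["praise", "gesture", "hint"])

def training_step(task_state, human_feedback_fn):
    behavior = robot.act(task_state)
    # The human rates the behavior (assumed scale, e.g., +1 / 0 / -1).
    reward = human_feedback_fn(task_state, behavior)
    # Single-state update for this sketch; a real task would advance state.
    robot.update(task_state, behavior, reward, task_state)
    return behavior
```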
Award ID(s):
1659645
PAR ID:
10173021
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2019 ACM Conference on Human-Robot Interaction
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Objective: This study aims to improve workers' postures, and thus reduce the risk of musculoskeletal disorders in human-robot collaboration, by developing a novel model-free reinforcement learning method. Background: Human-robot collaboration has become a flourishing work configuration in recent years, yet it can lead to work-related musculoskeletal disorders if the collaborative tasks force workers into awkward postures. Methods: The proposed approach follows two steps: first, a 3D human skeleton reconstruction method was adopted to calculate workers' continuous awkward posture (CAP) score; second, an online gradient-based reinforcement learning algorithm was designed to dynamically improve workers' CAP score by adjusting the positions and orientations of the robot end effector (see the sketch after this item). Results: In an empirical experiment, the proposed approach significantly improved participants' CAP scores during a human-robot collaboration task compared with scenarios where the robot and participants worked together at a fixed position or at the individual's elbow height. Questionnaire outcomes also showed that participants preferred the working posture resulting from the proposed approach. Conclusion: The proposed model-free reinforcement learning method can learn optimal worker postures without requiring specific biomechanical models, and its data-driven nature allows it to provide personalized optimal work postures. Application: The proposed method can be applied to improve occupational safety in factories that deploy robots: personalized robot working positions and orientations can proactively reduce exposure to awkward postures that increase the risk of musculoskeletal disorders, and the algorithm can also reactively protect workers by reducing the load on specific joints.
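The second step above (online gradient-based adjustment of the end effector) might look roughly like the following model-free sketch, which ascends an observed CAP score by finite differences. The CAP scoring interface, step sizes, and the convention that a higher score means a better posture are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of an online, model-free gradient-style update that nudges
# the robot end-effector position to improve a worker's continuous awkward
# posture (CAP) score, using only observed scores (no biomechanical model).
import numpy as np

def estimate_gradient(cap_score_fn, position, delta=0.01):
    """Finite-difference estimate of d(CAP)/d(position), one axis at a time."""
    grad = np.zeros_like(position)
    base = cap_score_fn(position)
    for i in range(len(position)):
        probe = position.copy()
        probe[i] += delta
        grad[i] = (cap_score_fn(probe) - base) / delta
    return grad

def online_posture_update(cap_score_fn, position, lr=0.05, steps=50):
    """Ascend the CAP score (assumed: higher = better posture) online."""
    for _ in range(steps):
        position = position + lr * estimate_gradient(cap_score_fn, position)
    return position
```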
  2. In this work, we propose a method for generating reduced-order model reference trajectories for general classes of highly dynamic bipedal-robot maneuvers, for use in sim-to-real reinforcement learning. Our approach is to use a single rigid-body model (SRBM) to optimize libraries of trajectories offline, which then serve as expert references that guide learning by regularizing behaviors when incorporated into the reward function of a learned policy (see the sketch after this item). This method translates the model's dynamically rich rotational and translational behavior to a full-order robot model and transfers successfully to real hardware. The SRBM's simplicity allows for fast iteration and refinement of behaviors, while the robustness of learning-based controllers allows highly dynamic motions to be transferred to hardware. In this work we introduce a set of transferability constraints that adapt the SRBM dynamics to actual bipedal robot hardware, our framework for creating optimal trajectories for a variety of highly dynamic maneuvers, and our approach to integrating reference trajectories into a high-speed running reinforcement learning policy. We validate our methods on the bipedal robot Cassie, on which we successfully demonstrated highly dynamic grounded running gaits up to 3.0 m/s.
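The reference-regularization idea above can be made concrete with a small sketch of a shaped reward: the policy's task reward is augmented with a bounded bonus for tracking the offline SRBM reference. The weighting, bonus shape, and state layout are assumptions, not the paper's actual reward design.

```python
# Illustrative reward shaping in the style described above: the learned
# policy's reward gains a term that rewards staying close to the offline
# SRBM expert reference trajectory. Weights and state layout are assumed.
import numpy as np

def reference_regularized_reward(state, reference_state, task_reward,
                                 w_ref=0.5):
    """Combine the task reward with a reference-tracking bonus that pulls
    the full-order robot toward the reduced-order SRBM trajectory."""
    tracking_error = np.linalg.norm(state - reference_state)
    return task_reward + w_ref * np.exp(-tracking_error)  # bounded in (0, w_ref]
```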
  3. Pedestrian regulation can prevent crowd accidents and improve crowd safety in densely populated areas. Recent studies use mobile robots to regulate pedestrian flows toward desired collective motion through the effect of passive human-robot interaction (HRI). This paper formulates a robot motion planning problem for optimizing two merging pedestrian flows moving through a bottleneck exit. To address the challenge of representing complex human motion dynamics under the effect of HRI, we propose using a deep neural network to model the mapping from image input of the pedestrian environment to robot motion decisions (see the sketch after this item). The robot motion planner is trained end-to-end with a deep reinforcement learning algorithm, which avoids hand-crafted feature detection and extraction and thus improves the learning capability for complex dynamic problems. The proposed approach is validated and evaluated in simulated experiments. The results demonstrate that the robot can find optimal motion decisions that maximize the pedestrian outflow in different flow conditions, and that the accumulated pedestrian outflow increases significantly compared to cases without robot regulation and with random robot motion.
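A minimal sketch (assuming PyTorch) of the end-to-end mapping described above: an image of the pedestrian environment goes in, and values over a small set of robot motion decisions come out. The layer sizes, input resolution, and action set are illustrative, not the paper's architecture.

```python
# Sketch of a CNN that maps an image of the pedestrian environment to
# values over discrete robot motion decisions, trainable end-to-end with
# a deep RL algorithm. Architecture details are assumptions.
import torch
import torch.nn as nn

class RobotMotionPlanner(nn.Module):
    def __init__(self, n_actions=5):  # e.g., stay / left / right / up / down
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 9 * 9, n_actions)  # for 84x84 input

    def forward(self, image):
        return self.head(self.features(image))  # action values / logits

# Usage: one grayscale 84x84 snapshot of the crowd, batch size 1.
planner = RobotMotionPlanner()
q_values = planner(torch.zeros(1, 1, 84, 84))
```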
  4. Robot arms should be able to learn new tasks. One framework here is reinforcement learning, where the robot is given a reward function that encodes the task, and the robot autonomously learns actions to maximize its reward. Existing approaches to reinforcement learning often frame this problem as a Markov decision process, and learn a policy (or a hierarchy of policies) to complete the task. These policies reason over hundreds of fine-grained actions that the robot arm needs to take: e.g., moving slightly to the right or rotating the end-effector a few degrees. But the manipulation tasks that we want robots to perform can often be broken down into a small number of high-level motions: e.g., reaching an object or turning a handle. In this paper we therefore propose a waypoint-based approach for model-free reinforcement learning. Instead of learning a low-level policy, the robot now learns a trajectory of waypoints, and then interpolates between those waypoints using existing controllers. Our key novelty is framing this waypoint-based setting as a sequence of multi-armed bandits: each bandit problem corresponds to one waypoint along the robot’s motion. We theoretically show that an ideal solution to this reformulation has lower regret bounds than standard frameworks. We also introduce an approximate posterior sampling solution that builds the robot’s motion one waypoint at a time. Results across benchmark simulations and two real-world experiments suggest that this proposed approach learns new tasks more quickly than state-of-the-art baselines. See our website here: https://collab.me.vt.edu/rl-waypoints/ 
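The bandit reformulation above can be sketched as one Thompson-sampling bandit per waypoint: each bandit samples a candidate waypoint from its posterior, the interpolated trajectory is executed, and every bandit updates on the episode reward. The discretized candidate sets, Gaussian posteriors, and update rule are assumptions standing in for the paper's approximate posterior sampling solution.

```python
# Hedged sketch of the waypoint-as-bandit idea: one multi-armed bandit per
# waypoint, solved by Thompson (posterior) sampling over a discretized set
# of candidate waypoints. Priors and updates are illustrative assumptions.
import numpy as np

class WaypointBandit:
    """One bandit per waypoint along the robot's motion."""
    def __init__(self, candidates):
        self.candidates = candidates           # candidate waypoints (arms)
        self.mu = np.zeros(len(candidates))    # posterior means
        self.counts = np.ones(len(candidates)) # pseudo-counts

    def sample_waypoint(self):
        # Thompson sampling: draw from each arm's Gaussian posterior,
        # whose variance shrinks as the arm is pulled more often.
        draws = np.random.normal(self.mu, 1.0 / np.sqrt(self.counts))
        self.last_arm = int(np.argmax(draws))
        return self.candidates[self.last_arm]

    def update(self, reward):
        a = self.last_arm
        self.counts[a] += 1
        self.mu[a] += (reward - self.mu[a]) / self.counts[a]  # running mean

def rollout(bandits, execute_trajectory):
    """Build the motion one waypoint at a time, execute the interpolated
    trajectory, then update every bandit with the episode reward."""
    waypoints = [b.sample_waypoint() for b in bandits]
    reward = execute_trajectory(waypoints)  # interpolate + run controllers
    for b in bandits:
        b.update(reward)
    return reward
```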
  5. Robot-assisted dressing could profoundly enhance the quality of life of adults with physical disabilities. To achieve this, a robot can benefit from both visual and force sensing. The former enables the robot to ascertain human body pose and garment deformations, while the latter helps maintain safety and comfort during the dressing process. In this paper, we introduce a new technique that leverages both vision and force modalities for this assistive task. Our approach first trains a vision-based dressing policy using reinforcement learning in simulation with varying body sizes, poses, and types of garments. We then learn a force dynamics model for action planning to ensure safety. Due to limitations of simulating accurate force data when deformable garments interact with the human body, we learn a force dynamics model directly from real-world data. Our proposed method combines the vision-based policy, trained in simulation, with the force dynamics model, learned in the real world, by solving a constrained optimization problem to infer actions that facilitate the dressing process without applying excessive force on the person. We evaluate our system in simulation and in a real-world human study with 10 participants across 240 dressing trials, showing it greatly outperforms prior baselines. 
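The constrained optimization step described above might be sketched as follows: choose the action closest to the vision policy's suggestion subject to the learned force dynamics model predicting a force below a safety limit. The optimizer, force-model interface, and threshold are assumptions for illustration, not the paper's formulation.

```python
# Illustrative constrained action selection: stay near the vision policy's
# suggested action while the learned force model predicts a safe force.
import numpy as np
from scipy.optimize import minimize

FORCE_LIMIT = 5.0  # assumed safety threshold (N)

def safe_action(policy_action, force_model, state):
    """Solve: min ||a - a_policy||^2  s.t.  force_model(state, a) <= limit."""
    objective = lambda a: np.sum((a - policy_action) ** 2)
    constraint = {"type": "ineq",
                  "fun": lambda a: FORCE_LIMIT - force_model(state, a)}
    result = minimize(objective, x0=policy_action, constraints=[constraint])
    return result.x
```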