skip to main content

Title: Deep reinforcement learning for shared control of mobile robots

Shared control of mobile robots integrates manual input with auxiliary autonomous controllers to improve the overall system performance. However, prior work that seeks to find the optimal shared control ratio needs an accurate human model, which is usually challenging to obtain. In this study, the authors develop an extended Twin Delayed Deep Deterministic Policy Gradient (DDPG) (TD3X)‐based shared control framework that learns to assist a human operator in teleoperating mobile robots optimally. The robot's states, shared control ratio in the previous time step, and human's control input is used as inputs to the reinforcement learning (RL) agent, which then outputs the optimal shared control ratio between human input and autonomous controllers without knowing the human model. Noisy softmax policies are developed to make the TD3X algorithm feasible under the constraint of a shared control ratio. Furthermore, to accelerate the training process and protect the robot, a navigation demonstration policy and a safety guard are developed. A neural network (NN) structure is developed to maintain the correlation of sensor readings among heterogeneous input data and improve the learning speed. In addition, an extended DAGGER (DAGGERX) human agent is developed for training the RL agent to reduce human workload. Robot simulations and experiments with humans in the loop are conducted. The results show that the DAGGERX human agent can simulate real human inputs in the worst‐case scenarios with a mean square error of 0.0039. Compared to the original TD3 agent, the TD3X‐based shared control system decreased the average collision number from 387.3 to 44.4 in a simplistic environment and 394.2 to 171.2 in a more complex environment. The maximum average return increased from 1043 to 1187 with a faster converge speed in the simplistic environment, while the performance is equally good in the complex environment because of the use of an advanced human agent. In the human subject tests, participants' average perceived workload was significantly lower in shared control than that in exclusively manual control (26.90 vs. 40.07,p = 0.013).

more » « less
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
DOI PREFIX: 10.1049
Date Published:
Journal Name:
IET Cyber-Systems and Robotics
Page Range / eLocation ID:
p. 315-330
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Motion tracking interfaces are intuitive for free-form teleoperation tasks. However, efficient manipulation control can be difficult with such interfaces because of issues like the interference of unintended motions and the limited precision of human motion control. The limitation in control efficiency reduces the operator's performance and increases their workload and frustration during robot teleoperation. To improve the efficiency, we proposed separating controlled degrees of freedom (DoFs) and adjusting the motion scaling ratio of a motion tracking interface. The motion tracking of handheld controllers from a Virtual Reality system was used for the interface. We separated the translation and rotational control into: 1) two controllers held in the dominant and non-dominant hands and 2) hand pose tracking and trackpad inputs of a controller. We scaled the control mapping ratio based on 1) the environmental constraints and 2) the teleoperator's control speed. We further conducted a user study to investigate the effectiveness of the proposed methods in increasing efficiency. Our results show that the separation of position and orientation control into two controllers and the environment-based scaling methods perform better than their alternatives. 
    more » « less
  2. ABSTRACT Introduction

    Remote military operations require rapid response times for effective relief and critical care. Yet, the military theater is under austere conditions, so communication links are unreliable and subject to physical and virtual attacks and degradation at unpredictable times. Immediate medical care at these austere locations requires semi-autonomous teleoperated systems, which enable the completion of medical procedures even under interrupted networks while isolating the medics from the dangers of the battlefield. However, to achieve autonomy for complex surgical and critical care procedures, robots require extensive programming or massive libraries of surgical skill demonstrations to learn effective policies using machine learning algorithms. Although such datasets are achievable for simple tasks, providing a large number of demonstrations for surgical maneuvers is not practical. This article presents a method for learning from demonstration, combining knowledge from demonstrations to eliminate reward shaping in reinforcement learning (RL). In addition to reducing the data required for training, the self-supervised nature of RL, in conjunction with expert knowledge-driven rewards, produces more generalizable policies tolerant to dynamic environment changes. A multimodal representation for interaction enables learning complex contact-rich surgical maneuvers. The effectiveness of the approach is shown using the cricothyroidotomy task, as it is a standard procedure seen in critical care to open the airway. In addition, we also provide a method for segmenting the teleoperator’s demonstration into subtasks and classifying the subtasks using sequence modeling.

    Materials and Methods

    A database of demonstrations for the cricothyroidotomy task was collected, comprising six fundamental maneuvers referred to as surgemes. The dataset was collected by teleoperating a collaborative robotic platform—SuperBaxter, with modified surgical grippers. Then, two learning models are developed for processing the dataset—one for automatic segmentation of the task demonstrations into a sequence of surgemes and the second for classifying each segment into labeled surgemes. Finally, a multimodal off-policy RL with rewards learned from demonstrations was developed to learn the surgeme execution from these demonstrations.


    The task segmentation model has an accuracy of 98.2%. The surgeme classification model using the proposed interaction features achieved a classification accuracy of 96.25% averaged across all surgemes compared to 87.08% without these features and 85.4% using a support vector machine classifier. Finally, the robot execution achieved a task success rate of 93.5% compared to baselines of behavioral cloning (78.3%) and a twin-delayed deep deterministic policy gradient with shaped rewards (82.6%).


    Results indicate that the proposed interaction features for the segmentation and classification of surgical tasks improve classification accuracy. The proposed method for learning surgemes from demonstrations exceeds popular methods for skill learning. The effectiveness of the proposed approach demonstrates the potential for future remote telemedicine on battlefields.

    more » « less
  3. null (Ed.)
    The recent development of Robot-Assisted Minimally Invasive Surgery (RAMIS) has brought much benefit to ease the performance of complex Minimally Invasive Surgery (MIS) tasks and lead to more clinical outcomes. Compared to direct master-slave manipulation, semi-autonomous control for the surgical robot can enhance the efficiency of the operation, particularly for repetitive tasks. However, operating in a highly dynamic in-vivo environment is complex. Supervisory control functions should be included to ensure flexibility and safety during the autonomous control phase. This paper presents a haptic rendering interface to enable supervised semi-autonomous control for a surgical robot. Bayesian optimization is used to tune user-specific parameters during the surgical training process. User studies were conducted on a customized simulator for validation. Detailed comparisons are made between with and without the supervised semi-autonomous control mode in terms of the number of clutching events, task completion time, master robot end-effector trajectory and average control speed of the slave robot. The effectiveness of the Bayesian optimization is also evaluated, demonstrating that the optimized parameters can significantly improve users' performance. Results indicate that the proposed control method can reduce the operator's workload and enhance operation efficiency. 
    more » « less
  4. null (Ed.)
    Learning policies in simulation is promising for reducing human effort when training robot controllers. This is especially true for soft robots that are more adaptive and safe but also more difficult to accurately model and control. The sim2real gap is the main barrier to successfully transfer policies from simulation to a real robot. System identification can be applied to reduce this gap but traditional identification methods require a lot of manual tuning. Data-driven alternatives can tune dynamical models directly from data but are often data hungry, which also incorporates human effort in collecting data. This work proposes a data-driven, end-to-end differentiable simulator focused on the exciting but challenging domain of tensegrity robots. To the best of the authors’ knowledge, this is the first differentiable physics engine for tensegrity robots that supports cable, contact, and actuation modeling. The aim is to develop a reasonably simplified, data-driven simulation, which can learn approximate dynamics with limited ground truth data. The dynamics must be accurate enough to generate policies that can be transferred back to the ground-truth system. As a first step in this direction, the current work demonstrates sim2sim transfer, where the unknown physical model of MuJoCo acts as a ground truth system. Two different tensegrity robots are used for evaluation and learning of locomotion policies, a 6-bar and a 3-bar tensegrity. The results indicate that only 0.25% of ground truth data are needed to train a policy that works on the ground truth system when the differentiable engine is used for training against training the policy directly on the ground truth system. 
    more » « less
  5. This paper describes how domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combinatorial nature and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics for enhanced agent learning. Further, a parallel training approach on multiple scenarios is employed to avoid biasing the agent to a few scenarios and make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed for most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL learning. The agent was tested by RTE for the 2019 learning to run the power network challenge and was awarded the 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use. Analysis of a simple system proves the enhancement in training RL-agents using the curriculum. 
    more » « less