skip to main content


Title: AMBF-RL: A real-time simulation based Reinforcement Learning toolkit for Medical Robotics
Recently, Reinforcement Learning (RL) techniques have seen significant progress in the robotics domain. This can be attributed to robust simulation frameworks that offer realistic environments to train. However, there is a lack of platforms which offer environments that are conducive to medical robotic tasks. Having earlier designed the Asynchronous Multibody Framework (AMBF) - a real-time dynamics simulator well-suited for medical robotics tasks, we propose an open source AMBF-RL (ARL) toolkit to assist in designing control algorithms for these robots, as well as a module to collect and parse expert demonstration data. We validate ARL by attempting to partially automate the task of debris removal on the da Vinci Research Kit (dVRK) Patient Side Manipulator (PSM) in simulation by calculating the optimal policy using both Deep Deterministic Policy Gradient (DDPG) and Hindsight Experience Replay (HER) with DDPG. The trained policies are successfully transferred onto the physical dVRK PSM and tested. Finally, we draw a conclusion from the results and discuss our observations of the experiments conducted.  more » « less
Award ID(s):
1927275 1637759
NSF-PAR ID:
10355293
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 International Symposium on Medical Robotics (ISMR)
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Flooding in many areas is becoming more prevalent due to factors such as urbanization and climate change, requiring modernization of stormwater infrastructure. Retrofitting standard passive systems with controllable valves/pumps is promising, but requires real-time control (RTC). One method of automating RTC is reinforcement learning (RL), a general technique for sequential optimization and control in uncertain environments. The notion is that an RL algorithm can use inputs of real-time flood data and rainfall forecasts to learn a policy for controlling the stormwater infrastructure to minimize measures of flooding. In real-world conditions, rainfall forecasts and other state information are subject to noise and uncertainty. To account for these characteristics of the problem data, we implemented Deep Deterministic Policy Gradient (DDPG), an RL algorithm that is distinguished by its capability to handle noise in the input data. DDPG implementations were trained and tested against a passive flood control policy. Three primary cases were studied: (i) perfect data, (ii) imperfect rainfall forecasts, and (iii) imperfect water level and forecast data. Rainfall episodes (100) that caused flooding in the passive system were selected from 10 years of observations in Norfolk, Virginia, USA; 85 randomly selected episodes were used for training and the remaining 15 unseen episodes served as test cases. Compared to the passive system, all RL implementations reduced flooding volume by 70.5% on average, and performed within a range of 5%. This suggests that DDPG is robust to noisy input data, which is essential knowledge to advance the real-world applicability of RL for stormwater RTC. 
    more » « less
  2. Despite the potential of reinforcement learning (RL) for building general-purpose robotic systems, training RL agents to solve robotics tasks still remains challenging due to the difficulty of exploration in purely continuous action spaces. Addressing this problem is an active area of research with the majority of focus on improving RL methods via better optimization or more efficient exploration. An alternate but important component to consider improving is the interface of the RL algorithm with the robot. In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy. These parameterized primitives are expressive, simple to implement, enable efficient exploration and can be transferred across robots, tasks and environments. We perform a thorough empirical study across challenging tasks in three distinct domains with image input and a sparse terminal reward. We find that our simple change to the action interface substantially improves both the learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods which learn skills from offline expert data. Code and videos at https://mihdalal.github.io/raps/ 
    more » « less
  3. null (Ed.)
    Robot-assisted minimally invasive surgery has made a substantial impact in operating rooms over the past few decades with their high dexterity, small tool size, and impact on adoption of minimally invasive techniques. In recent years, intelligence and different levels of surgical robot autonomy have emerged thanks to the medical robotics endeavors at numerous academic institutions and leading surgical robot companies. To accelerate interaction within the research community and prevent repeated development, we propose the Collaborative Robotics Toolkit (CRTK), a common API for the RAVEN-II and da Vinci Research Kit (dVRK) - two open surgical robot platforms installed at more than 40 institutions worldwide. CRTK has broadened to include other robots and devices, including simulated robotic systems and industrial robots. This common API is a community software infrastructure for research and education in cutting edge human-robot collaborative areas such as semi-autonomous teleoperation and medical robotics. This paper presents the concepts, design details and the integration of CRTK with physical robot systems and simulation platforms. 
    more » « less
  4. null (Ed.)
    Robot-assisted minimally invasive surgery has made a substantial impact in operating rooms over the past few decades with their high dexterity, small tool size, and impact on adoption of minimally invasive techniques. In recent years, intelligence and different levels of surgical robot autonomy have emerged thanks to the medical robotics endeavors at numerous academic institutions and leading surgical robot companies. To accelerate interaction within the research community and prevent repeated development, we propose the Collaborative Robotics Toolkit (CRTK), a common API for the RAVEN-II and da Vinci Research Kit (dVRK) - two open surgical robot platforms installed at more than 40 institutions worldwide. CRTK has broadened to include other robots and devices, including simulated robotic systems and industrial robots. This common API is a community software infrastructure for research and education in cutting edge human-robot collaborative areas such as semi-autonomous teleoperation and medical robotics. This paper presents the concepts, design details and the integration of CRTK with physical robot systems and simulation platforms. 
    more » « less
  5. Reinforcement learning (RL) in low-data and risk-sensitive domains requires performant and flexible deployment policies that can readily incorporate constraints during deployment. One such class of policies are the semi-parametric H-step lookahead policies, which select actions using trajectory optimization over a dynamics model for a fixed horizon with a terminal value function. In this work, we investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function learned by a model-free off-policy algorithm, named Learning Off-Policy with Online Planning (LOOP). We provide a theoretical analysis of this method, suggesting a tradeoff between model errors and value function errors and empirically demonstrate this tradeoff to be beneficial in deep reinforcement learning. Furthermore, we identify the "Actor Divergence" issue in this framework and propose Actor Regularized Control (ARC), a modified trajectory optimization procedure. We evaluate our method on a set of robotic tasks for Offline and Online RL and demonstrate improved performance. We also show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments. We demonstrate that LOOP is a desirable framework for robotics applications based on its strong performance in various important RL settings. 
    more » « less