

Title: Learning a Decentralized Multi-arm Motion Planner
We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robotic systems have relied on centralized motion planners, whose run times often scale exponentially with team size and which, because they use open-loop control, fail to handle dynamic environments. In this paper, we tackle this problem with multi-agent reinforcement learning, where a shared policy network is trained to control each individual robot arm to reach its target end-effector pose given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (i.e., BiRRT). By leveraging classical planning algorithms, we can improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy scales sub-linearly and can be deployed on multi-arm systems with variable team sizes. Thanks to the closed-loop and decentralized formulation, our approach generalizes to systems of 5 to 10 arms and to dynamic moving targets (>90% success rate for a 10-arm system), despite being trained only on 1- to 4-arm planning tasks with static targets.
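The central execution idea in the abstract is that one shared policy is queried independently by every arm, each using only its own observation, inside a closed control loop. The sketch below illustrates that execution pattern only and is not the authors' code. `Arm`, `toy_policy`, `step_team`, `reached`, and `run` are hypothetical names; joint-space targets stand in for target end-effector poses; and `toy_policy` (a clipped step toward the target) is a placeholder for the SAC-trained network. Unlike the learned policy, which observes its workspace state, this stand-in ignores the other arms and performs no collision avoidance.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class Arm:
    joints: Vector  # current joint configuration
    target: Vector  # goal configuration (hypothetical stand-in for a target end-effector pose)


def toy_policy(joints: Vector, target: Vector, max_step: float = 0.1) -> Vector:
    """Hypothetical stand-in for the trained network: a per-joint step toward
    the target, clipped to +/- max_step. It ignores other arms entirely."""
    return [max(-max_step, min(max_step, t - q)) for q, t in zip(joints, target)]


def step_team(arms: List[Arm], policy: Callable[[Vector, Vector], Vector]) -> None:
    """One closed-loop step: every arm queries the SAME policy with its own
    observation; actions are computed for all arms before any arm moves."""
    actions = [policy(arm.joints, arm.target) for arm in arms]
    for arm, action in zip(arms, actions):
        arm.joints = [q + a for q, a in zip(arm.joints, action)]


def reached(arm: Arm, tol: float = 1e-6) -> bool:
    """True when every joint is within tol of its target."""
    return all(abs(t - q) <= tol for q, t in zip(arm.joints, arm.target))


def run(arms: List[Arm], policy: Callable[[Vector, Vector], Vector], max_steps: int = 100) -> int:
    """Step the team until all arms reach their targets; return the step count."""
    for step in range(max_steps):
        if all(reached(arm) for arm in arms):
            return step
        step_team(arms, policy)
    return max_steps


# The same code drives a 4-arm team and a 10-arm team.
four = [Arm([0.0, 0.0], [0.5, -0.3]) for _ in range(4)]
ten = [Arm([0.0] * 3, [i / 10] * 3) for i in range(10)]
print(run(four, toy_policy), run(ten, toy_policy))
```

Because `step_team` re-reads each arm's `target` on every call, a target changed between steps is tracked from the next step onward; this is the closed-loop property the abstract credits for handling moving targets. Since the same function serves every arm, the team can grow or shrink without changing the control code.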
Award ID(s):
2037101
PAR ID:
10311134
Date Published:
Journal Name:
Proceedings of the 2020 Conference on Robot Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents an online, robust, and model-free motion planning framework for kinodynamic systems. In particular, we employ a Q-learning algorithm for a two-player zero-sum dynamic game to account for worst-case disturbances and kinodynamic constraints. We use one critic and two actor approximators to solve the finite-horizon minimax problem online with a form of integral reinforcement learning. We then leverage a terminal state evaluation structure to facilitate the online implementation. A static obstacle augmentation and a local replanning framework are presented to guarantee safe kinodynamic motion planning. Rigorous Lyapunov-based proofs are provided to guarantee closed-loop stability while maintaining robustness and optimality. Finally, we evaluate the efficacy of the proposed framework with simulations and provide a qualitative comparison of kinodynamic motion planning techniques.
  2. In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom, to carry out actuation tasks on the fly. Our solution employs a Q-learning method to control the trajectory of the tip of the arm, also called the end-effector. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator’s reachability. Additionally, we use a model-based Q-learning model to independently track and control the desired trajectory of the manipulator’s end-effector, given an arbitrary baseline trajectory for the UAV platform. Such a combination enables a variety of actuation tasks, such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, skyscraper cleaning, and power line maintenance in hard-to-reach and risky environments, while retaining compatibility with flight control firmware. Our RL-based control mechanism results in a robust control strategy that can handle uncertainties in the motion of the UAV, offering promising performance. Specifically, our method achieves 92% accuracy in terms of average displacement error (i.e., the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes.
  3. This article presents the design process of a supernumerary wearable robotic forearm (WRF), along with methods for stabilizing the robot’s end-effector using human motion prediction. The device acts as a lightweight “third arm” for the user, extending their reach during handovers and manipulation in close-range collaborative activities. It was developed iteratively, following a user-centered design process that included an online survey, contextual inquiry, and an in-person usability study. Simulations show that the WRF significantly enhances a wearer’s reachable workspace volume while remaining within biomechanical ergonomic load limits during typical usage scenarios. When the device is operated in such scenarios, the user’s body movements introduce disturbances in its pose. We present two methods to overcome these disturbances: autoregressive (AR) time-series models and a recurrent neural network (RNN). These models were used to forecast the wearer’s body movements and compensate for the resulting disturbances, with prediction horizons determined through linear system identification. The models were trained offline on a subset of the KIT Human Motion Database and tested in five usage scenarios to keep the 3D pose of the WRF’s end-effector static. The addition of the predictive models reduced the end-effector position errors by up to 26% compared to direct feedback control.
  4. Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With a nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed-loop for trajectory tracking control.
  5. This article presents a new decentralized multi-agent information-theoretic (DeMAIT) control algorithm for mobile sensors (agents). The algorithm leverages Bayesian estimation and information-theoretic motion planning for efficient and effective estimation and localization of a target, such as a chemical gas leak. The algorithm consists of: (1) a non-parametric Bayesian estimator, (2) an information-theoretic trajectory planner that generates “informative trajectories” for agents to follow, and (3) a controller and collision avoidance algorithm to ensure that each agent follows its trajectory as closely as possible in a safe manner. Advances include the use of a new information-gain metric and its analytical gradient, which, unlike prior information metrics, do not depend on an infinite series. Dynamic programming and multi-threading techniques are applied to efficiently compute the mutual information to minimize measurement uncertainty. The estimation and motion planning processes also take into account the dynamics of the sensors and agents. Extensive simulations are conducted to compare the performance of the DeMAIT algorithm with a traditional raster-scanning method and a clustering method with coordination. The simulations validate the main hypothesis that the DeMAIT algorithm outperforms the other two methods: its average localization success rate is (a) higher and (b) more robust to changes in the source location, robot team size, and search area size than those of the raster-scanning and clustering methods. Finally, outdoor field experiments are conducted using a team of custom-built aerial robots equipped with gas concentration sensors to demonstrate the efficacy of the DeMAIT algorithm in estimating and finding the source of a propane gas leak.
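Item 2 reports its result in terms of average displacement error, defined there as the mean distance between the target and obtained trajectory points. The snippet below is a minimal sketch of that metric under two assumptions the summary does not state: distance is Euclidean, and the two trajectories are time-aligned with equal numbers of points. `average_displacement_error` is a hypothetical helper, and the summary does not say how the reported 92% accuracy is derived from this quantity.

```python
import math
from typing import Sequence

Point = Sequence[float]


def average_displacement_error(target: Sequence[Point], obtained: Sequence[Point]) -> float:
    """Mean Euclidean distance between time-aligned target and obtained points.

    Assumes (hypothetically) that both trajectories are sampled at the same
    instants, so point i of one is compared with point i of the other.
    """
    if not target or len(target) != len(obtained):
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, q) for p, q in zip(target, obtained)) / len(target)


# Only the middle point deviates, by 1 unit, so the mean over 3 points is 1/3.
print(average_displacement_error([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 1), (2, 0)]))
```

`math.dist` (Python 3.8+) works for points of any dimension, so the same helper applies to 2D or 3D end-effector positions.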