Title: Learning a Decentralized Multi-arm Motion Planner
We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robotic systems have relied on centralized motion planners, whose run times often scale exponentially with team size and which therefore fail to handle dynamic environments under open-loop control. In this paper, we tackle this problem with multi-agent reinforcement learning, where a shared policy network is trained to control each individual robot arm to reach its target end-effector pose given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (i.e., BiRRT). By leveraging classical planning algorithms, we can improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy scales sub-linearly and can be deployed on multi-arm systems with variable team sizes. Thanks to the closed-loop and decentralized formulation, our approach generalizes to multi-arm systems of 5-10 arms and to dynamic moving targets (>90% success rate for a 10-arm system), despite being trained only on 1-4 arm planning tasks with static targets.
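The decentralized, closed-loop structure described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: arms are reduced to 2D end-effector points, `build_observation`, `shared_policy`, and `rollout` are hypothetical names, and the "policy" is a simple proportional step toward the target standing in for a trained network. Collision avoidance is not modeled. The point is only that one shared policy with a fixed-size observation can be applied to any number of arms, and that each arm re-observes at every step.

```python
import numpy as np

def build_observation(i, positions, targets, k_neighbors=2):
    """Observation for arm i: own position, own target, and the offsets to
    the k nearest other arms (zero-padded), so the input size does not
    depend on team size. The layout is a hypothetical choice."""
    own = positions[i]
    others = np.delete(positions, i, axis=0)
    # Sort the other arms by distance to arm i and keep the closest k.
    order = np.argsort(np.linalg.norm(others - own, axis=1))
    nearest = others[order[:k_neighbors]]
    padded = np.zeros((k_neighbors, 2))
    padded[: len(nearest)] = nearest - own  # relative offsets
    return np.concatenate([own, targets[i], padded.ravel()])

def shared_policy(obs, gain=0.2):
    """Stand-in for the trained network (assumption): take a proportional
    step toward the target. The same function is used for every arm."""
    own, target = obs[:2], obs[2:4]
    return gain * (target - own)

def rollout(positions, targets, steps=50):
    """Closed-loop control: every arm re-observes and acts at each step."""
    positions = np.array(positions, dtype=float)
    for _ in range(steps):
        actions = np.stack([
            shared_policy(build_observation(i, positions, targets))
            for i in range(len(positions))
        ])
        positions += actions
    return positions
```

Because the observation has the same length for any team size, `rollout` can be called with 2 arms or 10 arms without changing `shared_policy`, which mirrors the variable-team-size deployment the abstract describes.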
Award ID(s):
2037101
NSF-PAR ID:
10311134
Journal Name:
Proceedings of the 2020 Conference on Robot Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Smooth camber morphing aircraft offer increased control authority and improved aerodynamic efficiency. Smart material actuators have become a popular driving force for shape changes, capable of adhering to weight and size constraints and allowing for simplicity in mechanical design. As a step towards creating uncrewed aerial vehicles (UAVs) capable of autonomously responding to flow conditions, this work examines a multifunctional morphing airfoil’s ability to follow commands in various flows. We integrated an airfoil with a morphing trailing edge consisting of an antagonistic pair of macro fiber composites (MFCs), serving as both skin and actuator, and internal piezoelectric flex sensors to form a closed-loop composite system. Closed-loop feedback control is necessary to accurately follow deflection commands due to the hysteretic behavior of MFCs. Here we used a deep reinforcement learning algorithm, Proximal Policy Optimization, to control the morphing airfoil. Two neural controllers were trained in a simulation developed through time series modeling on long short-term memory recurrent neural networks. The learned controllers were then tested on the composite wing using two state inference methods in still air and in a wind tunnel at various flow speeds. We compared the performance of our neural controllers to one using traditional position-derivative feedback control methods. Our experimental results validate that the autonomous neural controllers were faster and more accurate than traditional methods. This research shows that deep learning methods can overcome common obstacles for achieving sufficient modeling and control when implementing smart composite actuators in an autonomous aerospace environment.
  2. Dynamic network topology can pose important challenges to communication and control protocols in networks of autonomous vehicles. For instance, maintaining connectivity is a key challenge in unmanned aerial vehicle (UAV) networks. However, tracking and computational resources of the observer module might not be sufficient for constant monitoring of all surrounding nodes in large-scale networks. In this paper, we propose an optimal measurement policy for network topology monitoring under constrained resources. To this end, we formulate the localization of multiple objects in terms of linear networked systems and solve it using Kalman filtering with intermittent observation. The proposed policy includes two sequential steps. We first find optimal measurement attempt probabilities for each target using numerical optimization methods to assign the limited number of resources among targets. The optimal resource allocation follows a waterfall-like solution that assigns more resources to targets with lower measurement success probability. This provides a 10% to 60% gain in prediction accuracy. The second step is finding optimal on-off patterns for measurement attempts for each target over time. We show that a regular measurement pattern that distributes resources evenly over time outperforms the two extreme cases of using all measurement resources either at the beginning or at the end of the measurement cycle. Our proof is based on characterizing the fixed-point solution of the error covariance matrix for regular patterns. Extensive simulation results confirm the optimality of the most alternating pattern, with up to 10-fold prediction improvement for different scenarios. These two guidelines define a general policy for target tracking under constrained resources with applications to network topology prediction of autonomous systems.
  3. This article presents the design process of a supernumerary wearable robotic forearm (WRF), along with methods for stabilizing the robot’s end-effector using human motion prediction. The device acts as a lightweight “third arm” for the user, extending their reach during handovers and manipulation in close-range collaborative activities. It was developed iteratively, following a user-centered design process that included an online survey, contextual inquiry, and an in-person usability study. Simulations show that the WRF significantly enhances a wearer’s reachable workspace volume, while remaining within biomechanical ergonomic load limits during typical usage scenarios. While operating the device in such scenarios, the user introduces disturbances in its pose due to their body movements. We present two methods to overcome these disturbances: autoregressive (AR) time series and a recurrent neural network (RNN). These models were used for forecasting the wearer’s body movements to compensate for disturbances, with prediction horizons determined through linear system identification. The models were trained offline on a subset of the KIT Human Motion Database, and tested in five usage scenarios to keep the 3D pose of the WRF’s end-effector static. The addition of the predictive models reduced the end-effector position errors by up to 26% compared to direct feedback control.
  4. This paper presents an online, robust, and model-free motion planning framework for kinodynamic systems. In particular, we employ a Q-learning algorithm for a two-player zero-sum dynamic game to account for worst-case disturbances and kinodynamic constraints. We use one critic and two actor approximators to solve the finite-horizon minimax problem online with a form of integral reinforcement learning. We then leverage a terminal state evaluation structure to facilitate the online implementation. A static obstacle augmentation and a local replanning framework are presented to guarantee safe kinodynamic motion planning. Rigorous Lyapunov-based proofs are provided to guarantee closed-loop stability while maintaining robustness and optimality. Finally, we evaluate the efficacy of the proposed framework with simulations and provide a qualitative comparison of kinodynamic motion planning techniques.
  5. Inverse kinematics solves the problem of how to control robot arm joints to achieve desired end effector positions, which is critical to any robot arm design and implementations of control algorithms. It is a common misunderstanding that closed-form inverse kinematics analysis is solved. Popular software and algorithms, such as gradient descent or any multivariate equation-solving algorithm, claim to solve inverse kinematics, but only at the numerical level. While numerical inverse kinematics solutions are relatively straightforward to obtain, these methods often fail, due to their dependency on specific numerical values, even when inverse kinematics solutions exist. Therefore, closed-form inverse kinematics analysis is superior, but there is no generalized automated algorithm. Until now, the high-level logical reasoning involved in solving closed-form inverse kinematics has made it hard to automate, so it has been handled by human experts. We developed IKBT, a knowledge-based intelligent system that can mimic human experts’ behaviors in solving closed-form inverse kinematics using Behavior Trees. Knowledge and rules used by engineers when solving closed-form inverse kinematics are encoded as actions in the Behavior Tree. The order of applying these rules is governed by higher-level composite nodes, which resembles the logical reasoning process of engineers. It is also the first time that the dependency of joint variables, an important issue in inverse kinematics analysis, is automatically tracked in graph form. Besides generating closed-form solutions, IKBT also explains its solving strategies in a form interpretable by humans (engineers). This is a proof of concept of using Behavior Trees to solve high-cognitive problems.