Title: Invited Paper: Actuator Trajectory Planning for UAVs with Overhead Manipulator Using Reinforcement Learning
In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom, to carry out actuation tasks on the fly. Our solution employs a Q-learning method to control the trajectory of the tip of the arm, also called the end-effector. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator's reachability. Additionally, we utilize a model-based Q-learning approach to independently track and control the desired trajectory of the manipulator's end-effector, given an arbitrary baseline trajectory for the UAV platform. This combination enables a variety of actuation tasks, such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, skyscraper cleaning, and power line maintenance, in hard-to-reach and risky environments while retaining compatibility with flight control firmware. Our RL-based control mechanism yields a robust control strategy that can handle uncertainties in the motion of the UAV, offering promising performance. Specifically, our method achieves 92% accuracy in terms of average displacement error (i.e., the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes.
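The abstract names two computational ingredients: a Time-To-Collision check for obstacle avoidance and a tabular Q-learning update for the arm's end-effector. A minimal Python sketch of both follows; the discretization sizes, hyperparameters, and relative-state convention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def time_to_collision(rel_pos, rel_vel, eps=1e-6):
    """TTC between the UAV and an obstacle, given relative position and
    velocity (obstacle minus UAV). Infinite when the two are separating."""
    dist = np.linalg.norm(rel_pos)
    closing_speed = -np.dot(rel_pos, rel_vel) / (dist + eps)
    return dist / closing_speed if closing_speed > 0 else np.inf

# Tabular Q-learning for the 2-DOF arm: states discretize the joint angles
# and end-effector tracking error; actions are incremental joint commands.
N_STATES, N_ACTIONS = 500, 9          # illustrative discretization sizes
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

def choose_action(s):
    """Epsilon-greedy selection over the discrete joint-step actions."""
    return rng.integers(N_ACTIONS) if rng.random() < EPSILON else int(Q[s].argmax())

def q_update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
```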
Award ID(s):
2204721
PAR ID:
10542188
Publisher / Repository:
2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC)
Edition / Version:
1
Volume:
1
Issue:
1
ISSN:
2166-9589
ISBN:
978-1-6654-6483-3
Page Range / eLocation ID:
1-6
Subject(s) / Keyword(s):
Aerial Manipulators; Q-learning; Unmanned Aerial Vehicles; Trajectory Optimization
Format(s):
Medium: X; Other: pdf
Size(s):
2MB
Location:
Toronto, ON, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. For a wearable robotic arm to autonomously assist a human, it has to be able to stabilize its end-effector in light of the human's independent activities. This paper presents a method for stabilizing the end-effector in planar assembly and pick-and-place tasks. Ideally, given accurate positioning of the end-effector and the wearable robot's attachment point, human disturbances could be compensated for by a simple feedback control strategy. Realistically, system delays in both sensing and actuation suggest a predictive approach. In this work, we characterize the actuators of a wearable robotic arm and estimate these delays using linear models. We then model the motion of the human arm as an autoregressive process to predict the deviation in the robot's base position at a time horizon equivalent to the estimated delay. Generating set points for the end-effector using this predictive model, we report position errors reduced by 19.4% (x) and 20.1% (y) compared to a feedback control strategy without prediction.
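The prediction step described above, fitting an autoregressive model to the base motion and rolling it forward one delay horizon, can be sketched as follows; the least-squares fit, model order, and horizon are illustrative assumptions rather than the paper's exact identification procedure.

```python
import numpy as np

def fit_ar(x, order=4):
    """Least-squares fit of an AR(order) model: x[t] ~ sum_k a_k * x[t-k]."""
    x = np.asarray(x, dtype=float)
    lags = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    coeffs, *_ = np.linalg.lstsq(lags, x[order:], rcond=None)
    return coeffs

def predict_ahead(x, coeffs, horizon):
    """Roll the AR model forward `horizon` steps past the end of x."""
    window = list(np.asarray(x, dtype=float)[-len(coeffs):])  # oldest first
    for _ in range(horizon):
        window = window[1:] + [float(np.dot(coeffs, window[::-1]))]
    return window[-1]

# Example: predict the base x-deviation one estimated delay ahead and
# pre-compensate the end-effector set point (synthetic drift data).
base_x = np.cumsum(np.random.default_rng(0).normal(0.0, 1e-3, 200))
a = fit_ar(base_x, order=4)
setpoint_offset = -predict_ahead(base_x, a, horizon=5)
```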
  2. In nature, animals with soft body parts demonstrate remarkable control over their shape, such as an elephant trunk wrapping around a tree branch to pick it up. However, most research on robotic manipulators focuses on controlling the end-effector, partly because the manipulator's arm is rigidly articulated. With recent advances in soft robotics, controlling a soft manipulator into many different shapes will significantly improve a robot's functionality, for example allowing medical robots to morph their shape to navigate the digestive system and deliver drugs to specific locations. However, controlling the shape of soft robots is challenging due to their highly nonlinear dynamics, which are computationally expensive to model. In this paper, we leverage a physics-informed, data-driven approach using the Koopman operator to realize shape control of soft robots. We simulate the dynamics of a soft manipulator using a physics-based simulator (PyElastica) to generate input-output data, which is then used to identify an approximate linear model based on the Koopman operator. We then formulate the shape-control problem as a convex optimization problem that is computationally efficient. Our linear model is over 12 times faster than the physics-based model in simulating the manipulator's motion. Further, we can control the soft manipulator into different shapes using model predictive control. We envision that the proposed method can be effectively used to control the shapes of soft robots interacting with uncertain environments or enable shape-morphing robots to fulfill diverse tasks.
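The identification step can be sketched as an EDMD-style least-squares fit of a lifted linear model; the lifting functions and the synthetic data standing in for PyElastica snapshots are illustrative assumptions, not the observables used in the paper.

```python
import numpy as np

def lift(x):
    """Illustrative Koopman observables: the state, its elementwise squares,
    and a constant term."""
    return np.concatenate([x, x ** 2, [1.0]])

def fit_koopman(X, U, X_next):
    """EDMD-style least squares for z' ~ A z + B u with z = lift(x).

    X, X_next: (T, n) state snapshots from the simulator; U: (T, m) inputs."""
    Z = np.array([lift(x) for x in X])
    Zp = np.array([lift(x) for x in X_next])
    G, *_ = np.linalg.lstsq(np.hstack([Z, U]), Zp, rcond=None)
    nz = Z.shape[1]
    return G[:nz].T, G[nz:].T   # A (nz x nz), B (nz x m)

# Tiny smoke test with synthetic data standing in for simulator snapshots.
rng = np.random.default_rng(0)
X, U = rng.normal(size=(100, 3)), rng.normal(size=(100, 2))
X_next = 0.9 * X + 0.1 * np.tanh(X) + 0.05 * (U @ rng.normal(size=(2, 3)))
A, B = fit_koopman(X, U, X_next)
```

The returned (A, B) pair is what makes the downstream shape-control problem convex: a linear model can be dropped directly into a standard model predictive controller.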
  3. We present a closed-loop multi-arm motion planner that is scalable and flexible with team size. Traditional multi-arm robotic systems have relied on centralized motion planners whose run times often scale exponentially with team size and which therefore fail to handle dynamic environments under open-loop control. In this paper, we tackle this problem with multi-agent reinforcement learning, where a shared policy network is trained to control each individual robot arm to reach its target end-effector pose given observations of its workspace state and target end-effector pose. The policy is trained using Soft Actor-Critic with expert demonstrations from a sampling-based motion planning algorithm (i.e., BiRRT). By leveraging classical planning algorithms, we improve the learning efficiency of the reinforcement learning algorithm while retaining the fast inference time of neural networks. The resulting policy scales sub-linearly and can be deployed on multi-arm systems with variable team sizes. Thanks to the closed-loop and decentralized formulation, our approach generalizes to multi-arm systems with 5-10 arms and to dynamic moving targets (>90% success rate for a 10-arm system), despite being trained only on 1-4 arm planning tasks with static targets.
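The core architectural idea, a single shared policy applied independently to each arm's local observation, can be sketched as below; the plain-NumPy network and its sizes are illustrative stand-ins for the trained Soft Actor-Critic actor.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, HID = 32, 7, 64   # illustrative sizes; 7-DOF arms assumed

# One shared parameter set: every arm executes the *same* policy.
W1, b1 = 0.1 * rng.normal(size=(HID, OBS_DIM)), np.zeros(HID)
W2, b2 = 0.1 * rng.normal(size=(ACT_DIM, HID)), np.zeros(ACT_DIM)

def policy(obs):
    """Map one arm's local observation (workspace state + target end-effector
    pose) to a bounded joint action, standing in for the trained actor."""
    return np.tanh(W2 @ np.tanh(W1 @ obs + b1) + b2)

def step_team(observations):
    """Decentralized closed loop: the shared policy runs per arm, so the same
    weights serve any team size."""
    return [policy(o) for o in observations]

# Works unchanged for 4 arms or 10 arms:
actions = step_team([rng.normal(size=OBS_DIM) for _ in range(10)])
```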
  4. To broaden and promote the applications of unmanned aerial vehicles (UAVs), UAVs with agile and omnidirectional mobility, enabled by full actuation or over-actuation, are a growing area of research. However, balancing motion agility against force (energy) efficiency is challenging for a fixed UAV structure. This paper presents the design of a transformable UAV that can operate as a coplanar hexacopter or as an omnidirectional multirotor, depending on the operation mode. The UAV has 100% force efficiency for launching and landing tasks in the coplanar mode. In the omnidirectional mode, the UAV is fully actuated in the air, providing agile mobility in six degrees of freedom (DOFs). Models and a control design are developed to characterize the motion of the transformable UAV. Simulation results validate the transformable UAV design and the enhanced performance compared with a fixed structure.
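The difference between the two modes can be illustrated with the rotor-wrench allocation matrix: coplanar thrust axes leave the wrench map rank-deficient (underactuated), while alternately tilted axes make it full rank (fully actuated in six DOFs). The geometry, tilt angle, and drag coefficient below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def allocation_matrix(positions, axes, spin, k_drag=0.016):
    """6 x n wrench map from rotor thrusts to body force and torque.
    Column i is [u_i; r_i x u_i + spin_i * k_drag * u_i]."""
    cols = [np.concatenate([u, np.cross(r, u) + s * k_drag * u])
            for r, u, s in zip(positions, axes, spin)]
    return np.array(cols).T

ang = np.deg2rad(np.arange(0, 360, 60))              # hexarotor arm angles
pos = np.stack([np.cos(ang), np.sin(ang), np.zeros(6)], axis=1)
spin = np.array([1, -1, 1, -1, 1, -1])               # alternating spin

coplanar = [np.array([0.0, 0.0, 1.0])] * 6           # all thrust axes up
tilt = np.deg2rad(30)                                # illustrative tilt angle
tilted = [np.cos(tilt) * np.array([0.0, 0.0, 1.0])
          + s * np.sin(tilt) * np.array([-np.sin(a), np.cos(a), 0.0])
          for a, s in zip(ang, spin)]                # rotors tilted about arms

for name, axes in (("coplanar", coplanar), ("tilted", tilted)):
    rank = np.linalg.matrix_rank(allocation_matrix(pos, axes, spin))
    print(f"{name}: wrench rank {rank}")   # expected: coplanar 4, tilted 6
```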
  5. Unmanned aerial vehicle (UAV) technology is a rapidly growing field with tremendous opportunities for research and applications. To achieve true autonomy for UAVs in the absence of remote control and external navigation aids such as global navigation satellite systems and radar, minimum-energy trajectory planning that accounts for obstacle avoidance and stability control is key. Although this can be formulated as a constrained optimization problem, the complicated nonlinear relationships between UAV trajectory and thrust control make it almost impossible to solve analytically. While deep reinforcement learning (DRL) is known for its ability to provide model-free optimization of complex systems through learning, its state space, actions, and reward function must be designed carefully. This paper presents our vision of the different layers of autonomy in a UAV system and our effort to both generate and track the trajectory using DRL. The experimental results show that, compared to conventional approaches, the learned trajectory needs 20% less control thrust and 18% less time to reach the target. Furthermore, using the control policy learned by DRL, the UAV achieves 58.14% less position error and 21.77% less system power.
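One of the design choices the abstract highlights, the reward function, might be sketched as follows for a minimum-energy, obstacle-aware formulation; all terms and weights are illustrative assumptions rather than the paper's actual design.

```python
import numpy as np

def reward(pos, target, thrust, obstacles,
           w_dist=1.0, w_thrust=0.05, w_obs=10.0, safe_radius=1.0):
    """Illustrative shaping: progress toward the target, a thrust (energy)
    penalty, and a graded penalty inside each obstacle's safety radius."""
    r = -w_dist * np.linalg.norm(target - pos)        # reach the goal
    r -= w_thrust * float(np.sum(np.square(thrust)))  # minimize control effort
    for obs in obstacles:
        gap = np.linalg.norm(obs - pos)
        if gap < safe_radius:
            r -= w_obs * (safe_radius - gap)          # avoid obstacles
    return r
```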