Unmanned aerial vehicle (UAV) technology is a rapidly growing field with tremendous opportunities for research and applications. To achieve true autonomy for UAVs in the absence of remote control and external navigation aids such as global navigation satellite systems and radar, minimum-energy trajectory planning that accounts for obstacle avoidance and stability control is key. Although this can be formulated as a constrained optimization problem, the complicated nonlinear relationships between UAV trajectory and thrust control make it nearly impossible to solve analytically. While deep reinforcement learning (DRL) is known for its ability to provide model-free optimization for complex systems through learning, its state space, actions, and reward functions must be designed carefully. This paper presents our vision of the different layers of autonomy in a UAV system and our effort to both generate and track the trajectory using DRL. The experimental results show that, compared to conventional approaches, the learned trajectory needs 20% less control thrust and 18% less time to reach the target. Furthermore, using the control policy learned by DRL, the UAV achieves 58.14% less position error and 21.77% less system power.
Fast and Accurate Trajectory Tracking for Unmanned Aerial Vehicles based on Deep Reinforcement Learning
Continuous trajectory control of fixed-wing unmanned aerial vehicles (UAVs) is complicated when hidden dynamics are considered. Because of a UAV's multiple degrees of freedom, tracking methodologies based on conventional control theory, such as Proportional-Integral-Derivative (PID) control, have limitations in response time and adjustment robustness, while a model-based approach that calculates the forces and torques from the UAV's current status is complicated and rigid. We present an actor-critic reinforcement learning framework that controls the UAV trajectory through a set of desired waypoints. A deep neural network is constructed to learn the optimal tracking policy, and reinforcement learning is developed to optimize the resulting tracking scheme. The experimental results show that our proposed approach can achieve 58.14% less position error, 21.77% less system power consumption, and 9.23% faster attainment than the baseline. The actor network consists of only linear operations, so Field Programmable Gate Array (FPGA)-based hardware acceleration can easily be designed for energy-efficient real-time control.
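The abstract's point about the actor network using only linear operations is what makes FPGA acceleration easy: inference is a single matrix-vector product. A minimal sketch of such a policy is below; the gains, the state layout, and the function name are illustrative assumptions, not the paper's trained weights.

```python
# Sketch of a linear actor: control = W @ state + b, in pure Python.
# W, b, and the [position error, velocity] state layout are assumed for
# illustration; only multiply-accumulate operations are needed, which is
# why an FPGA implementation is straightforward.
def linear_actor(state, W, b):
    return [sum(w * s for w, s in zip(row, state)) + bi
            for row, bi in zip(W, b)]

# PD-like gains (assumed values, not the learned policy)
W = [[-1.2, -0.8]]
b = [0.0]
thrust = linear_actor([0.5, 0.1], W, b)  # 0.5 m position error, 0.1 m/s velocity
```

With these example gains the commanded thrust is -1.2·0.5 - 0.8·0.1 = -0.68, i.e. a corrective force opposing the tracking error.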
- Award ID(s):
- 1739748
- NSF-PAR ID:
- 10195584
- Date Published:
- Journal Name:
- IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)
- Page Range / eLocation ID:
- 1 to 9
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
A framework for autonomous waypoint planning, trajectory generation through waypoints, and trajectory tracking for multi-rotor unmanned aerial vehicles (UAVs) is proposed in this work. Safe and effective operation of these UAVs is a problem that demands obstacle avoidance strategies and advanced trajectory planning and control schemes for stability and energy efficiency. To address this problem, a two-level optimization strategy is used for trajectory generation, and the trajectory is then tracked in a stable manner. The framework given here consists of the following components: (a) a deep reinforcement learning (DRL)-based algorithm for optimal waypoint planning while minimizing control energy and avoiding obstacles in a given environment; (b) an optimal, smooth trajectory generation algorithm through waypoints that minimizes a combination of velocity, acceleration, jerk, and snap; and (c) a stable tracking control law that determines a control thrust force for a UAV to track the generated trajectory.
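Component (b) minimizes a weighted combination of the trajectory's velocity, acceleration, jerk, and snap. A toy sketch of such a cost for a single polynomial segment is below; the midpoint-rule numerical integration, the weights, and the function names are illustrative, not the paper's optimizer.

```python
# Sketch: weighted derivative cost for one polynomial trajectory segment.
# Coefficients are [c0, c1, c2, ...] for x(t) = c0 + c1*t + c2*t^2 + ...
def polyder(coeffs):
    # derivative of the polynomial: drops c0, multiplies by the power
    return [i * c for i, c in enumerate(coeffs)][1:]

def eval_poly(coeffs, t):
    return sum(c * t**i for i, c in enumerate(coeffs))

def weighted_deriv_cost(coeffs, weights, T=1.0, n=1000):
    # approximates integral_0^T sum_k w_k * (d^k x/dt^k)^2 dt for k = 1..4
    # (velocity, acceleration, jerk, snap) via the midpoint rule
    derivs, c = [], coeffs
    for _ in range(4):
        c = polyder(c)
        derivs.append(c)
    dt = T / n
    cost = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        cost += sum(w * eval_poly(d, t)**2 for w, d in zip(weights, derivs)) * dt
    return cost

# Example: x(t) = t^2 on [0, 1] with unit weights; velocity term contributes
# integral of (2t)^2 = 4/3, acceleration term contributes 4, jerk/snap are 0.
cost = weighted_deriv_cost([0.0, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0])
```

A minimum-snap planner would minimize this kind of cost over the polynomial coefficients subject to waypoint constraints; the sketch only shows how the cost itself is assembled.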
-
This paper considers the optimal control of a second-order nonlinear system with unknown dynamics. A new reinforcement learning-based approach is proposed with the aid of direct adaptive control. In the new approach, actor-critic reinforcement learning algorithms are proposed with three neural network approximators. Simulation results are presented to show the effectiveness of the proposed algorithms.
-
Abstract This paper is concerned with solving, from the learning-based decomposition control viewpoint, the problem of output tracking with nonperiodic tracking–transition switching. Such a nontraditional tracking problem occurs in applications where sessions for tracking a given desired trajectory are alternated with those for transiting the output with given boundary conditions. It is challenging to achieve precision tracking while maintaining smooth tracking–transition switching, as postswitching oscillations can be induced by the mismatch of the boundary states at the switching instants, and the tracking performance can be limited by the nonminimum-phase (NMP) zeros of the system and affected by factors such as input constraints and external disturbances. Although an approach combining system inversion with optimization techniques has recently been proposed to tackle these challenges, it requires modeling of the system dynamics and complicated online computation, and the resulting controller can be sensitive to model uncertainties. In this work, a learning-based decomposition control technique is developed to overcome these limitations. A dictionary of input–output bases is first constructed offline via data-driven iterative learning. The input–output bases are used online to decompose the desired output in the tracking sessions and to design an optimal desired transition trajectory with minimal transition time under an input-amplitude constraint. Finally, the control input is synthesized based on the superposition principle and further optimized online to account for system variations and external disturbance. The proposed approach is illustrated through a nanopositioning control experiment on a piezoelectric actuator.
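For a linear system, the decompose-then-superpose step this abstract describes can be sketched as follows: project the desired output onto a learned dictionary of input-output basis pairs, then build the input from the same coefficients. The two-basis setup, the signals, and the function names are illustrative placeholders, not the paper's learned dictionary.

```python
# Sketch of decomposition control by superposition (linear system assumed).
# Basis pairs (u_i, y_i) stand in for the dictionary learned offline via
# iterative learning; here there are just two bases, solved in closed form.
def decompose(y_desired, y_bases):
    # least-squares coefficients a with sum_i a_i * y_i ~= y_desired,
    # via the 2x2 normal equations (Gram matrix g, right-hand side r)
    g = [[sum(a * b for a, b in zip(yi, yj)) for yj in y_bases] for yi in y_bases]
    r = [sum(a * b for a, b in zip(yi, y_desired)) for yi in y_bases]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [( g[1][1] * r[0] - g[0][1] * r[1]) / det,
            (-g[1][0] * r[0] + g[0][0] * r[1]) / det]

def synthesize(coeffs, u_bases):
    # superposition: u = sum_i a_i * u_i, sample by sample
    return [sum(a * u[k] for a, u in zip(coeffs, u_bases))
            for k in range(len(u_bases[0]))]

# Toy signals: orthogonal output bases make the coefficients easy to read off
y_bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
u_bases = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
coeffs = decompose([2.0, 3.0, 0.0], y_bases)   # -> [2.0, 3.0]
u = synthesize(coeffs, u_bases)                # -> [2.0, 5.0, 8.0]
```

The same coefficients that reconstruct the desired output weight the corresponding inputs, which is exactly the superposition argument the abstract invokes; it holds only as long as the plant is linear over the operating range.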
-
This paper considers the trajectory design problem for unmanned aerial vehicles (UAVs) via meta-reinforcement learning. It is assumed that the UAV can move in different directions to explore a specific area and collect data from the ground nodes (GNs) located in the area. The goal of the UAV is to reach the destination and maximize the total data collected during the flight along the trajectory while avoiding collisions with other UAVs. In the literature on UAV trajectory design, vanilla learning algorithms are typically used to train a task-specific model and provide near-optimal solutions for a specific spatial distribution of the GNs. However, this approach requires retraining from scratch when the locations of the GNs vary. In this work, we propose a meta-reinforcement learning framework that incorporates the method of Model-Agnostic Meta-Learning (MAML). Instead of training task-specific models, we train a common initialization for different distributions of GNs and different channel conditions. From the initialization, only a few gradient descent steps are required to adapt to different tasks with different GN distributions and channel conditions. Additionally, we explore when the proposed MAML framework is preferred and can outperform the compared algorithms.
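The MAML idea the abstract relies on (train one initialization, adapt per task with a few gradient steps) can be shown on a toy problem. The scalar quadratic tasks below are stand-ins for different GN distributions, and all names and learning rates are illustrative; a real UAV policy would replace the scalar with network weights.

```python
# Toy MAML sketch: each task i minimizes (theta - c_i)^2, where c_i stands
# in for a task-specific optimum (e.g. one GN distribution). MAML learns a
# shared initialization theta from which one inner gradient step adapts well
# to every task. Learning rates and targets are assumed for illustration.
def grad(theta, c):
    return 2.0 * (theta - c)

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05):
    outer_grad = 0.0
    for c in tasks:
        adapted = theta - inner_lr * grad(theta, c)   # inner adaptation step
        # outer gradient differentiates THROUGH the inner step (chain rule):
        # d/dtheta (adapted - c)^2 with d(adapted)/dtheta = 1 - 2*inner_lr
        outer_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)
    return theta - outer_lr * outer_grad / len(tasks)

theta = 0.0
tasks = [-1.0, 3.0]          # two task optima; their mean is 1.0
for _ in range(200):
    theta = maml_step(theta, tasks)
# for symmetric quadratic tasks the learned initialization converges to the
# mean of the task optima, equidistant from both for fast adaptation
```

The distinguishing MAML feature is the second factor in the outer gradient, which accounts for how the initialization affects the post-adaptation loss rather than the pre-adaptation loss.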