Title: Fast and Accurate Trajectory Tracking for Unmanned Aerial Vehicles based on Deep Reinforcement Learning
Continuous trajectory control of fixed-wing unmanned aerial vehicles (UAVs) is complicated when hidden dynamics are considered. Because a UAV has multiple degrees of freedom, tracking methodologies based on conventional control theory, such as Proportional-Integral-Derivative (PID) control, have limitations in response time and adjustment robustness, while a model-based approach that calculates the forces and torques from the UAV's current status is complicated and rigid. We present an actor-critic reinforcement learning framework that controls the UAV trajectory through a set of desired waypoints. A deep neural network is constructed to learn the optimal tracking policy, and reinforcement learning is developed to optimize the resulting tracking scheme. The experimental results show that our proposed approach achieves 58.14% less position error, 21.77% less system power consumption and 9.23% faster attainment than the baseline. The actor network consists of only linear operations, so Field Programmable Gate Array (FPGA) based hardware acceleration can easily be designed for energy-efficient real-time control.
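The abstract does not give the network shapes, the reward, or the update rule, so the following is only a minimal sketch of the two ideas it names: an actor built from linear operations, and a critic signal that drives learning. The function names, the negative-distance reward, and the discount factor `gamma` are all assumptions made for illustration; none of them come from the paper.

```python
from typing import List, Sequence


def linear_actor(state: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Map a state vector to an action vector using only multiply-adds.

    Each output is a dot product of one weight row with the state plus a
    bias term, which is the kind of operation that maps well to an FPGA.
    """
    return [sum(w * s for w, s in zip(row, state)) + b
            for row, b in zip(weights, bias)]


def tracking_reward(position: Sequence[float],
                    waypoint: Sequence[float]) -> float:
    """Hypothetical reward: negative Euclidean distance to the waypoint."""
    return -sum((p - q) ** 2 for p, q in zip(position, waypoint)) ** 0.5


def td_error(reward: float, value: float, next_value: float,
             gamma: float = 0.99) -> float:
    """One-step temporal-difference error computed from the critic's values.

    In an actor-critic scheme this scalar is used both to update the
    critic and to scale the actor's policy update.
    """
    return reward + gamma * next_value - value
```

In this sketch, a controller step would call `linear_actor` on the current tracking state, apply the resulting action, and then pass `tracking_reward` and the critic's value estimates to `td_error` to obtain the learning signal.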
Award ID(s):
1739748
NSF-PAR ID:
10195584
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)
Page Range / eLocation ID:
1 to 9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Unmanned aerial vehicle (UAV) technology is a rapidly growing field with tremendous opportunities for research and applications. To achieve true autonomy for UAVs in the absence of remote control and of external navigation aids such as global navigation satellite systems and radar systems, minimum-energy trajectory planning that considers obstacle avoidance and stability control will be the key. Although this can be formulated as a constrained optimization problem, the complicated non-linear relationships between UAV trajectory and thrust control make it almost impossible to solve analytically. While deep reinforcement learning is known for its ability to provide model-free optimization for complex systems through learning, its state space, actions and reward functions must be designed carefully. This paper presents our vision of different layers of autonomy in a UAV system, and our effort in both generating and tracking the trajectory using deep reinforcement learning (DRL). The experimental results show that, compared to conventional approaches, the learned trajectory needs 20% less control thrust and 18% less time to reach the target. Furthermore, using the control policy learned by DRL, the UAV achieves 58.14% less position error and 21.77% less system power.
  2. Healthy human locomotion functions with good gait symmetry depend on rhythmic coordination of the left and right legs, which can be deteriorated by neurological disorders like stroke and spinal cord injury. Powered exoskeletons are promising devices to improve impaired people's locomotion functions, like gait symmetry. However, given higher uncertainties and the time-varying nature of human-robot interaction, providing personalized robotic assistance from exoskeletons to achieve the best gait symmetry is challenging, especially for people with neurological disorders. In this paper, we propose a hierarchical control framework for a bilateral hip exoskeleton to provide the adaptive optimal hip joint assistance with a control objective of imposing the desired gait symmetry during walking. Three control levels are included in the hierarchical framework, including the high-level control to tune three control parameters based on a policy iteration reinforcement learning approach, the middle-level control to define the desired assistive torque profile based on a delayed output feedback control method, and the low-level control to achieve a good torque trajectory tracking performance. To evaluate the feasibility of the proposed control framework, five healthy young participants are recruited for treadmill walking experiments, where an artificial gait asymmetry is imitated as the hemiparesis post-stroke, and only the ‘paretic’ hip joint is controlled with the proposed framework. The pilot experimental studies demonstrate that the hierarchical control framework for the hip exoskeleton successfully (asymmetry index from 8.8% to −0.5%) and efficiently (less than 4 minutes) achieved the desired gait symmetry by providing adaptive optimal assistance on the ‘paretic’ hip joint.
  3. This paper is concerned with solving, from the learning-based decomposition control viewpoint, the problem of output tracking with nonperiodic tracking–transition switching. Such a nontraditional tracking problem occurs in applications where sessions for tracking a given desired trajectory are alternated with those for transiting the output with given boundary conditions. It is challenging to achieve precision tracking while maintaining smooth tracking–transition switching, as postswitching oscillations can be induced by the mismatch of the boundary states at the switching instants, and the tracking performance can be limited by the nonminimum-phase (NMP) zeros of the system and affected by factors such as input constraints and external disturbances. Although an approach combining system inversion with optimization techniques has recently been proposed to tackle these challenges, it requires modeling of the system dynamics and complicated online computation, and the controller obtained can be sensitive to model uncertainties. In this work, a learning-based decomposition control technique is developed to overcome these limitations. A dictionary of input–output bases is first constructed offline a priori via data-driven iterative learning. The input–output bases are then used online to decompose the desired output in the tracking sessions and to design an optimal desired transition trajectory with minimal transition time under an input-amplitude constraint. Finally, the control input is synthesized based on the superposition principle and further optimized online to account for system variations and external disturbances. The proposed approach is illustrated through a nanopositioning control experiment on a piezoelectric actuator.
  4. A framework for autonomous waypoint planning, trajectory generation through waypoints, and trajectory tracking for multi-rotor unmanned aerial vehicles (UAVs) is proposed in this work. Safe and effective operation of these UAVs is a problem that demands obstacle avoidance strategies and advanced trajectory planning and control schemes for stability and energy efficiency. To address this problem, a two-level optimization strategy is used for trajectory generation, then the trajectory is tracked in a stable manner. The framework given here consists of the following components: (a) a deep reinforcement learning (DRL)-based algorithm for optimal waypoint planning while minimizing control energy and avoiding obstacles in a given environment; (b) an optimal, smooth trajectory generation algorithm through waypoints that minimizes a combination of velocity, acceleration, jerk and snap; and (c) a stable tracking control law that determines a control thrust force for a UAV to track the generated trajectory.
  5. To mitigate the long-term spectrum crunch problem, the FCC recently opened up the 6 GHz frequency band for unlicensed use. However, the existing spectrum sharing strategies cannot support the operation of access points in moving vehicles such as cars and UAVs. This is primarily because of the directionality-based spectrum sharing among the incumbent systems in this band and the high mobility of the moving vehicles, which together make it challenging to control the cross-system interference. In this paper we propose SwarmShare, a mobility-resilient spectrum sharing framework for swarm UAV networking in the 6 GHz band. We first present a mathematical formulation of the SwarmShare problem, where the objective is to maximize the spectral efficiency of the UAV network by jointly controlling the flight and transmission power of the UAVs and their association with the ground users, under the interference constraints of the incumbent system. We find that there are no closed-form mathematical models that can be used to characterize the statistical behaviors of the aggregate interference from the UAVs to the incumbent system. We then propose a data-driven three-phase spectrum sharing approach, including Initial Power Enforcement, Offline-dataset Guided Online Power Adaptation, and Reinforcement Learning-based UAV Optimization. We validate the effectiveness of SwarmShare through an extensive simulation campaign. Results indicate that, based on SwarmShare, the aggregate interference from the UAVs to the incumbent system can be effectively controlled below the target level without requiring real-time cross-system channel state information. The mobility resilience of SwarmShare is also validated in coexisting networks with no precise UAV location information.
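Item 4 in the list above mentions generating a smooth trajectory through waypoints. As a rough illustration of what "smooth" can mean for a single segment between two waypoints, the sketch below uses the classic rest-to-rest minimum-jerk quintic. It is a deliberately simpler stand-in, not the cited work's optimizer, which minimizes a combination of velocity, acceleration, jerk and snap; the function name and parameters are hypothetical.

```python
def min_jerk_position(start: float, goal: float,
                      t: float, duration: float) -> float:
    """Position along one axis at time t on a minimum-jerk segment.

    The segment starts and ends at rest (zero velocity and acceleration).
    Times outside [0, duration] are clamped to the segment endpoints.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    blend = 10 * tau**3 - 15 * tau**4 + 6 * tau**5  # quintic blend, 0 to 1
    return start + (goal - start) * blend
```

Evaluating the function independently for each coordinate gives a waypoint-to-waypoint path; chaining segments this way forces a full stop at every waypoint, which is one reason an optimization over the whole path is preferable in practice.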