Title: Intelligent Distributed Swarm Control for Large-Scale Multi-UAV Systems: A Hierarchical Learning Approach
In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Different from classical multi-agent systems, an LS-MAS brings new challenges to control design due to its large number of agents, which makes it more difficult to develop appropriate controls for complicated missions such as collective swarming. To address these challenges, a novel mixed game theory is developed with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. A cooperative game is then used to formulate the distributed swarm control for the multi-group leaders, and a Stackelberg game is utilized to couple the leaders and their large-scale followers effectively. Building on the interaction between leaders and followers, a mean field game propagates the collective swarm behavior from the leaders to the followers smoothly, without increasing the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the intelligent optimal distributed swarm control for the multi-group leader–follower system. Specifically, a multi-agent actor–critic algorithm is first developed to obtain the distributed optimal swarm control for the multi-group leaders. Then, an actor–critic–mass method is designed to find the decentralized swarm control for the large-scale followers. Finally, a series of numerical simulations and a Lyapunov stability proof of the closed-loop system demonstrate the performance of the developed scheme.
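The follower side of the hierarchy pairs each agent with three learned components: an actor (policy), a critic (cost-to-go), and a mass network estimating the population distribution. The sketch below illustrates that actor–critic–mass structure for a single follower tracking its group leader's reference; it is a minimal sketch, not the authors' implementation, and the network sizes, the toy single-integrator dynamics, the quadratic tracking cost, and the encoding of the mass estimate are all assumptions for illustration.

```python
# Minimal sketch of a follower-side actor-critic-mass learner (illustrative
# assumptions throughout: network sizes, single-integrator dynamics,
# quadratic tracking cost, and the mass encoding are not from the paper).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, MASS_DIM = 4, 2, 8

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(),
                                 nn.Linear(64, out_dim))
    def forward(self, x):
        return self.net(x)

actor  = MLP(STATE_DIM + MASS_DIM, ACTION_DIM)   # decentralized policy
critic = MLP(STATE_DIM + MASS_DIM, 1)            # cost-to-go estimate
mass   = MLP(STATE_DIM, MASS_DIM)                # local mean-field estimate

actor_opt  = torch.optim.Adam(list(actor.parameters()) +
                              list(mass.parameters()), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def follower_step(state, leader_ref, gamma=0.99):
    """One learning step: the follower tracks its group leader's reference
    while conditioning on its learned mass (population) estimate."""
    # critic update: fit the bootstrapped cost-to-go on the current transition
    with torch.no_grad():
        m = mass(state)
        a = actor(torch.cat([state, m], -1))
        nxt = state + 0.1 * a                    # toy single-integrator step
        cost = ((nxt - leader_ref) ** 2).sum(-1, keepdim=True) \
             + 0.1 * (a ** 2).sum(-1, keepdim=True)
        target = cost + gamma * critic(torch.cat([nxt, mass(nxt)], -1))
    v = critic(torch.cat([state, m], -1))
    critic_loss = ((v - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor/mass update: lower the predicted cost-to-go through a frozen critic
    m = mass(state)
    a = actor(torch.cat([state, m], -1))
    nxt = state + 0.1 * a
    cost = ((nxt - leader_ref) ** 2).sum(-1, keepdim=True) \
         + 0.1 * (a ** 2).sum(-1, keepdim=True)
    for p in critic.parameters():
        p.requires_grad_(False)
    actor_loss = (cost + gamma * critic(torch.cat([nxt, mass(nxt)], -1))).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    for p in critic.parameters():
        p.requires_grad_(True)
    return critic_loss.item(), actor_loss.item()

# toy usage: a batch of followers tracking a common leader reference
states = torch.randn(32, STATE_DIM)
reference = torch.zeros(32, STATE_DIM)
follower_step(states, reference)
```

Because each follower conditions only on its own state and the learned mass estimate, the control stays decentralized: no follower needs the states of the other large-scale followers.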
Award ID(s):
2144646
PAR ID:
10402345
Author(s) / Creator(s):
Date Published:
Journal Name:
Electronics
Volume:
12
Issue:
1
ISSN:
2079-9292
Page Range / eLocation ID:
89
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The multiple user terminals in a satellite transponder's communication channel compete for limited radio resources to meet their own data-rate needs. Because inter-user interference limits the transponder's performance, the transponder's power-control system needs to coordinate all of its users to reduce interference and maximize the overall performance of the channel. This paper studies Stackelberg competition among the asymmetrical users in a transponder's channel, where some users, called leaders, have priority in choosing their power-control strategies, while the other users, called followers, must optimize their power-control strategies given the leaders' controls. A Stackelberg differential game (SDG) is set up to model the Stackelberg competition in a transponder's communication channel. Each user's utility function is a trade-off between transmission data rate and power consumption. The system dynamics are the variations of the channel gains. The optimality condition for the Stackelberg equilibrium of leaders and followers is a set of differential algebraic equations (DAEs), each with the counterpart's control strategies embedded. To solve for the Stackelberg equilibrium, an algorithm that iteratively optimizes the leaders' and followers' Hamiltonians is developed. The numerical solution of the SDG model provides the transponder's power-control system with each user's power-control strategy at the Stackelberg equilibrium.
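The bilevel structure of that equilibrium can be illustrated with a static, discretized analogue: the follower's best response is computed for every candidate leader power, and the leader optimizes with that response embedded. This is only a sketch of the Stackelberg structure, not the paper's Hamiltonian iteration over the differential game; the channel gains, noise level, and power price below are assumptions.

```python
# Toy, static analogue of Stackelberg power control: one leader, one follower,
# log-rate utility minus a linear power cost, brute-force search on a grid.
import numpy as np

g_l, g_f, noise, price = 1.0, 0.8, 0.1, 0.5   # assumed gains, noise, power price
powers = np.linspace(0.0, 2.0, 201)           # feasible transmit powers

def utility(p_own, p_other, g_own, g_other):
    sinr = g_own * p_own / (noise + g_other * p_other)
    return np.log1p(sinr) - price * p_own     # rate/power trade-off

def follower_br(p_leader):
    """Follower's best response to an announced leader power level."""
    return powers[np.argmax(utility(powers, p_leader, g_f, g_l))]

# Leader optimizes while anticipating the follower's best response.
leader_vals = [utility(p, follower_br(p), g_l, g_f) for p in powers]
p_leader = powers[int(np.argmax(leader_vals))]
p_follower = follower_br(p_leader)
print(f"toy Stackelberg equilibrium: leader={p_leader:.2f}, "
      f"follower={p_follower:.2f}")
```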
  2. In this work, we propose a two-stage multi-agent deep deterministic policy gradient (TS-MADDPG) algorithm for communication-free, multi-agent reinforcement learning (MARL) under partial states and observations. In the first stage, we train prototype actor-critic networks using only partial states at the actors. In the second stage, we incorporate the partial observations resulting from the prototype actions as side information at the actors to enhance actor-critic training. This side information is useful for inferring the unobserved states and hence can help reduce the performance gap between a network with fully observable states and a partially observable one. Using a case study of building energy control in the power distribution network, we demonstrate that the proposed TS-MADDPG can greatly improve the performance of single-stage MADDPG algorithms that use partial states only. This is the first work that utilizes partial local voltage measurements as observations to improve MARL performance for a distributed power network.
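In essence, the second stage reuses the stage-one policy as a probe: its prototype action elicits a partial observation, which is then appended to the actor's input. Below is a minimal sketch of that input-augmentation step only; the dimensions, networks, and stand-in observation model are assumptions, and the full MADDPG machinery (critics, replay buffers, target networks) is omitted.

```python
# Sketch of TS-MADDPG's two-stage input augmentation (illustrative only).
import torch
import torch.nn as nn

PARTIAL_STATE, OBS_DIM, ACT_DIM = 6, 3, 2    # assumed dimensions

prototype_actor = nn.Sequential(nn.Linear(PARTIAL_STATE, 64), nn.ReLU(),
                                nn.Linear(64, ACT_DIM), nn.Tanh())   # stage 1
augmented_actor = nn.Sequential(nn.Linear(PARTIAL_STATE + OBS_DIM, 64), nn.ReLU(),
                                nn.Linear(64, ACT_DIM), nn.Tanh())   # stage 2

def stage2_action(partial_state, observe):
    """Stage 2: probe the environment with the frozen stage-1 prototype action,
    then feed the resulting partial observation back as side information."""
    with torch.no_grad():
        proto_a = prototype_actor(partial_state)
    side_info = observe(proto_a)              # e.g., local voltage readings
    return augmented_actor(torch.cat([partial_state, side_info], dim=-1))

# toy usage: the lambda stands in for the real environment's observation model
s = torch.randn(1, PARTIAL_STATE)
a = stage2_action(s, lambda act: torch.tanh(act.repeat(1, 2))[:, :OBS_DIM])
```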
  3. This paper studies containment control with communication delays and switching topologies. Firstly, a containment control algorithm for first-order discrete-time followers is proposed. Then, it is extended to handle double-integrator dynamics. The main approach is to use the convexity of the convex hull spanned by multiple stationary leaders to verify the nonincreasing monotonicity of the largest distance from the followers to the convex hull. It is shown that both algorithms are robust to arbitrarily bounded communication delays as long as each follower jointly has a path from some leaders to itself. Finally, a numerical example is implemented to illustrate the effectiveness of the theoretical results.
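The monotonicity argument is easy to see in a delay-free toy instance: when each follower steps toward a convex combination of its neighbors while the leaders stay put, the followers' largest distance to the leaders' convex hull cannot grow. A minimal sketch under those simplifying assumptions (all-to-all neighbor graph, no delays or switching, first-order followers):

```python
# Toy first-order discrete-time containment control: followers step toward
# the average of all agents; stationary leaders span the target convex hull.
# The paper's bounded delays and switching topologies are omitted here.
import numpy as np

leaders = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])     # stationary hull
followers = np.random.default_rng(0).uniform(-5, 8, (5, 2))  # arbitrary starts
alpha = 0.3                                                  # step size in (0, 1)

for _ in range(200):
    # assumed all-to-all topology: every follower averages over all agents
    neighbors = np.vstack([leaders, followers])
    followers = (1 - alpha) * followers + alpha * neighbors.mean(axis=0)

print(followers)   # followers end up inside the leaders' convex hull
```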
  4. We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game: a policy actor competes against an adversarially trained value critic, which finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by the data under appropriately chosen hyperparameters. Compared with existing works, our framework notably offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. On the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
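Conceptually, the two-player game pairs a critic searching for data-consistent value functions unfavorable to the actor with an actor maximizing against that critic. The sketch below shows a single-critic, deterministic-actor simplification of that objective; the dimensions, the Bellman-consistency weight beta, and the learning rates are assumptions, and the paper's full recipe (double critics, stochastic policies, target smoothing) is omitted.

```python
# Simplified sketch of ATAC's relative-pessimism objective (not the authors'
# code): the critic maximizes the gap f(s, a_data) - f(s, pi(s)) subject to
# Bellman consistency on the dataset; the actor maximizes f(s, pi(s)).
import torch
import torch.nn as nn

S, A = 8, 2                                  # assumed state/action dimensions
critic = nn.Sequential(nn.Linear(S + A, 64), nn.ReLU(), nn.Linear(64, 1))
actor  = nn.Sequential(nn.Linear(S, 64), nn.ReLU(), nn.Linear(64, A), nn.Tanh())
c_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)
a_opt = torch.optim.Adam(actor.parameters(), lr=3e-5)

def atac_step(s, a_data, r, s_next, beta=1.0, gamma=0.99):
    # critic: widen the actor-vs-behavior gap while fitting the Bellman target
    q_pi = critic(torch.cat([s, actor(s).detach()], -1))
    q_beh = critic(torch.cat([s, a_data], -1))
    with torch.no_grad():
        target = r + gamma * critic(torch.cat([s_next, actor(s_next)], -1))
    bellman = ((q_beh - target) ** 2).mean()
    c_loss = (q_pi - q_beh).mean() + beta * bellman
    c_opt.zero_grad(); c_loss.backward(); c_opt.step()
    # actor: maximize the critic's value of its own actions
    a_loss = -critic(torch.cat([s, actor(s)], -1)).mean()
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()

# toy batch standing in for an offline dataset
s, a_d = torch.randn(64, S), torch.rand(64, A) * 2 - 1
r, s_n = torch.randn(64, 1), torch.randn(64, S)
atac_step(s, a_d, r, s_n)
```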
  5. Emerging on-demand service platforms (OSPs) have recently embraced teamwork as a strategy for stimulating workers' productivity and mediating temporal supply and demand imbalances. This research investigates the team contest scheme design problem considering work schedules. Introducing teams on OSPs creates a hierarchical single-leader multi-follower game. The leader (platform) establishes rewards and intrateam revenue-sharing rules for distributing workers' payoffs. Each follower (team) competes with the others by coordinating the schedules of its team members to maximize the team's total expected utility. The concurrence of interteam competition and intrateam coordination causes dual effects, which are captured by an equilibrium analysis of the followers' game. To align the platform's interest with workers' heterogeneous working-time preferences, we propose a profit-maximizing contest scheme consisting of a winner's reward and time-varying payments. A novel algorithm that combines Bayesian optimization, duality, and a penalty method solves for the optimal scheme in the nonconvex equilibrium-constrained problem. Our results indicate that teamwork is a useful strategy, but one with limitations. Under the proposed scheme, the team contest always benefits workers. Intrateam coordination helps teams strategically mitigate the negative externalities caused by overcompetition among workers. For the platform, the optimal scheme can direct teams' schedules toward more profitable market equilibria when workers have inaccurate perceptions of the market. History: This paper has been accepted for the Service Science Special Issue on Innovation in Transportation-Enabled Urban Services. Funding: This work was supported by the National Science Foundation [Grant FW-HTF-P 2222806]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/serv.2023.0320.
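The single-leader multi-follower structure can be made concrete in a toy two-team contest: the platform announces a reward, the teams settle into an effort equilibrium by best-response iteration, and the platform searches for the profit-maximizing reward. All functional forms below are illustrative assumptions, and a simple grid search stands in for the paper's Bayesian-optimization, duality, and penalty machinery.

```python
# Toy single-leader, two-team contest: Tullock-style win probability,
# quadratic effort cost, revenue proportional to total effort (all assumed).
import numpy as np

efforts = np.linspace(0.01, 5.0, 300)

def team_utility(e, e_rival, reward):
    win_prob = e / (e + e_rival)                 # contest success function
    return win_prob * reward - 0.5 * e ** 2      # reward vs. effort cost

def equilibrium(reward, iters=100):
    """Followers' game: best-response iteration to an effort equilibrium."""
    e1 = e2 = 1.0
    for _ in range(iters):
        e1 = efforts[np.argmax(team_utility(efforts, e2, reward))]
        e2 = efforts[np.argmax(team_utility(efforts, e1, reward))]
    return e1, e2

def platform_profit(reward):
    e1, e2 = equilibrium(reward)
    return 2.0 * (e1 + e2) - reward              # assumed revenue model

# leader's problem: pick the reward that maximizes profit at equilibrium
rewards = np.linspace(0.1, 10.0, 100)
best = rewards[int(np.argmax([platform_profit(r) for r in rewards]))]
print(f"profit-maximizing reward (toy): {best:.2f}")
```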