Title: Intelligent Distributed Swarm Control for Large-Scale Multi-UAV Systems: A Hierarchical Learning Approach
In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Unlike classical multi-agent systems, an LS-MAS poses new challenges for control design because of its large number of agents, making it more difficult to develop appropriate controls for complicated missions such as collective swarming. To address these challenges, a novel mixed game framework is developed together with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. A cooperative game is then used to formulate the distributed swarm control for the multi-group leaders, and a Stackelberg game is used to couple the leaders with their large-scale followers. Building on this leader–follower interaction, a mean field game propagates the collective swarm behavior from leaders to followers smoothly without increasing the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the optimal distributed swarm control for the multi-group leader–follower system. Specifically, a multi-agent actor–critic algorithm is first developed to obtain the distributed optimal swarm control for the multi-group leaders, and an actor–critic–mass method is then designed to find the decentralized swarm control for the large-scale followers. Finally, a series of numerical simulations and a Lyapunov stability proof of the closed-loop system demonstrate the performance of the developed scheme.
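As a rough illustration of the follower-level structure the abstract describes, the sketch below sets up the three approximators of an actor–critic–mass scheme (an actor for the control, a critic for the value, and a mass estimator for the population distribution) and performs one update of each. This is a minimal sketch under stated assumptions, not the paper's implementation: the network sizes, the DDPG-style actor loss, the toy dimensions, and all variable names are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, mass_dim = 4, 2, 8   # assumed dimensions, illustration only

actor = nn.Sequential(nn.Linear(state_dim + mass_dim, 64), nn.Tanh(),
                      nn.Linear(64, action_dim))
critic = nn.Sequential(nn.Linear(state_dim + mass_dim + action_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))
mass_net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                         nn.Linear(64, mass_dim), nn.Softmax(dim=-1))

actor_opt, critic_opt, mass_opt = (torch.optim.Adam(m.parameters(), lr=1e-3)
                                   for m in (actor, critic, mass_net))

def follower_update(s, a, r, s_next, empirical_mass, gamma=0.99):
    """One TD-style update of the mass estimator, critic, and actor (toy version)."""
    # 1) fit the mass estimator to the observed population distribution
    mass_loss = F.mse_loss(mass_net(s), empirical_mass)
    mass_opt.zero_grad(); mass_loss.backward(); mass_opt.step()

    m = mass_net(s).detach()                      # frozen mean-field estimate
    # 2) critic: one-step temporal-difference target
    with torch.no_grad():
        a_next = actor(torch.cat([s_next, m], -1))
        target = r + gamma * critic(torch.cat([s_next, m, a_next], -1))
    critic_loss = F.mse_loss(critic(torch.cat([s, m, a], -1)), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # 3) actor: deterministic policy-gradient-style step through the critic
    actor_loss = -critic(torch.cat([s, m, actor(torch.cat([s, m], -1))], -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# toy call with random data
follower_update(torch.randn(1, state_dim), torch.randn(1, action_dim),
                torch.tensor([[1.0]]), torch.randn(1, state_dim),
                torch.softmax(torch.randn(1, mass_dim), -1))
```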
Award ID(s): 2144646
NSF-PAR ID: 10402345
Author(s) / Creator(s): ;
Date Published:
Journal Name: Electronics
Volume: 12
Issue: 1
ISSN: 2079-9292
Page Range / eLocation ID: 89
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    This paper introduces a distributed adaptive formation control for large-scale multi-agent systems (LS-MAS) that addresses the heavy computational complexity and communication traffic challenges while directly extending conventional distributed control from small scale to large scale. Specifically, a novel hierarchical game-theoretic algorithm is developed to provide a feasible theoretical foundation for solving the LS-MAS distributed optimal formation problem by effectively integrating the mean-field game (MFG), the Stackelberg game, and the cooperative game. In particular, the LS-MAS is divided geographically into multiple groups, each with one group leader and a large number of followers. A cooperative game among the group leaders then formulates the distributed inter-group formation control for the leaders. Meanwhile, an MFG is adopted for the large number of intra-group followers to achieve the collective intra-group formation, while a Stackelberg game connects the followers with their corresponding leader within the same group to achieve the overall LS-MAS multi-group formation behavior. Moreover, a hybrid actor–critic-based reinforcement learning algorithm is constructed to learn the solution of the hierarchical game-based optimal distributed formation control. Finally, numerical simulations and a Lyapunov analysis are performed to show the effectiveness of the presented schemes.
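As a rough structural sketch of the multi-group decomposition described above, the toy loop below moves group leaders with a consensus-style inter-group formation update while each follower reacts only to its own leader and its group's mean field, rather than to every other follower. The group sizes, gains, desired offsets, and simple first-order dynamics are assumptions for illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, followers_per_group, dt = 3, 50, 0.05
leaders = rng.normal(size=(n_groups, 2))                        # leader positions
followers = rng.normal(size=(n_groups, followers_per_group, 2))
offsets = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])        # desired inter-group formation

for _ in range(200):
    # cooperative (inter-group) layer: leaders run a consensus-style formation update
    centroid = leaders.mean(axis=0)
    leaders += dt * ((centroid + offsets) - leaders)

    # mean-field (intra-group) layer: followers use only their leader and the
    # group's distribution summary, which keeps per-agent communication constant
    mean_field = followers.mean(axis=1, keepdims=True)           # one summary per group
    followers += dt * (1.5 * (leaders[:, None, :] - followers)   # Stackelberg-style leader tracking
                       + 0.5 * (mean_field - followers))         # attraction toward the group mass

print("leader offsets from centroid:", np.round(leaders - leaders.mean(0), 2))
```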
  2. Emerging on-demand service platforms (OSPs) have recently embraced teamwork as a strategy for stimulating workers’ productivity and mediating temporal supply and demand imbalances. This research investigates the team contest scheme design problem considering work schedules. Introducing teams on OSPs creates a hierarchical single-leader multi-follower game. The leader (platform) establishes rewards and intrateam revenue-sharing rules for distributing workers’ payoffs. Each follower (team) competes with others by coordinating the schedules of its team members to maximize the total expected utility. The concurrence of interteam competition and intrateam coordination causes dual effects, which are captured by an equilibrium analysis of the followers’ game. To align the platform’s interest with workers’ heterogeneous working-time preferences, we propose a profit-maximizing contest scheme consisting of a winner’s reward and time-varying payments. A novel algorithm that combines Bayesian optimization, duality, and a penalty method solves the optimal scheme in the nonconvex equilibrium-constrained problem. Our results indicate that teamwork is a useful strategy with limitations. Under the proposed scheme, the team contest always benefits workers. Intrateam coordination helps teams strategically mitigate the negative externalities caused by overcompetition among workers. For the platform, the optimal scheme can direct teams’ schedules toward more profitable market equilibria when workers have inaccurate perceptions of the market. History: This paper has been accepted for the Service Science Special Issue on Innovation in Transportation-Enabled Urban Services. Funding: This work was supported by the National Science Foundation [Grant FW-HTF-P 2222806]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/serv.2023.0320.
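The algorithm described above combines Bayesian optimization, duality, and a penalty method; the fragment below illustrates only the penalty-method idea, folding an assumed equilibrium (best-response) condition of a toy symmetric contest into the platform's objective with an increasing penalty weight. The contest model, profit function, and all constants are assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

def team_best_response(reward, n_teams=3):
    """Symmetric equilibrium effort in a toy Tullock-style contest with linear cost."""
    return np.full(n_teams, reward * (n_teams - 1) / n_teams**2)

def platform_profit(reward, efforts):
    revenue_rate = 2.0                       # assumed revenue per unit of total effort
    return revenue_rate * efforts.sum() - 0.5 * reward**2   # assumed convex reward cost

def penalized_objective(x, rho):
    """x = [reward, efforts...]; penalize deviation from the equilibrium efforts."""
    reward, efforts = x[0], x[1:]
    eq = team_best_response(reward, len(efforts))
    return -platform_profit(reward, efforts) + rho * np.sum((efforts - eq) ** 2)

x = np.array([1.0, 0.1, 0.1, 0.1])
for rho in [1.0, 10.0, 100.0]:               # gradually tighten the equilibrium constraint
    res = minimize(penalized_objective, x, args=(rho,),
                   bounds=[(0, None)] * len(x), method="L-BFGS-B")
    x = res.x
print("reward:", round(x[0], 3), "efforts:", np.round(x[1:], 3))
```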
  3.
    In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents interact and cooperate in StarCraft combat. A local actor-critic structure is established for each kind of agent using the partially observable information it receives from the environment. A global actor-critic structure is then built to provide the local design with an overall view of the combat based on limited centralized information, such as health values. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during learning. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes to further improve the learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, i.e., StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
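A minimal sketch of the two-level layout described above: a local actor-critic per agent type that sees only its partial observation, plus a global critic that receives only a limited centralized summary (e.g., health values) and the joint action. The dimensions, architectures, and wiring are illustrative assumptions, not the published SCDDPG implementation.

```python
import torch
import torch.nn as nn

class LocalActorCritic(nn.Module):
    """One per agent type; sees only that agent's partial observation."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                   nn.Linear(64, act_dim), nn.Tanh())
        self.critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                    nn.Linear(64, 1))

    def forward(self, obs):
        act = self.actor(obs)
        return act, self.critic(torch.cat([obs, act], -1))

class GlobalCritic(nn.Module):
    """Sees only a limited centralized summary (e.g. team health), not raw observations."""
    def __init__(self, summary_dim, n_agents, act_dim):
        super().__init__()
        self.critic = nn.Sequential(nn.Linear(summary_dim + n_agents * act_dim, 64),
                                    nn.ReLU(), nn.Linear(64, 1))

    def forward(self, summary, joint_action):
        return self.critic(torch.cat([summary, joint_action], -1))

# toy wiring: 2 agents, 8-dim local observations, 2-dim actions, 4-dim global summary
local_acs = [LocalActorCritic(8, 2) for _ in range(2)]
global_critic = GlobalCritic(4, 2, 2)
obs = torch.randn(2, 8)
acts, local_vals = zip(*(m(o.unsqueeze(0)) for m, o in zip(local_acs, obs)))
q_global = global_critic(torch.randn(1, 4), torch.cat(acts, -1))
print(q_global.shape)  # torch.Size([1, 1])
```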
  4. This paper proposes an intelligent multi-agent approach for a real-time strategy game, StarCraft, based on deep deterministic policy gradient (DDPG) techniques. An actor and a critic network are established to estimate the optimal control actions and the corresponding value functions, respectively. A special reward function is designed based on the agents' own condition and the enemies' information to help agents make intelligent control decisions in the game. Furthermore, to accelerate learning, transfer learning techniques are integrated into the training process. Specifically, the agents are initially trained on a simple task to learn basic combat concepts, such as detouring, avoiding, and joint attacking. This experience is then transferred to the target task, a more complex and difficult scenario. Experiments show that the proposed algorithm with transfer learning achieves better performance.
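The transfer step described above can be pictured as initializing the target-task networks from weights learned on the simpler source task before fine-tuning on the harder scenario; the fragment below shows that step for an actor network. The architecture, learning rate, and task setup are assumptions, and the DDPG training loops are elided.

```python
import torch
import torch.nn as nn

def make_actor(obs_dim=10, act_dim=2):
    # assumed actor architecture; the paper's exact network is not specified here
    return nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                         nn.Linear(64, act_dim), nn.Tanh())

# 1) train the actor with DDPG on the simple source scenario (training loop omitted)
source_actor = make_actor()
# ... DDPG training on the source task would run here ...

# 2) transfer: initialize the target-task actor from the source-task weights
target_actor = make_actor()
target_actor.load_state_dict(source_actor.state_dict())

# 3) fine-tune on the complex target scenario, typically with a smaller step size
optimizer = torch.optim.Adam(target_actor.parameters(), lr=1e-4)
# ... DDPG fine-tuning on the target task continues from here ...
```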
  5. We develop a general reinforcement learning framework for mean field control (MFC) problems. Such problems arise, for instance, as the limit of collaborative multi-agent control problems when the number of agents is very large. The asymptotic problem can be phrased as the optimal control of a nonlinear dynamical system. This can also be viewed as a Markov decision process (MDP), but the key difference from the usual RL setup is that the dynamics and the reward now depend on the state's probability distribution itself. Alternatively, it can be recast as an MDP on the Wasserstein space of measures. In this work, we introduce generic model-free algorithms based on the state-action value function at the mean field level and prove convergence for a prototypical Q-learning method. We then implement an actor-critic method and report numerical results on two archetypal problems: a finite space model motivated by a cyber security application and a continuous space model motivated by an application to swarm motion.
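As a toy illustration of the distribution dependence described above, the tabular sketch below runs Q-learning in which both the transition and the reward depend on the population's empirical state distribution, discretized into a few buckets. This is a simplified stand-in under stated assumptions, not the paper's lifted MDP on the Wasserstein space; the environment, discretization, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_agents = 4, 2, 200
n_bins = 5                                        # coarse discretization of the mean field

def mu_bucket(mu):
    """Map the empirical distribution to a small discrete index (bucket by mass in state 0)."""
    return min(int(mu[0] * n_bins), n_bins - 1)

Q = np.zeros((n_bins, n_states, n_actions))       # Q(mean-field bucket, state, action)
states = rng.integers(n_states, size=n_agents)
alpha, gamma, eps = 0.1, 0.95, 0.1

for _ in range(2000):
    mu = np.bincount(states, minlength=n_states) / n_agents
    b = mu_bucket(mu)
    greedy = Q[b].argmax(axis=1)
    actions = np.where(rng.random(n_agents) < eps,
                       rng.integers(n_actions, size=n_agents), greedy[states])
    # transition: action 1 moves an agent toward the least crowded state
    next_states = np.where(actions == 1, mu.argmin(), (states + 1) % n_states)
    # mean-field reward: congestion cost, so crowded states are penalized
    rewards = -mu[next_states]
    b_next = mu_bucket(np.bincount(next_states, minlength=n_states) / n_agents)
    td = rewards + gamma * Q[b_next, next_states].max(axis=1) - Q[b, states, actions]
    np.add.at(Q, (b, states, actions), alpha * td)
    states = next_states

print("greedy action per (mean-field bucket, state):")
print(Q.argmax(axis=2))
```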