Title: Intelligent Distributed Swarm Control for Large-Scale Multi-UAV Systems: A Hierarchical Learning Approach
In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Different from classical multi-agent systems, an LS-MAS brings new challenges to control design due to its large number of agents, making it more difficult to develop appropriate controls for complicated missions such as collective swarming. To address these challenges, a novel mixed game theory is developed with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. A cooperative game is then used to formulate the distributed swarm control for the multi-group leaders, and a Stackelberg game is utilized to couple the leaders with their large-scale followers effectively. Using the interaction between leaders and followers, a mean-field game extends the collective swarm behavior from leaders to followers smoothly without raising the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the intelligent optimal distributed swarm control for the multi-group leader–follower system. Specifically, a multi-agent actor–critic algorithm is first developed to obtain the distributed optimal swarm control for the multi-group leaders. An actor–critic–mass method is then designed to find the decentralized swarm control for the large-scale followers. Finally, a series of numerical simulations and a Lyapunov stability proof of the closed-loop system demonstrate the performance of the developed scheme.
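The abstract above describes a layered information structure: leaders coordinate within a small cooperative group, while each follower only needs its own state, its group leader's state, and the group's mean field (mass), so the coupling does not grow with the number of followers. A minimal numpy sketch of that information flow follows; it is an illustration, not the paper's implementation, and it replaces the learned actor–critic and actor–critic–mass policies with fixed, hand-picked gains on toy single-integrator dynamics.

```python
# Minimal sketch of the leader-follower-mass information flow (illustrative only):
# leaders steer toward cooperative group goals, and each follower reacts only to
# its own state, its group leader, and the empirical mean ("mass") of its group.
# Gains, dynamics, and goals are toy choices, not the paper's learned policies.
import numpy as np

rng = np.random.default_rng(0)
dt, steps = 0.05, 400
n_groups, n_followers = 3, 50

goals = np.array([[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]])       # group swarm targets
leaders = rng.normal(size=(n_groups, 2))
followers = rng.normal(scale=3.0, size=(n_groups, n_followers, 2))

k_goal, k_lead, k_mass = 1.0, 1.2, 0.8                          # illustrative gains

for _ in range(steps):
    # Leader level: each leader drives its group toward the cooperative target.
    leaders += dt * (-k_goal * (leaders - goals))
    # Follower level: mean-field coupling; no follower-to-follower messages needed.
    mass = followers.mean(axis=1, keepdims=True)                # per-group mean field
    followers += dt * (-k_lead * (followers - leaders[:, None, :])
                       - k_mass * (followers - mass))

print("leader-to-goal error:", np.linalg.norm(leaders - goals))
print("follower-to-leader spread:",
      np.linalg.norm(followers - leaders[:, None, :]) / np.sqrt(n_groups * n_followers))
```

In this toy setting the followers never exchange messages with one another; each group's mean is the only aggregate quantity that has to be shared, which is the property the mean-field coupling is intended to preserve at scale.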
Award ID(s):
2144646
PAR ID:
10402345
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Electronics
Volume:
12
Issue:
1
ISSN:
2079-9292
Page Range / eLocation ID:
89
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The multiple user terminals in a satellite transponder's communication channel compete for limited radio resources to meet their own data-rate needs. Because inter-user interference limits the transponder's performance, the transponder's power-control system needs to coordinate all of its users to reduce interference and maximize the overall performance of the channel. This paper studies Stackelberg competition among asymmetric users in a transponder's channel, where some users, called leaders, have priority in choosing their power-control strategies, while the other users, called followers, must optimize their power-control strategies given the leaders' controls. A Stackelberg differential game (SDG) is set up to model this competition in a transponder's communication channel. Each user's utility function is a trade-off between transmission data rate and power consumption, and the system dynamics describe the changing channel gain. The optimality condition for the Stackelberg equilibrium of leaders and followers is a set of differential algebraic equations (DAEs) with embedded control strategies from the counterpart players. To solve for the Stackelberg equilibrium, an algorithm based on iteratively optimizing the leaders' and followers' Hamiltonians is developed. The numerical solution of the SDG model provides the transponder's power-control system with each user's power-control strategy at the Stackelberg equilibrium.
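As a rough illustration of the leader-follower logic in this abstract, the sketch below solves a static two-user power-control game by backward induction: the leader commits to a power level while anticipating the follower's best response, and each utility trades a Shannon-rate term against a power cost. This is a simplification under assumed channel gains and prices, not the paper's differential-game formulation or its Hamiltonian-based iterative algorithm.

```python
# Static two-user Stackelberg power control (illustrative simplification).
# The leader picks its power first, anticipating the follower's best response;
# utilities are rate minus a linear power cost. Gains and costs are assumptions.
import numpy as np

g_ll, g_fl, g_lf, g_ff = 1.0, 0.4, 0.3, 1.0   # direct and cross channel gains
noise, c_lead, c_foll = 0.1, 0.5, 0.5         # noise power and per-watt costs
powers = np.linspace(0.0, 2.0, 401)           # feasible power levels

def rate(p_own, p_other, g_own, g_cross):
    """Shannon rate with the other user's signal treated as interference."""
    return np.log2(1.0 + g_own * p_own / (noise + g_cross * p_other))

def follower_best_response(p_leader):
    u = rate(powers, p_leader, g_ff, g_lf) - c_foll * powers
    return powers[int(np.argmax(u))]

# Backward induction: score each leader power with the follower's anticipated reply.
leader_utils = []
for p_l in powers:
    p_f = follower_best_response(p_l)
    leader_utils.append(rate(p_l, p_f, g_ll, g_fl) - c_lead * p_l)

p_l_star = powers[int(np.argmax(leader_utils))]
p_f_star = follower_best_response(p_l_star)
print(f"Stackelberg powers: leader={p_l_star:.3f} W, follower={p_f_star:.3f} W")
```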
  2. In this work, we propose a two-stage multi-agent deep deterministic policy gradient (TS-MADDPG) algorithm for communication-free multi-agent reinforcement learning (MARL) under partial states and observations. In the first stage, we train prototype actor–critic networks using only partial states at the actors. In the second stage, we incorporate the partial observations resulting from the prototype actions as side information at the actors to enhance actor–critic training. This side information is useful for inferring the unobserved states and hence can help reduce the performance gap between a network with fully observable states and a partially observable one. Using a case study of building energy control in the power distribution network, we demonstrate that the proposed TS-MADDPG can greatly improve the performance of single-stage MADDPG algorithms that use partial states only. This is the first work that utilizes partial local voltage measurements as observations to improve MARL performance for a distributed power network.
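A schematic sketch of the two-stage input construction described above is given below: stage-1 (prototype) actors act on partial states only, and stage-2 actors additionally consume the partial observations induced by the prototype actions. The linear "actors" and the toy observation function are illustrative placeholders, not the authors' TS-MADDPG networks or power-network environment.

```python
# Schematic forward pass of the two-stage input construction (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n_agents, state_dim, obs_dim, act_dim = 4, 3, 2, 1

# Linear maps stand in for the deterministic policy networks.
W_stage1 = rng.normal(scale=0.1, size=(n_agents, act_dim, state_dim))
W_stage2 = rng.normal(scale=0.1, size=(n_agents, act_dim, state_dim + obs_dim))

def env_observe(partial_state, action):
    """Toy stand-in for the partial observation (e.g., a local voltage
    measurement) produced after a prototype action is applied."""
    return np.tanh(partial_state[:obs_dim] + 0.5 * action.sum())

partial_states = rng.normal(size=(n_agents, state_dim))

# Stage 1: prototype actions from partial states only.
proto_actions = np.einsum('nas,ns->na', W_stage1, partial_states)

# Stage 2: enrich each actor's input with the observation its prototype action induced.
observations = np.stack([env_observe(partial_states[i], proto_actions[i])
                         for i in range(n_agents)])
stage2_inputs = np.concatenate([partial_states, observations], axis=1)
final_actions = np.einsum('nai,ni->na', W_stage2, stage2_inputs)

print("prototype actions:", proto_actions.ravel())
print("stage-2 actions:  ", final_actions.ravel())
```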
  3. Emerging on-demand service platforms (OSPs) have recently embraced teamwork as a strategy for stimulating workers' productivity and mediating temporal supply-and-demand imbalances. This research investigates the team contest scheme design problem considering work schedules. Introducing teams on OSPs creates a hierarchical single-leader, multi-follower game. The leader (platform) establishes rewards and intrateam revenue-sharing rules for distributing workers' payoffs. Each follower (team) competes with others by coordinating the schedules of its team members to maximize its total expected utility. The concurrence of interteam competition and intrateam coordination causes dual effects, which are captured by an equilibrium analysis of the followers' game. To align the platform's interest with workers' heterogeneous working-time preferences, we propose a profit-maximizing contest scheme consisting of a winner's reward and time-varying payments. A novel algorithm that combines Bayesian optimization, duality, and a penalty method solves for the optimal scheme in the nonconvex equilibrium-constrained problem. Our results indicate that teamwork is a useful strategy, but one with limitations. Under the proposed scheme, the team contest always benefits workers. Intrateam coordination helps teams strategically mitigate the negative externalities caused by overcompetition among workers. For the platform, the optimal scheme can direct teams' schedules toward more profitable market equilibria when workers have inaccurate perceptions of the market. History: This paper has been accepted for the Service Science Special Issue on Innovation in Transportation-Enabled Urban Services. Funding: This work was supported by the National Science Foundation [Grant FW-HTF-P 2222806]. Supplemental Material: The online appendices are available at https://doi.org/10.1287/serv.2023.0320.
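To make the single-leader, multi-follower structure concrete, the sketch below uses a drastically simplified, hypothetical stand-in: the platform posts a winner's reward, two teams choose effort in a Tullock-style contest solved by best-response iteration, and the platform grid-searches the reward against the teams' equilibrium response. Work schedules, revenue sharing, time-varying payments, and the Bayesian-optimization, duality, and penalty machinery of the paper are all omitted.

```python
# Hypothetical single-leader, multi-follower illustration (not the paper's model):
# platform posts prize R, two teams play a Tullock-style effort contest, and the
# platform searches R for the profit-maximizing level at the teams' equilibrium.
import numpy as np

costs = np.array([1.0, 1.3])            # per-unit effort costs of the two teams

def team_equilibrium(R, iters=200):
    """Best-response iteration for the two-team contest with prize R."""
    e = np.array([0.1, 0.1])
    for _ in range(iters):
        for i in range(2):
            j = 1 - i
            br = np.sqrt(R * e[j] / costs[i]) - e[j]   # first-order condition
            e[i] = max(br, 0.0)
    return e

def platform_profit(R, value_coeff=3.0):
    effort = team_equilibrium(R).sum()
    return value_coeff * np.sqrt(effort) - R           # concave value of total effort

rewards = np.linspace(0.1, 5.0, 50)
profits = [platform_profit(R) for R in rewards]
R_star = rewards[int(np.argmax(profits))]
print(f"profit-maximizing reward R* ~ {R_star:.2f}, "
      f"equilibrium efforts {team_equilibrium(R_star).round(3)}")
```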
  4. In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multi-agent games. Specifically, we design a two-level actor–critic structure to help the agents interact and cooperate in StarCraft combat. A local actor–critic structure is established for each kind of agent using the partially observable information received from the environment. A global actor–critic structure is then built to provide the local design with an overall view of the combat based on limited centralized information, such as the health values. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during the learning process. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
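The sketch below gives a schematic forward pass for the kind of two-level (local plus global) actor structure described above: each agent's local actor acts on its partial observation, a global actor sees only limited centralized information (health values here), and the two signals are combined into the final action. The linear maps are illustrative placeholders, not the authors' SCDDPG networks, and no training is shown.

```python
# Schematic local + global actor forward pass (illustrative; no learning shown).
import numpy as np

rng = np.random.default_rng(2)
n_agents, local_obs_dim, act_dim = 5, 6, 2

W_local = rng.normal(scale=0.1, size=(act_dim, local_obs_dim))          # shared local actor
W_global = rng.normal(scale=0.1, size=(n_agents * act_dim, n_agents))   # global actor

local_obs = rng.normal(size=(n_agents, local_obs_dim))   # partially observed environment
health = rng.uniform(0.2, 1.0, size=n_agents)            # limited centralized information

local_actions = local_obs @ W_local.T                    # per-agent local decision
global_signal = (W_global @ health).reshape(n_agents, act_dim)

# Final control: local decision modulated by the global coordination signal.
actions = np.tanh(local_actions + global_signal)
print(actions)
```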
  5. This paper proposes an intelligent multi-agent approach for a real-time strategy game, StarCraft, based on deep deterministic policy gradient (DDPG) techniques. An actor network and a critic network are established to estimate the optimal control actions and the corresponding value functions, respectively. A special reward function is designed based on the agents' own condition and the enemies' information to help the agents make intelligent control decisions in the game. Furthermore, to accelerate the learning process, transfer learning techniques are integrated into the training process. Specifically, the agents are first trained on a simple task to learn basic combat concepts, such as detour moving, avoiding attacks, and joining attacks. We then transfer this experience to the target task with a more complex and difficult scenario. The experiments show that our proposed algorithm with transfer learning achieves better performance.
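The transfer step described above amounts to warm-starting the target-task training from the weights learned on the simple source task rather than from random weights. The sketch below illustrates just that step, with toy quadratic "task losses" standing in for actual DDPG training on StarCraft; all names and numbers are illustrative.

```python
# Warm-start transfer sketch (illustrative stand-in for the DDPG/StarCraft setup):
# parameters learned on a simple source task initialize training on a harder,
# related target task; toy quadratic losses replace actual DDPG training here.
import numpy as np

rng = np.random.default_rng(3)

def train(theta_init, task_optimum, lr=0.1, steps=50):
    """Stand-in for DDPG training: gradient descent on ||theta - optimum||^2."""
    theta = theta_init.copy()
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - task_optimum)
    return theta

simple_task_opt = rng.normal(size=8)                           # "basic combat skills"
target_task_opt = simple_task_opt + 0.3 * rng.normal(size=8)   # related, harder task

theta_scratch = rng.normal(size=8)
theta_source = train(theta_scratch, simple_task_opt)           # phase 1: simple scenario

# Phase 2: warm-start the target task from the source-task weights.
theta_transfer = train(theta_source, target_task_opt, steps=10)
theta_baseline = train(theta_scratch, target_task_opt, steps=10)

print("target loss (transfer):", np.sum((theta_transfer - target_task_opt) ** 2))
print("target loss (scratch): ", np.sum((theta_baseline - target_task_opt) ** 2))
```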