Title: Distributed Power Allocation for 6-GHz Unlicensed Spectrum Sharing via Multi-agent Deep Reinforcement Learning
We consider the problem of spectrum sharing by multiple cellular operators. We propose a novel deep reinforcement learning (DRL)-based distributed power allocation scheme that uses the multi-agent Deep Deterministic Policy Gradient (MA-DDPG) algorithm. In particular, we model the base stations (BSs) that belong to the multiple operators sharing the same band as DRL agents that simultaneously determine the transmit powers to their scheduled user equipment (UE) in a synchronized manner. The power decision of each BS is based on its own observation of the radio frequency (RF) environment, which consists of interference measurements reported from the UEs it serves and a limited amount of information obtained from other BSs. One advantage of the proposed scheme is that it addresses the non-stationarity problem that single-agent RL faces in the multi-agent scenario: the actions and observations of other BSs are incorporated into each BS's own critic, which helps it gain a more accurate perception of the overall RF environment. A centralized-training-distributed-execution framework is used to train the policies: the critics are trained over the joint actions and observations of all BSs, while the actor of each BS takes only its local observation as input to produce the transmit power. Simulation with the 6 GHz Unlicensed National Information Infrastructure (U-NII)-5 band shows that the proposed power allocation scheme can achieve better throughput performance than several state-of-the-art approaches.
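The split between local actors and joint critics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the number of BSs, the observation size, the power cap, and the single linear layers standing in for deep networks are all hypothetical, and no training step is shown. It only shows which inputs each component sees.

```python
import numpy as np

rng = np.random.default_rng(0)

N_BS = 3      # number of base stations (agents); hypothetical value
OBS_DIM = 4   # size of each BS's local observation; hypothetical value
P_MAX = 1.0   # maximum transmit power; hypothetical value

def make_linear(in_dim, out_dim):
    """Random weights and zero bias for one linear layer (stand-in for a deep network)."""
    return rng.normal(scale=0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

# One actor and one critic per BS.
# Actor input: that BS's local observation only.
# Critic input: observations and actions of ALL BSs (used during training only).
actors = [make_linear(OBS_DIM, 1) for _ in range(N_BS)]
critics = [make_linear(N_BS * (OBS_DIM + 1), 1) for _ in range(N_BS)]

def act(i, local_obs):
    """Deterministic policy of BS i: local observation -> power in [0, P_MAX]."""
    W, b = actors[i]
    z = local_obs @ W + b
    return P_MAX / (1.0 + np.exp(-z))  # sigmoid squashing, shape (1,)

def q_value(i, all_obs, all_actions):
    """Centralized critic of BS i: joint observations and actions -> scalar Q estimate."""
    W, b = critics[i]
    joint = np.concatenate([all_obs.ravel(), all_actions.ravel()])
    return (joint @ W + b).item()

obs = rng.normal(size=(N_BS, OBS_DIM))                      # one row per BS
powers = np.array([act(i, obs[i]) for i in range(N_BS)])    # distributed execution
q = [q_value(i, obs, powers) for i in range(N_BS)]          # centralized evaluation
```

At execution time only `act` would be needed at each BS, which is why the scheme can run in a distributed way once the critics have been trained centrally.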
Award ID(s):
2153875 2229562
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE Xplore
Date Published:
Journal Name:
2023 IEEE International Conference on Industrial Technology (ICIT)
Page Range / eLocation ID:
1 to 6
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study a multi-agent partially observable environment in which autonomous agents aim to coordinate their actions, while also learning the parameters of the unknown environment through repeated interactions. In particular, we focus on the role of communication in a multi-agent reinforcement learning problem. We consider a learning algorithm in which agents make decisions based on their own observations of the environment, as well as the observations of other agents, which are collected through communication between agents. We first identify two potential benefits of this type of information sharing when agents' observation quality is heterogeneous: (1) it can facilitate coordination among agents, and (2) it can enhance the learning of all participants, including the better informed agents. We show however that these benefits of communication depend in general on its timing, so that delayed information sharing may be preferred in certain scenarios. 
  2. This paper proposes a novel cognitive cooperative transmission scheme by exploiting massive multiple-input multiple-output (MMIMO) and non-orthogonal multiple access (NOMA) radio technologies, which enables a macrocell network and multiple cognitive small cells to cooperate in dynamic spectrum sharing. The macrocell network is assumed to own the spectrum band and be the primary network (PN), and the small cells act as the secondary networks (SNs). The secondary access points (SAPs) of the small cells can cooperatively relay the traffic for the primary users (PUs) in the macrocell network, while concurrently accessing the PUs’ spectrum to transmit their own data opportunistically through MMIMO and NOMA. Such cooperation creates a “win-win” situation: the throughput of PUs will be significantly increased with the help of SAP relays, and the SAPs are able to use the PU’s spectrum to serve their secondary users (SUs). The interplay of these advanced radio techniques is analyzed in a systematic manner, and a framework is proposed for the joint optimization of cooperative relay selection, NOMA and MMIMO transmit power allocation, and transmission scheduling. Further, to model network-wide cooperation and competition, a two-sided matching algorithm is designed to find the stable partnership between multiple SAPs and PUs. The evaluation results demonstrate that the proposed scheme achieves significant performance gains for both primary and secondary users, compared to the baselines. 
  3. In this paper, we consider a general distributed system with multiple agents who select and then implement actions in the system. The system has an operator with a centralized objective. The agents, on the other hand, are self-interested and strategic in the sense that each agent optimizes its own individual objective. The operator aims to mitigate this misalignment by designing an incentive scheme for the agents. The problem is difficult because the cost functions of the agents are coupled, the objective of the operator is not social welfare, and the operator has no direct control over the actions implemented by the agents. This problem has been studied in many fields, particularly in mechanism design and cost allocation. However, mechanism design typically assumes that the operator has knowledge of the cost functions of the agents and the actions being implemented by the operator. On the other hand, cost allocation classically assumes that agents do not anticipate the effect of their actions on the incentive that they obtain. We remove these assumptions and present an incentive rule for this setup by bridging the gap between mechanism design and classical cost allocation. We analyze whether the proposed design satisfies various desirable properties such as social optimality, budget balance, participation constraint, and so on. We also analyze which of these properties can be satisfied if the assumptions of cost functions of the agents being private and the agents being anticipatory are relaxed.
  4. Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed. Distributed computation makes FL attractive for bandwidth-limited applications, especially in wireless communications. A large number of distributed edge devices can be connected to a central parameter server (PS) and iteratively download/upload data from/to the PS. Due to limited bandwidth, only a subset of connected devices can be scheduled in each round. State-of-the-art machine learning models such as deep learning models usually have millions of parameters, resulting in high computational complexity as well as a high communication burden for collecting/distributing data for training. To improve communication efficiency and make the training model converge faster, we propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate under practical constraints during the entire learning process. NOMA allows multiple users to transmit on the same channel simultaneously. The user scheduling problem is transformed into a maximum-weight independent set problem that can be solved using graph theory. Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA-based wireless networks than other existing schemes within the same learning time.
  5. In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm Gossip_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that Gossip_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of $O(\max\{\mathrm{poly}(N,M)\log T,\ \mathrm{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all N agents, where $\lambda_2 \in (0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of G. We then propose Fed_UCB, a differentially private version of Gossip_UCB, in which the agents preserve ε-differential privacy of their local data while achieving $O(\max\{\frac{\mathrm{poly}(N,M)}{\varepsilon}\log^{2.5} T,\ \mathrm{poly}(N,M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.