We study a multi-agent partially observable environment in which autonomous agents aim to coordinate their actions, while also learning the parameters of the unknown environment through repeated interactions. In particular, we focus on the role of communication in a multi-agent reinforcement learning problem. We consider a learning algorithm in which agents make decisions based on their own observations of the environment, as well as the observations of other agents, which are collected through communication between agents. We first identify two potential benefits of this type of information sharing when agents' observation quality is heterogeneous: (1) it can facilitate coordination among agents, and (2) it can enhance the learning of all participants, including the better informed agents. We show however that these benefits of communication depend in general on its timing, so that delayed information sharing may be preferred in certain scenarios. 
                        more » 
                        « less   
                    
                            
                            Distributed Power Allocation for 6-GHz Unlicensed Spectrum Sharing via Multi-agent Deep Reinforcement Learning
                        
                    
    
            We consider the problem of spectrum sharing by multiple cellular operators. We propose a novel deep Reinforcement Learning (DRL)-based distributed power allocation scheme which utilizes the multi-agent Deep Deterministic Policy Gradient (MA-DDPG) algorithm. In particular, we model the base stations (BSs) that belong to the multiple operators sharing the same band, as DRL agents that simultaneously determine the transmit powers to their scheduled user equipment (UE) in a synchronized manner. The power decision of each BS is based on its own observation of the radio environment (RF) environment, which consists of interference measurements reported from the UEs it serves, and a limited amount of information obtained from other BSs. One advantage of the proposed scheme is that it addresses the single-agent non-stationarity problem of RL in the multi-agent scenario by incorporating the actions and observations of other BSs into each BS's own critic which helps it to gain a more accurate perception of the overall RF environment. A centralized-training-distributed-execution framework is used to train the policies where the critics are trained over the joint actions and observations of all BSs while the actor of each BS only takes the local observation as input in order to produce the transmit power. Simulation with the 6 GHz Unlicensed National Information Infrastructure (U-NII)-5 band shows that the proposed power allocation scheme can achieve better throughput performance than several state-of-the-art approaches. 
        more » 
        « less   
        
    
    
                            - PAR ID:
- 10437429
- Publisher / Repository:
- IEEE Xplore
- Date Published:
- Journal Name:
- 2023 IEEE International Conference on Industrial Technology (ICIT)
- Page Range / eLocation ID:
- 1 to 6
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's longterm rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We integrate three privacy-engineering techniques to redesign the data flows of the VDN algorithm-an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller that models multi-agent coordination-and develop PE-VDN. In the first technique, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Then, we utilize a privacy-preserving multi-party computation protocol to guar-antee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data-past environment interactions-when they take actions based on their neural network predictions. We implement PE-VDN in StarCraft Multi-Agent Competition (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination.more » « less
- 
            We study multi-base station (BS) preamble detection schemes for the narrow-band Internet of Things (NB-IoT) random access by using stochastic geometry analysis. Specifically, we compare the preamble detection performance of two baseline detection schemes: Quantize-and-Forward (QnF) and Detect-and-Forward (DnF). QnF requires the feedback of quantized received power levels while DnF requires 1-bit feedback of local detection result. Our results show that DnF scheme outperforms QnF scheme when the backhaul capacity is limited or when the minimum distance between user and BSs is less than a threshold. Our results also show that the use of multiple collaborative BSs can lead to a significant improvement of the preamble detection performance, as well as reduction of the total power of the preamble transmission.more » « less
- 
            Abstract Deep Reinforcement Learning (DRL) has shown promise for voltage control in power systems due to its speed and model‐free nature. However, learning optimal control policies through trial and error on a real grid is infeasible due to the mission‐critical nature of power systems. Instead, DRL agents are typically trained on a simulator, which may not accurately represent the real grid. This discrepancy can lead to suboptimal control policies and raises concerns for power system operators. In this paper, we revisit the problem of RL‐based voltage control and investigate how model inaccuracies affect the performance of the DRL agent. Extensive numerical experiments are conducted to quantify the impact of model inaccuracies on learning outcomes. Specifically, techniques that enable the DRL agent are focused on learning robust policies that can still perform well in the presence of model errors. Furthermore, the impact of the agent's decisions on the overall system loss are analyzed to provide additional insight into the control problem. This work aims to address the concerns of power system operators and make DRL‐based voltage control more practical and reliable.more » « less
- 
            Spectrum sharing between terrestrial 5G and incumbent networks in the satellite bands presents a promising avenue to satisfy the ever-increasing bandwidth demand of the next-generation wireless networks. However, protecting incumbent operations from harmful interference poses a fundamental challenge in accommodating terrestrial broadband cellular networks in the satellite bands. State-of-the-art spectrum-sharing policies usually consider several worst-case assumptions and ignore site-specific contextual factors in making spectrum-sharing decisions, and thus, often results in under-utilization of the shared band for the secondary licensees. To address such limitations, this paper introduces CAT3S (Context-Aware Terrestrial-Satellite Spectrum Sharing) framework that empowers the coexisting terrestrial 5G network to maximize utilization of the shared satellite band without creating harmful interference to the incumbent links by exploiting the contextual factors. CAT3S consists of the following two components: (i) context-acquisition unit to collect and process essential contextual information for spectrum sharing and (ii) context-aware base station (BS) control unit to optimize the set of operational BSs and their operation parameters (i.e., transmit power and active beams per sector). To evaluate the performance of the CAT3S, a realistic spectrum coexistence case study over the 12 GHz band is considered. Experiment results demonstrate that the proposed CAT3S achieves notably higher spectrum utilization than state-of-the-art spectrum-sharing policies in different weather contexts.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
