Title: A Machine Learning Approach for Power Allocation in HetNets Considering QoS
There is an increase in usage of smaller cells or femtocells to improve performance and coverage of next-generation heterogeneous wireless networks (HetNets). However, the interference caused by femtocells to neighboring cells is a limiting performance factor in dense HetNets. This interference is being managed via distributed resource allocation methods. However, as the density of the network increases so does the complexity of such resource allocation methods. Yet, unplanned deployment of femtocells requires an adaptable and self-organizing algorithm to make HetNets viable. As such, we propose to use a machine learning approach based on Q-learning to solve the resource allocation problem in such complex networks. By defining each base station as an agent, a cellular network is modeled as a multi-agent network. Subsequently, cooperative Q-learning can be applied as an efficient approach to manage the resources of a multi-agent network. Furthermore, the proposed approach considers the quality of service (QoS) for each user and fairness in the network. In comparison with prior work, the proposed approach can bring more than a four-fold increase in the number of supported femtocells while using cooperative Q-learning to reduce resource allocation overhead.
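As a rough illustration of the idea in the abstract, the sketch below shows tabular Q-learning in which each base station is an agent choosing among discrete transmit-power levels. The abstract does not specify the state space, the reward, or how agents cooperate, so everything here is an assumption for illustration: the function names (`q_update`, `choose_action`, `share_q_tables`), the learning parameters, and in particular the cooperation step, which simply averages the agents' Q-tables.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    alpha and gamma are illustrative defaults, not values from the paper."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice over discrete power levels (action indices)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)


def share_q_tables(tables):
    """Hypothetical cooperation step (an assumption, not the paper's rule):
    every agent adopts the element-wise mean of all agents' Q-tables."""
    n = len(tables)
    n_states, n_actions = len(tables[0]), len(tables[0][0])
    mean = [
        [sum(t[s][a] for t in tables) / n for a in range(n_actions)]
        for s in range(n_states)
    ]
    # Return an independent copy of the mean table for each agent.
    return [[row[:] for row in mean] for _ in tables]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two agents, two states, two power levels; all values start at zero.
    tables = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    action = choose_action(tables[0], 0, epsilon=0.1, rng=rng)
    q_update(tables[0], 0, action, reward=1.0, next_state=1)
    tables = share_q_tables(tables)
    print(tables[1])
```

In a real system the reward would have to encode the QoS and fairness goals the abstract mentions, and the sharing step would be replaced by whatever cooperation rule the paper actually defines.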
Journal Name:
2018 IEEE International Conference on Communications (ICC)
Page Range / eLocation ID:
1 to 7
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Graph neural networks (GNNs) have achieved tremendous success in many graph learning tasks such as node classification, graph classification and link prediction. For the classification task, GNNs’ performance often highly depends on the number of labeled nodes and thus could be significantly hampered due to the expensive annotation cost. The sparse literature on active learning for GNNs has primarily focused on selecting only one sample each iteration, which becomes inefficient for large scale datasets. In this paper, we study the batch active learning setting for GNNs where the learning agent can acquire labels of multiple samples at each time. We formulate batch active learning as a cooperative multi-agent reinforcement learning problem and present a novel reinforced batch-mode active learning framework (BIGENE). To avoid the combinatorial explosion of the joint action space, we introduce a value decomposition method that factorizes the total Q-value into the average of individual Q-values. Moreover, we propose a novel multi-agent Q-network consisting of a graph convolutional network (GCN) component and a gated recurrent unit (GRU) component. The GCN component takes both the informativeness and inter-dependences between nodes into account and the GRU component enables the agent to consider interactions between selected nodes in the same batch. Experimental results on multiple public datasets demonstrate the effectiveness and efficiency of our proposed method.
  2. This research proposes a dynamic resource allocation method for vehicle-to-everything (V2X) communications in sixth-generation (6G) cellular networks. Cellular V2X (C-V2X) communications empower advanced applications but at the same time bring unprecedented challenges in how to fully utilize the limited physical-layer resources, given that most of the applications require ultra-low latency, high data rate, and high reliability. Resource allocation plays a pivotal role in satisfying such requirements as well as guaranteeing quality of service (QoS). Based on this observation, a novel fuzzy-logic-assisted Q-learning model (FAQ) is proposed to intelligently and dynamically allocate resources by taking advantage of the centralized allocation mode. The proposed FAQ model reuses the resources to maximize the network throughput while minimizing the interference caused by concurrent transmissions. The fuzzy-logic module expedites the learning and improves the performance of the Q-learning. A mathematical model is developed to analyze the network throughput considering the interference. To evaluate the performance, a system model for V2X communications is built for urban areas, where various V2X services are deployed in the network. Simulation results show that the proposed FAQ algorithm can significantly outperform deep reinforcement learning, Q-learning, and other advanced allocation strategies regarding the convergence speed and the network throughput.
  3. We propose and evaluate a learning-based framework to address multi-agent resource allocation in coupled wireless systems. In particular, we consider multiple agents (e.g., base stations, access points, etc.) that choose amongst a set of resource allocation options towards achieving their own performance objectives/requirements, and where the performance observed at each agent is further coupled with the actions chosen by the other agents, e.g., through interference, channel leakage, etc. The challenge is to find the best collective action. To that end, we propose a Multi-Armed Bandit (MAB) framework wherein the best actions (aka arms) are adaptively learned through online reward feedback. Our focus is on systems which are "weakly-coupled", wherein the best arm of each agent is invariant to others' arm selection the majority of the time. This majority structure enables one to develop lightweight, efficient algorithms. This structure is commonly found in many wireless settings such as channel selection and power control. We develop a bandit algorithm based on the Track-and-Stop strategy, which shows a logarithmic regret with respect to a genie. Finally, through simulation, we exhibit the potential use of our model and algorithm in several wireless application scenarios.
  4. In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches. 
  5. We consider the problem of spectrum sharing by multiple cellular operators. We propose a novel deep Reinforcement Learning (DRL)-based distributed power allocation scheme which utilizes the multi-agent Deep Deterministic Policy Gradient (MA-DDPG) algorithm. In particular, we model the base stations (BSs) that belong to the multiple operators sharing the same band as DRL agents that simultaneously determine the transmit powers to their scheduled user equipment (UE) in a synchronized manner. The power decision of each BS is based on its own observation of the radio frequency (RF) environment, which consists of interference measurements reported from the UEs it serves, and a limited amount of information obtained from other BSs. One advantage of the proposed scheme is that it addresses the single-agent non-stationarity problem of RL in the multi-agent scenario by incorporating the actions and observations of other BSs into each BS's own critic, which helps it to gain a more accurate perception of the overall RF environment. A centralized-training-distributed-execution framework is used to train the policies, where the critics are trained over the joint actions and observations of all BSs while the actor of each BS only takes the local observation as input in order to produce the transmit power. Simulation with the 6 GHz Unlicensed National Information Infrastructure (U-NII)-5 band shows that the proposed power allocation scheme can achieve better throughput performance than several state-of-the-art approaches.