This study investigates the problem of decentralized dynamic resource allocation optimization for ad-hoc network communication with the support of reconfigurable intelligent surfaces (RIS), leveraging a reinforcement learning framework. In the present context of cellular networks, device-to-device (D2D) communication stands out as a promising technique to enhance the spectrum efficiency. Simultaneously, RIS have gained considerable attention due to their ability to enhance the quality of dynamic wireless networks by maximizing the spectrum efficiency without increasing the power consumption. However, prevalent centralized D2D transmission schemes require global information, leading to a significant signaling overhead. Conversely, existing distributed schemes, while avoiding the need for global information, often demand frequent information exchange among D2D users, falling short of achieving global optimization. This paper introduces a framework comprising an outer loop and inner loop. In the outer loop, decentralized dynamic resource allocation optimization has been developed for self-organizing network communication aided by RIS. This is accomplished through the application of a multi-player multi-armed bandit approach, completing strategies for RIS and resource block selection. Notably, these strategies operate without requiring signal interaction during execution. Meanwhile, in the inner loop, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm has been adopted for cooperative learning with neural networks (NNs) to obtain optimal transmit power control and RIS phase shift control for multiple users, with a specified RIS and resource block selection policy from the outer loop. Through the utilization of optimization theory, distributed optimal resource allocation can be attained as the outer and inner reinforcement learning algorithms converge over time. Finally, a series of numerical simulations are presented to validate and illustrate the effectiveness of the proposed scheme.
more »
« less
Deep Q-Network Based Power Allocation Meets Reservoir Computing in Distributed Dynamic Spectrum Access Networks
Dynamic spectrum access (DSA) is regarded as one of the key enabling technologies for future communication networks. In this paper, we introduce a power allocation strategy for distributed DSA networks using a powerful machine learning tool, namely deep reinforcement learning. The introduced power allocation strategy enables DSA users to conduct power allocation in a distributed fashion without relying on channel state information and cooperations among DSA users. Furthermore, to capture the temporal correlation of the underlying DSA network environments, the reservoir computing, a special class of recurrent neural network, is employed to realize the introduced deep reinforcement learning scheme. The combination of reservoir computing and deep reinforcement learning significantly improves the efficiency of the introduced resource allocation scheme. Simulation evaluations are conducted to demonstrate the effectiveness of the introduced power allocation strategy.
more »
« less
- PAR ID:
- 10161678
- Date Published:
- Journal Name:
- IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)
- Page Range / eLocation ID:
- 774 to 779
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)The Reservoir Computing, a neural computing framework suited for temporal information processing, utilizes a dynamic reservoir layer for high-dimensional encoding, enhancing the separability of the network. In this paper, we exploit a Deep Learning (DL)-based detection strategy for Multiple-input, Multiple-output Orthogonal Frequency-Division Multiplexing (MIMO-OFDM) symbol detection. To be specific, we introduce a Deep Echo State Network (DESN), a unique hierarchical processing structure with multiple time intervals, to enhance the memory capacity and accelerate the detection efficiency. The resulting hardware prototype with the hybrid memristor-CMOS co-design provides in-memory computing and parallel processing capabilities, significantly reducing the hardware and power overhead. With the standard 180nm CMOS process and memristive synapses, the introduced DESN consumes merely 105mW of power consumption, exhibiting 16.7% power reduction compared to shallow ESN designs even with more dynamic layers and associated neurons. Furthermore, numerical evaluations demonstrate the advantages of the DESN over state-of-the-art detection techniques in the literate for MIMO-OFDM systems even with a very limited training set, yielding a 47.8% improvement against conventional symbol detection techniques.more » « less
-
null (Ed.)Federated learning (FL) is a highly pursued machine learning technique that can train a model centrally while keeping data distributed. Distributed computation makes FL attractive for bandwidth limited applications especially in wireless communications. There can be a large number of distributed edge devices connected to a central parameter server (PS) and iteratively download/upload data from/to the PS. Due to limited bandwidth, only a subset of connected devices can be scheduled in each round. There are usually millions of parameters in the state-of-art machine learning models such as deep learning, resulting in a high computation complexity as well as a high communication burden on collecting/distributing data for training. To improve communication efficiency and make the training model converge faster, we propose a new scheduling policy and power allocation scheme using non-orthogonal multiple access (NOMA) settings to maximize the weighted sum data rate under practical constraints during the entire learning process. NOMA allows multiple users to transmit on the same channel simultaneously. The user scheduling problem is transformed into a maximum-weight independent set problem that can be solved using graph theory. Simulation results show that the proposed scheduling and power allocation scheme can help achieve a higher FL testing accuracy in NOMA based wireless networks than other existing schemes within the same learning time.more » « less
-
Mobile edge computing (MEC) is an emerging paradigm that integrates computing resources in wireless access networks to process computational tasks in close proximity to mobile users with low latency. In this paper, we propose an online double deep Q networks (DDQN) based learning scheme for task assignment in dynamic MEC networks, which enables multiple distributed edge nodes and a cloud data center to jointly process user tasks to achieve optimal long-term quality of service (QoS). The proposed scheme captures a wide range of dynamic network parameters including non-stationary node computing capabilities, network delay statistics, and task arrivals. It learns the optimal task assignment policy with no assumption on the knowledge of the underlying dynamics. In addition, the proposed algorithm accounts for both performance and complexity, and addresses the state and action space explosion problem in conventional Q learning. The evaluation results show that the proposed DDQN-based task assignment scheme significantly improves the QoS performance, compared to the existing schemes that do not consider the effects of network dynamics on the expected long-term rewards, while scaling reasonably well as the network size increases.more » « less
-
This paper proposes a user scheduling and power allocation method for content delivery in wireless caching helper networks without any stringent constraint on the interference model. For supporting delay-sensitive and time-varying user demands, the actual delivery quantity of the requested content should be dynamically controlled by advanced scheduling and power allocation. In addition, it is difficult for a central unit to control the content delivery due to a lack of knowledge of the entire time-varying network; therefore, a belief-propagation (BP)-based algorithm that facilitates distributed decisions on user scheduling and power allocation at every caching helper is presented. The proposed delivery scheme maximizes power efficiency while limiting the average delay of user request satisfactions by managing interference among users well. Simulation results show that the proposed scheme provides almost the same delay performance as the exhaustively found optimal one at the expense of little power consumption.more » « less