Title: Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling
Scheduling in multi-channel wireless communication systems presents formidable challenges in allocating resources effectively. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems, with the objective of maximizing long-term discounted total rewards while respecting resource constraints. We also generalize the model to applications beyond multi-channel wireless systems. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes, and we derive the policy gradient theorem for index learning. Our main contribution is a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of heterogeneous resources. We demonstrate the utility of DIP by evaluating its performance on three different MR-RMB problems. Our simulation results show that DIP indeed learns the partial indexes efficiently.
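The abstract does not spell out how the matching step works, so the following is only a minimal sketch of a generic max-weight matching over an index table, not the authors' algorithm. It assumes a hypothetical table `index[i][k]` holding the learned partial index of arm `i` on resource `k`, assumes each arm uses at most one resource and each resource serves at most one arm, and uses brute-force enumeration in place of a proper matching solver, so it only suits tiny instances. The function name and the sample numbers are illustrative.

```python
from itertools import permutations


def max_weight_index_matching(index):
    """Return (total, match) for a max-weight arm-to-resource matching.

    index[i][k] is a hypothetical learned partial index of arm i on
    resource k. match maps resource k -> arm i. Pairs with a
    non-positive index are left unmatched (the resource stays idle).
    Brute force: only intended for very small instances.
    """
    n_arms, n_res = len(index), len(index[0])
    # None entries let a resource stay idle.
    slots = list(range(n_arms)) + [None] * n_res
    best_total, best_match = 0.0, {}
    for perm in permutations(slots, n_res):
        # perm[k] is the arm tentatively placed on resource k.
        match = {
            k: i
            for k, i in enumerate(perm)
            if i is not None and index[i][k] > 0
        }
        total = sum(index[i][k] for k, i in match.items())
        if total > best_total:
            best_total, best_match = total, match
    return best_total, best_match


# Illustrative indexes for 3 arms and 2 resources (made-up numbers).
example_index = [
    [3.0, 1.0],
    [2.0, 2.5],
    [0.5, -1.0],
]
total, match = max_weight_index_matching(example_index)
print(total, match)
```

In this example arm 0 is placed on resource 0 and arm 1 on resource 1, while arm 2 stays passive. For larger instances a polynomial-time assignment solver would be the natural replacement for the enumeration.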
Award ID(s):
2332800 2127721
PAR ID:
10600086
Publisher / Repository:
ACM
ISBN:
9798400705212
Page Range / eLocation ID:
71 to 80
Format(s):
Medium: X
Location:
Athens Greece
Sponsoring Org:
National Science Foundation
More Like this
  1. We study how to schedule data sources in a wireless time-sensitive information system with multiple heterogeneous and unreliable channels to minimize the total expected Age-of-Information (AoI). Although one could formulate this problem as a discrete-time Markov Decision Process (MDP), such an approach suffers from the curse of dimensionality and offers little insight. For single-channel systems, prior studies have developed lower-complexity solutions based on the Whittle index. However, the Whittle index has not been studied for systems with multiple heterogeneous channels, mainly because indexability is not well defined when there are multiple dual cost values, one for each channel. To overcome this difficulty, we introduce new notions of partial indexability and partial index, which are defined with respect to one channel's cost, given all other channels' costs. We then combine the ideas of partial indices and max-weight matching to develop a Sum Weighted Index Matching (SWIM) policy, which iteratively updates the dual costs and partial indices. The proposed policy is shown to be asymptotically optimal in minimizing the total expected AoI, under a technical condition on a global attractor property. Extensive performance simulations demonstrate that the proposed policy offers significant gains over conventional approaches by achieving a near-optimal AoI. Further, the notion of partial index is of independent interest and could be useful for other problems with multiple heterogeneous resources.
  2. We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy entitled Quality Index Policy, which is well-defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are required by most existing index policies and are difficult to establish in general. Since the wireless access edge network environment is highly dynamic, with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves sub-linear regret with low complexity, since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of the proposed policies over conventional approaches. We note that the proposed framework for designing index policies and index-aware RL algorithms is of independent interest and could be useful for other large-scale multi-user problems.
  3. This paper introduces a novel multi-armed bandit framework, termed Contextual Restless Bandits (CRB), for complex online decision-making. The CRB framework incorporates the core features of contextual bandits and restless bandits, so that it can model both the internal state transitions of each arm and the influence of external global environmental contexts. Using the dual decomposition method, we develop a scalable index policy algorithm for solving the CRB problem and theoretically analyze the asymptotic optimality of this algorithm. For the case when the arm models are unknown, we further propose a model-based online learning algorithm, built on the index policy, that learns the arm models and makes decisions simultaneously. Furthermore, we apply the proposed CRB framework and the index policy algorithm to the demand response decision-making problem in smart grids. Numerical simulations demonstrate the performance and efficiency of the proposed CRB approaches.
  4. The sixth generation (6G) of wireless communication systems will rely significantly on fog/edge network architectures for service provisioning. To realize this vision, AI-based fog/edge enabled reinforcement solutions are needed to serve highly stringent applications using dynamically varying resources. In this paper, we propose a cognitive dynamic fog/edge network where primary nodes (PNs) temporarily share their resources and act as fog nodes (FNs) for secondary nodes (SNs). Under this architecture, which unleashes multiple access opportunities, we design distributed fog probing schemes for SNs to search for available connections to access neighbouring FNs. Since the availability of these connections varies in time, we develop strategies to enhance the robustness to the uncertain availability of channels and fog nodes, and to reinforce the connections with the FNs. A robustness control optimization is formulated with the aim of maximizing the expected total long-term reliability of SNs' transmissions. The problem is solved by an online robustness control (ORC) algorithm that involves online fog probing and an index-based connectivity activation policy derived from a restless multi-armed bandit (RMAB) model. Simulation results show that our ORC scheme significantly improves the network robustness, the connectivity reliability, and the number of completed transmissions. In addition, by activating the connections with higher indexes, the total long-term reliability optimization problem is solved with low complexity.
  5. Current wireless networks employ sophisticated multi-user transmission techniques to fully utilize the physical layer resources for data transmission. At the MAC layer, these techniques rely on a semi-static map that translates the channel quality of users to the potential transmission rate (more precisely, a map from the Channel Quality Index to the Modulation and Coding Scheme) for user selection and scheduling decisions. However, such a static map does not adapt to the actual deployment scenario and can lead to large performance losses. Furthermore, adaptively learning this map can be inefficient, particularly when there are a large number of users. In this work, we make this learning efficient by clustering users. Specifically, we develop an online learning approach that jointly clusters users and channel states and learns the associated rate regions of each cluster. This approach generates a scenario-specific map that replaces the static map currently used in practice. Furthermore, we show that our learning algorithm achieves sub-linear regret when compared to an omniscient genie. Next, we develop a user selection algorithm for multi-user scheduling using the learned user clusters and associated rate regions. Our algorithms are validated on the WiNGS simulator from AT&T Labs, which implements the PHY/MAC stack and simulates the channel. We show that our algorithm can efficiently learn user clusters and the rate regions associated with the user sets for any observed channel state. Moreover, our simulations show that a deployment-scenario-specific map significantly outperforms the current static map approach for resource allocation at the MAC layer.