Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

Zamir, Nida; Hou, I-Hong

doi:10.1145/3641512.3686381

Citation Details

Deep Index Policy for Multi-Resource Restless Matching Bandit and Its Application in Multi-Channel Scheduling

Scheduling in multi-channel wireless communication system presents formidable challenges in effectively allocating resources. To address these challenges, we investigate a multi-resource restless matching bandit (MR-RMB) model for heterogeneous resource systems with an objective of maximizing long-term discounted total rewards while respecting resource constraints. We have also generalized to applications beyond multi-channel wireless. We discuss the Max-Weight Index Matching algorithm, which optimizes resource allocation based on learned partial indexes. We have derived the policy gradient theorem for index learning. Our main contribution is the introduction of a new Deep Index Policy (DIP), an online learning algorithm tailored for MR-RMB. DIP learns the partial index by leveraging the policy gradient theorem for restless arms with convoluted and unknown transition kernels of heterogeneous resources. We demonstrate the utility of DIP by evaluating its performance for three different MR-RMB problems. Our simulation results show that DIP indeed learns the partial indexes efficiently. more »

Award ID(s):: 2332800 2127721

PAR ID:: 10600086

Author(s) / Creator(s):: Zamir, Nida; Hou, I-Hong

Publisher / Repository:: ACM

Date Published:: 2024-10-01

ISBN:: 9798400705212

Page Range / eLocation ID:: 71 to 80

Format(s):: Medium: X

Location:: Athens Greece

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1145/3641512.3686381

More Like this