Model-free reinforcement learning is known to be memory- and computation-efficient and more amenable to large-scale problems. In this paper, two model-free algorithms are introduced for learning infinite-horizon average-reward Markov Decision Processes (MDPs). The first algorithm reduces the problem to the discounted-reward version and achieves O(T^{2/3}) regret after T steps, under the minimal assumption of weakly communicating MDPs. To our knowledge, this is the first model-free algorithm for general MDPs in this setting. The second algorithm makes use of recent advances in adaptive algorithms for adversarial multi-armed bandits and improves the regret to O(T^{1/2}), albeit with a stronger ergodic assumption. This result significantly improves over the O(T^{3/4}) regret achieved by the only existing model-free algorithm, by Abbasi-Yadkori et al. (2019), for ergodic MDPs in the infinite-horizon average-reward setting.
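The abstract does not spell out the reduction, but its flavor can be illustrated with a minimal tabular sketch: run optimistically initialized Q-learning on a discounted surrogate whose discount factor is pushed toward 1 as the horizon T grows. The environment interface (`env.step`), the choice of gamma, and the step-size rule below are illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np

def discounted_q_learning(env, n_states, n_actions, T):
    """Hedged sketch: approach an average-reward problem through a discounted
    surrogate by running tabular Q-learning with a discount factor close to 1.

    `env.step(state, action)` is assumed to return (reward, next_state) with
    rewards in [0, 1]; this interface, the gamma schedule, and the step sizes
    are assumptions for illustration only.
    """
    gamma = 1.0 - T ** (-1.0 / 3.0)             # discount pushed toward 1 as T grows
    Q = np.full((n_states, n_actions), 1.0 / (1.0 - gamma))   # optimistic initialization
    counts = np.zeros((n_states, n_actions), dtype=int)
    state, total_reward = 0, 0.0
    for _ in range(T):
        action = int(np.argmax(Q[state]))       # act greedily w.r.t. the optimistic Q
        reward, next_state = env.step(state, action)
        counts[state, action] += 1
        lr = 1.0 / counts[state, action]        # simple decaying step size
        target = reward + gamma * np.max(Q[next_state])
        Q[state, action] += lr * (target - Q[state, action])
        total_reward += reward
        state = next_state
    return total_reward
```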
This content will become publicly available on January 6, 2026.

A New Regret-analysis Framework for Budgeted Multi-Armed Bandits
We consider two versions of the (stochastic) budgeted Multi-Armed Bandit problem. The first was introduced by Tran-Thanh et al. (AAAI, 2012): pulling each arm incurs a fixed deterministic cost and yields a random reward sampled i.i.d. from an unknown distribution (prior free). We have a global budget B and aim to devise a strategy that maximizes the expected total reward. The second was introduced by Ding et al. (AAAI, 2013): it has the same setting as before, except that the costs of each arm are i.i.d. samples from an unknown distribution (independent of its rewards). We propose a new budget-based regret-analysis framework and design two simple algorithms to illustrate the power of our framework. Our regret bounds for both problems not only match the optimal bound of O(ln B) but also significantly reduce the dependence on other input parameters (assumed constants), compared with the studies of Tran-Thanh et al. (AAAI, 2012) and Ding et al. (AAAI, 2013), both of which used a time-based framework. Extensive experimental results show the effectiveness and computational efficiency of our proposed algorithms and confirm our theoretical predictions.
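The abstract does not describe the two algorithms themselves, but the fixed-cost setting of Tran-Thanh et al. is easy to illustrate with a generic reward-per-cost UCB strategy that spends the budget B on the arm with the best optimistic value for money. The sketch below is only such an illustration, not the algorithm proposed in the paper; the arm interface, the confidence bonus, and the example instance are assumptions.

```python
import math
import random

def budgeted_ucb(arms, budget):
    """Illustrative UCB-style strategy for budgeted bandits with fixed, known costs.

    `arms` is a list of (cost, sample_reward) pairs, where `sample_reward` draws one
    i.i.d. reward in [0, 1] from the arm's unknown distribution. The strategy pulls
    the arm with the highest upper-confidence estimate of reward per unit cost until
    the remaining budget cannot cover any arm.
    """
    k = len(arms)
    pulls = [0] * k
    mean = [0.0] * k
    spent, total_reward, t = 0.0, 0.0, 0
    while True:
        affordable = [i for i in range(k) if spent + arms[i][0] <= budget]
        if not affordable:
            break
        t += 1
        untried = [i for i in affordable if pulls[i] == 0]
        if untried:
            i = untried[0]                          # pull each affordable arm once first
        else:
            def index(i):
                bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
                return (mean[i] + bonus) / arms[i][0]   # optimistic reward per unit cost
            i = max(affordable, key=index)
        cost, sample_reward = arms[i]
        r = sample_reward()
        pulls[i] += 1
        mean[i] += (r - mean[i]) / pulls[i]         # running mean update
        spent += cost
        total_reward += r
    return total_reward

# Example: two arms with deterministic costs and Bernoulli rewards.
arms = [(1.0, lambda: float(random.random() < 0.4)),
        (2.0, lambda: float(random.random() < 0.9))]
print(budgeted_ucb(arms, budget=200))
```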
- Award ID(s): 1948157
- PAR ID: 10633868
- Publisher / Repository: Journal of Artificial Intelligence Research
- Date Published:
- Journal Name: Journal of Artificial Intelligence Research
- Volume: 82
- ISSN: 1076-9757
- Page Range / eLocation ID: 1943 to 1959
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the M agents has a unique reward distribution over K arms, and in T rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret. We study HMA2B in both centralized and decentralized setups. Our main centralized algorithm, GP-HCLA, which is an extension of HCLA, uses a central decision-maker for arm-pulling and hint queries, achieving O(M^4 K) regret with O(M K log T) adaptive hints. In decentralized setups, we propose two algorithms, HD-ETC and EBHD-ETC, that allow agents to choose actions independently through collision-based communication and query hints uniformly until stopping, yielding O(M^3 K^2) regret with O(M^3 K log T) hints, where the former requires knowledge of the minimum gap and the latter does not. Finally, we establish lower bounds to prove the optimality of our results and verify them through numerical simulations.
- This paper studies adversarial bandits with corruptions. In the basic adversarial bandit setting, the rewards of the arms are predetermined by an adversary who is oblivious to the learner's policy. In this paper, we consider an extended setting in which an attacker sits in between the environment and the learner, and is endowed with a limited budget to corrupt the reward of the selected arm. We have two main results. First, we derive a lower bound on the regret of any bandit algorithm that is aware of the budget of the attacker. Also, for budget-agnostic algorithms, we characterize an impossibility result demonstrating that even when the attacker has a sublinear budget, i.e., a budget growing sublinearly with the time horizon T, such algorithms fail to achieve sublinear regret. Second, we propose ExpRb, a bandit algorithm that incorporates a biased estimator and a robustness parameter to deal with corruption. We characterize the regret of ExpRb as a function of the corruption budget and show that for the case of a known corruption budget, the regret of ExpRb is tight. (An illustrative sketch of the uncorrupted Exp3 baseline appears after this list.)
- We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local reward for each hyperedge, and the reward of the joint arm is the sum of these local rewards. Previous work introduced the multi-agent Thompson sampling (MATS) algorithm and derived a Bayesian regret bound. However, it remains an open problem how to derive a frequentist regret bound for Thompson sampling in this multi-agent setting. To address this, we propose an efficient variant of MATS, the epsilon-exploring Multi-Agent Thompson Sampling (eps-MATS) algorithm, which performs MATS exploration with probability epsilon while adopting a greedy policy otherwise. We prove that eps-MATS achieves a worst-case frequentist regret bound that is sublinear in both the time horizon and the local arm size. We also derive a lower bound for this setting, which implies that our frequentist regret upper bound is optimal up to constant and logarithmic factors when the hypergraph is sufficiently sparse. Thorough experiments on standard MAMAB problems demonstrate the superior performance and improved computational efficiency of eps-MATS compared with existing algorithms in the same setting. (A simplified single-agent sketch of the epsilon-exploring idea appears after this list.)
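The ExpRb algorithm from the corrupted-bandit paper above is not reproduced here, but the uncorrupted baseline it builds on, Exp3 with importance-weighted reward estimates, is standard and easy to sketch. The following is a minimal Python illustration of that baseline only; the function names, the mixing parameter gamma, and the example instance are assumptions, and ExpRb's biased estimator and robustness parameter are deliberately omitted.

```python
import math
import random

def exp3(n_arms, T, get_reward, gamma=None):
    """Vanilla Exp3 for adversarial bandits with rewards in [0, 1].

    This is the standard, uncorrupted baseline, not the ExpRb algorithm of the
    paper above; ExpRb further biases the reward estimator and adds a robustness
    parameter to cope with a corruption budget, which is not reproduced here.
    """
    if gamma is None:
        gamma = min(1.0, math.sqrt(n_arms * math.log(n_arms) / ((math.e - 1.0) * T)))
    weights = [1.0] * n_arms
    total = 0.0
    for _ in range(T):
        w_sum = sum(weights)
        probs = [(1.0 - gamma) * w / w_sum + gamma / n_arms for w in weights]
        arm = random.choices(range(n_arms), weights=probs)[0]
        r = get_reward(arm)                      # adversary-chosen reward in [0, 1]
        total += r
        est = r / probs[arm]                     # importance-weighted reward estimate
        weights[arm] *= math.exp(gamma * est / n_arms)
        m = max(weights)                         # renormalize for numerical stability
        weights = [w / m for w in weights]
    return total

# Example: a two-armed instance where arm 1 pays more in most rounds.
print(exp3(2, 5000, lambda a: float(random.random() < (0.8 if a == 1 else 0.3))))
```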
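The epsilon-exploring idea in eps-MATS, sample from the posterior with probability epsilon and otherwise act greedily on the posterior mean, can be illustrated in the simplest possible setting. Below is a hedged single-agent Beta-Bernoulli sketch of that idea, not the multi-agent, hypergraph-structured algorithm from the paper; the prior, the default eps value, and the example arms are assumptions.

```python
import random

def eps_exploring_thompson(arm_means, T, eps=0.1):
    """Single-agent, Bernoulli-reward analog of the epsilon-exploring idea:
    with probability eps take a Thompson-sampling step (sample from the Beta
    posterior), otherwise act greedily on the posterior mean."""
    k = len(arm_means)
    alpha = [1.0] * k        # Beta posterior: successes + 1
    beta = [1.0] * k         # Beta posterior: failures + 1
    total = 0.0
    for _ in range(T):
        if random.random() < eps:
            scores = [random.betavariate(alpha[i], beta[i]) for i in range(k)]   # explore
        else:
            scores = [alpha[i] / (alpha[i] + beta[i]) for i in range(k)]         # greedy
        i = max(range(k), key=lambda j: scores[j])
        r = 1.0 if random.random() < arm_means[i] else 0.0   # Bernoulli reward
        alpha[i] += r
        beta[i] += 1.0 - r
        total += r
    return total

# Example: three Bernoulli arms with unknown means.
print(eps_exploring_thompson([0.3, 0.5, 0.7], T=5000))
```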