Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes

Sharma, Nihal; Sen, Rajat; Basu, Soumya; Shanmugam, Karthikeyan; Shakkottai, Sanjay

doi:10.1145/3680279

We study a variant of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. Given a fixed context, each expert samples actions from a fixed conditional distribution. The agent seeks to remain competitive with the “best” among the given set of experts. We propose the Divergence-based Upper Confidence Bound (D-UCB) algorithm that uses importance sampling to share information across experts and provide horizon-independent constant regret bounds that only scale linearly in the number of experts. We also provide the Empirical D-UCB (ED-UCB) algorithm that can function with only approximate knowledge of expert distributions. Further, we investigate the episodic setting where the agent interacts with an environment that changes over episodes. Each episode can have different context and reward distributions resulting in the best expert changing across episodes. We show that by bootstrapping from \(\mathcal{O}(N\log(NT^2\sqrt{E}))\) samples, ED-UCB guarantees a regret that scales as \(\mathcal{O}(E(N+1) + \frac{N\sqrt{E}}{T^2})\) for \(N\) experts over \(E\) episodes, each of length \(T\). We finally empirically validate our findings through simulations.
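The information-sharing idea mentioned in the abstract can be illustrated with a plain importance-weighted estimate: rewards collected while following one expert are reweighted to estimate the mean reward of another. The sketch below is illustrative only and is not the paper's D-UCB estimator or its divergence-based confidence bounds. The function names (`simulate`, `importance_weighted_mean`), the dictionary layout of the experts, and the Bernoulli reward model are all assumptions made for this example. It also assumes that every logging expert gives positive probability to each action the target expert can take.

```python
import random


def simulate(experts, reward_mean, context_probs, pulls, seed=0):
    """Collect (context, action, reward, expert_index) tuples.

    Hypothetical setup: experts[j][x][a] is the probability that expert j
    picks action a in context x; rewards are Bernoulli with mean
    reward_mean[(x, a)]; pulls[j] is how often expert j is followed.
    """
    rng = random.Random(seed)
    contexts = list(context_probs)
    context_weights = [context_probs[c] for c in contexts]
    samples = []
    for j, n in enumerate(pulls):
        for _ in range(n):
            x = rng.choices(contexts, weights=context_weights)[0]
            actions = list(experts[j][x])
            action_weights = [experts[j][x][b] for b in actions]
            a = rng.choices(actions, weights=action_weights)[0]
            r = 1.0 if rng.random() < reward_mean[(x, a)] else 0.0
            samples.append((x, a, r, j))
    return samples


def importance_weighted_mean(samples, target, experts):
    """Estimate the target expert's mean reward from all logged samples.

    Each reward is scaled by target(a|x) / logging_expert(a|x), so samples
    gathered under any expert contribute to the estimate for the target.
    """
    total = 0.0
    for x, a, r, j in samples:
        total += r * target[x][a] / experts[j][x][a]
    return total / len(samples)


# One context, two actions, two experts (all values are made up).
experts = [
    {0: {0: 0.5, 1: 0.5}},  # expert 0: uniform over actions
    {0: {0: 0.2, 1: 0.8}},  # expert 1: prefers action 1
]
reward_mean = {(0, 0): 0.9, (0, 1): 0.1}
context_probs = {0: 1.0}

# Follow only expert 0, then estimate expert 1's mean reward from that data.
logged = simulate(experts, reward_mean, context_probs, pulls=[20000, 0])
estimate_for_expert_1 = importance_weighted_mean(logged, experts[1], experts)
```

In this toy setup, expert 1 has true mean reward 0.2 × 0.9 + 0.8 × 0.1 = 0.26, and the estimate is computed without ever following expert 1. Unbounded importance weights can inflate variance when experts differ sharply, which is one reason a practical method needs more care than this sketch shows.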

Award ID(s): 2107037

PAR ID: 10567978

Author(s) / Creator(s): Sharma, Nihal; Sen, Rajat; Basu, Soumya; Shanmugam, Karthikeyan; Shakkottai, Sanjay

Publisher / Repository: ACM

Date Published: 2024-09-30

Journal Name: ACM Transactions on Modeling and Performance Evaluation of Computing Systems

Volume: 9

Issue: 3

ISSN: 2376-3639

Page Range / eLocation ID: 1 to 33

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1145/3680279
