%A Wu, Huasen
%A Guo, Xueying
%A Liu, Xin
%D 2018
%M OSTI ID: 10072465
%T Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
%X In this paper, we propose and study opportunistic bandits, a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1) regret with respect to T if the exploration cost is zero when the load level is below a certain threshold. Finally, experimental results based on both synthetic data and real-world traces show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations.
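The intuition in the abstract (widen the exploration bonus when the load is low, shrink it when the load is high) can be illustrated with a minimal load-weighted UCB sketch. This is not the paper's exact AdaUCB rule: the function name, the thresholds `load_lo`/`load_hi`, and the linear weighting of the bonus are all illustrative assumptions.

```python
import math

def adaptive_ucb_pull(means_hat, counts, t, load, load_lo=0.2, load_hi=0.8):
    """Pick an arm by a UCB index whose exploration bonus shrinks as the
    current load rises (illustrative sketch, not the paper's AdaUCB)."""
    # Map load to an exploration weight in [0, 1]: full exploration when
    # load <= load_lo, pure exploitation when load >= load_hi.
    w = min(1.0, max(0.0, (load_hi - load) / (load_hi - load_lo)))
    best_idx, best_val = None, None
    for a, (mu, n) in enumerate(zip(means_hat, counts)):
        if n == 0:
            return a  # pull each arm at least once first
        # Standard UCB1-style bonus, scaled down by the load weight.
        val = mu + w * math.sqrt(2.0 * math.log(t) / n)
        if best_val is None or val > best_val:
            best_idx, best_val = a, val
    return best_idx
```

Under high load the weight is zero, so the index reduces to the empirical mean (pure exploitation); under low load it recovers the full UCB1 bonus, so a rarely pulled arm is tried even if its empirical mean is worse.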