On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems

Wei, Lai; Srivastava, Vaibhav

Citation Details

We study the non-stationary stochastic multi-armed bandit (MAB) problem and propose two generic algorithms, namely, Limited Memory Deterministic Sequencing of Exploration and Exploitation (LM-DSEE) and Sliding-Window Upper Confidence Bound# (SW-UCB#). We rigorously analyze these algorithms in abruptly-changing and slowly-varying environments and characterize their performance. We show that the expected cumulative regret for these algorithms in either of the environments is upper bounded by sublinear functions of time, i.e., the time average of the regret asymptotically converges to zero. We complement our analysis with numerical illustrations.
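To make the sliding-window idea in the abstract concrete, here is a minimal Python sketch of a generic sliding-window UCB policy on a two-armed, abruptly-changing problem. It is an illustration under stated assumptions, not the paper's SW-UCB#: the abstract does not specify how SW-UCB# chooses its window or confidence term, so the fixed `window`, the exploration constant `c`, and the helper names `sliding_window_ucb` and `abrupt_reward` are all hypothetical choices made for this example.

```python
import math
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window, c=2.0):
    """Generic sliding-window UCB (illustrative; NOT the paper's SW-UCB#).

    Statistics are computed only from the last `window` pulls, so
    observations from before a change are eventually forgotten.
    `window` and `c` are assumed tuning parameters.
    """
    assert window >= n_arms, "window must be able to hold one pull per arm"
    history = deque()          # (arm, reward) pairs currently inside the window
    counts = [0] * n_arms      # pulls of each arm inside the window
    sums = [0.0] * n_arms      # reward totals of each arm inside the window
    choices = []
    for t in range(horizon):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # An arm with no samples in the window is pulled first.
            arm = untried[0]
        else:
            n = len(history)
            # Windowed mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(c * math.log(n) / counts[a]),
            )
        reward = reward_fn(arm, t)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        if len(history) > window:
            # Drop the oldest observation once the window is full.
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward
        choices.append(arm)
    return choices


def abrupt_reward(arm, t):
    """Hypothetical deterministic environment with one breakpoint at t = 500."""
    if arm == 0:
        return 1.0 if t < 500 else 0.0
    return 0.5


choices = sliding_window_ucb(abrupt_reward, n_arms=2, horizon=1000, window=50)
before = choices[300:500].count(0) / 200   # share of pulls on arm 0 before the change
after = choices[800:1000].count(1) / 200   # share of pulls on arm 1 after the change
print(f"arm 0 share before change: {before:.2f}")
print(f"arm 1 share after change:  {after:.2f}")
```

The fixed window trades off two effects: a short window forgets stale data quickly after a change but keeps the confidence bonus large, while a long window gives tighter estimates but reacts slowly. Rewards here are deterministic only to keep the example reproducible; a stochastic `reward_fn` can be substituted without changing the policy code.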

Award ID(s): 1734272

PAR ID: 10066465

Author(s) / Creator(s): Wei, Lai; Srivastava, Vaibhav

Date Published: 2018-06-27

Journal Name: 2018 Annual American Control Conference

Page Range / eLocation ID: 6291-6296

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
