TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation

Baek, Jackie; Farias, Vivek F

Citation Details

Thompson sampling has become a ubiquitous ap- proach to online decision problems with bandit feedback. The key algorithmic task for Thomp- son sampling is drawing a sample from the pos- terior of the optimal action. We propose an al- ternative arm selection rule we dub TS-UCB, that requires negligible additional computational effort but provides significant performance im- provements relative to Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. TS-UCB achieves materially lower regret on a comprehen- sive suite of synthetic and real-world datasets, including a personalized article recommendation dataset from Yahoo! and a suite of benchmark datasets from a deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoreti- cal perspective, we establish optimal regret guar- antees for TS-UCB for both the K-armed and linear bandit models. more »

Award ID(s):: 1727239

PAR ID:: 10584907

Author(s) / Creator(s):: Baek, Jackie; Farias, Vivek F

Publisher / Repository:: PMLR

Date Published:: 2023-04-11

ISSN:: 2640-3498

Format(s):: Medium: X

Location:: International Conference on Artificial Intelligence and Statistics. PMLR, 2023

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this