We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov decision processes (MDPs), a setting better suited to applications that involve continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, the theoretical understanding of model-free RL algorithms in the average-reward setting remains relatively limited. In this paper, we consider both the online setting and the setting with access to a simulator, and we develop computationally efficient model-free algorithms that achieve sharper guarantees on regret/sample complexity than existing results.

In the online setting, we design an algorithm, UCB-AVG, based on an optimistic variant of variance-reduced Q-learning. We show that UCB-AVG achieves a regret bound $$\widetilde{O}(S^5A^2\,sp(h^*)\sqrt{T})$$ after $$T$$ steps, where $$S\times A$$ is the size of the state-action space and $sp(h^*)$ is the span of the optimal bias function. Our result provides the first computationally efficient model-free algorithm that achieves the optimal dependence on $$T$$ (up to log factors) for weakly communicating MDPs, an assumption that is necessary for low regret. In contrast, prior results are either suboptimal in $$T$$ or require strong assumptions such as ergodicity or uniform mixing. In the simulator setting, we adapt the ideas behind UCB-AVG to develop a model-free algorithm that finds an $$\epsilon$$-optimal policy with sample complexity $$\widetilde{O}(SA\,sp^2(h^*)\epsilon^{-2} + S^2A\,sp(h^*)\epsilon^{-1})$$. This sample complexity is near-optimal for weakly communicating MDPs, in view of the minimax lower bound $$\Omega(SA\,sp(h^*)\epsilon^{-2})$$. Existing work mainly focuses on ergodic MDPs, and the results typically depend on $$t_{mix}$$, the worst-case mixing time induced by a policy. We remark that the diameter $$D$$ and the mixing time $$t_{mix}$$ are both lower bounded by $sp(h^*)$, and $$t_{mix}$$ can be arbitrarily large for certain MDPs.

On the technical side, our approach integrates two key ideas: learning a $$\gamma$$-discounted MDP as an approximation, and leveraging a reference-advantage decomposition for variance reduction in optimistic Q-learning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is maintaining estimates of the value differences between state pairs, which yields a sharper bound on the variance of the reference advantage. We also crucially rely on a careful choice of the discount factor $$\gamma$$ to balance the approximation error due to discounting against the statistical learning error, and we are able to maintain a good-quality reference value function with $O(SA)$ space complexity.
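To make the discounted-approximation idea concrete, here is a minimal sketch of optimistic Q-learning on a $\gamma$-discounted surrogate, with $\gamma$ tied to the horizon. It is not the paper's UCB-AVG algorithm (which additionally maintains reference values and value-difference estimates for variance reduction); the toy MDP, step size, and bonus constants are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's UCB-AVG): optimistic Q-learning on a
# gamma-discounted surrogate of an average-reward MDP, using a simulator.
# The toy MDP, step size, and bonus below are illustrative assumptions.

rng = np.random.default_rng(0)
S, A, T = 3, 2, 20000
span_h = 1.0                       # assumed bound on sp(h*)
gamma = 1.0 - np.sqrt(span_h / T)  # discount chosen to balance approximation vs. estimation error

# Random toy MDP: transition kernel P[s, a] and rewards R[s, a].
P = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))

Q = np.full((S, A), 1.0 / (1.0 - gamma))  # optimistic initialization
V = Q.max(axis=1)
N = np.zeros((S, A))

s = 0
for t in range(T):
    a = int(np.argmax(Q[s]))
    s_next = int(rng.choice(S, p=P[s, a]))
    r = R[s, a]
    N[s, a] += 1
    alpha = (span_h + 1.0) / (span_h + N[s, a])        # step size in the style of optimistic Q-learning analyses
    bonus = span_h * np.sqrt(np.log(T + 1) / N[s, a])  # crude UCB-style exploration bonus
    Q[s, a] = min(Q[s, a],
                  (1 - alpha) * Q[s, a] + alpha * (r + gamma * V[s_next] + bonus))
    V[s] = Q[s].max()
    s = s_next

# Average-reward estimate recovered from the discounted surrogate: (1 - gamma) * V.
print("estimated gain per state:", np.round((1 - gamma) * V, 3))
```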
No-Regret Linear Bandits beyond Realizability
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $$\epsilon$$ that measures the sup-norm error of the best linear approximation, which results in an unavoidable linear regret whenever $$\epsilon > 0$$. We describe a more natural model of misspecification that only requires the approximation error at each input $$x$$ to be proportional to the suboptimality gap at $$x$$. It captures the intuition that, for optimization problems, near-optimal regions should matter more, so larger approximation errors can be tolerated in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm, designed for the realizable case, is automatically robust against such gap-adjusted misspecification: it achieves a near-optimal $$\sqrt{T}$$ regret on problems for which the best previously known regret is almost linear in the time horizon $$T$$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
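For reference, here is a minimal sketch of classical LinUCB on a finite arm set, the algorithm shown to be robust in the gap-adjusted model. The misspecified reward, feature set, noise level, and confidence width beta below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Minimal LinUCB sketch on a finite arm set; the misspecified reward function,
# features, noise, and confidence width beta are illustrative assumptions.

rng = np.random.default_rng(1)
d, K, T, lam, beta = 3, 20, 2000, 1.0, 1.0
X = rng.normal(size=(K, d)) / np.sqrt(d)          # arm features
theta_star = rng.normal(size=d) / np.sqrt(d)

def reward(x):
    # Nonlinear (misspecified) reward: a linear part plus a small distortion.
    return float(x @ theta_star) + 0.05 * float(x @ theta_star) ** 2

A = lam * np.eye(d)                               # regularized design matrix
b = np.zeros(d)
for t in range(T):
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                         # ridge estimate of the linear model
    ucb = X @ theta_hat + beta * np.sqrt(np.einsum("kd,dc,kc->k", X, A_inv, X))
    x = X[int(np.argmax(ucb))]                    # optimistic arm selection
    r = reward(x) + 0.1 * rng.normal()
    A += np.outer(x, x)
    b += r * x

best = max(reward(xi) for xi in X)
print("suboptimality of last pulled arm:", round(best - reward(x), 4))
```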
- PAR ID: 10466938
- Editor(s): Evans, Robin J.; Shpitser, Ilya
- Publisher / Repository: UAI 2023
- Journal Name: Proceedings of Machine Learning Research
- Volume: 216
- ISSN: 2640-3498
- Page Range / eLocation ID: 1294--1303
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We explore why many recently proposed robust estimation problems are efficiently solvable, even though the underlying optimization problems are non-convex. We study the loss landscape of these robust estimation problems and identify the existence of 'generalized quasi-gradients'. Whenever these quasi-gradients exist, a large family of no-regret algorithms is guaranteed to approximate the global minimum; this includes the commonly used filtering algorithm. For robust mean estimation of distributions under bounded covariance, we show that any first-order stationary point of the associated optimization problem is an approximate global minimum if and only if the corruption level $$\epsilon < 1/3$$. Consequently, any optimization algorithm that approaches a stationary point yields an efficient robust estimator with breakdown point $1/3$. With carefully designed initialization and step size, we improve this to $1/2$, which is optimal. For other tasks, including linear regression and joint mean and covariance estimation, the loss landscape is more rugged: there are stationary points arbitrarily far from the global minimum. Nevertheless, we show that generalized quasi-gradients exist and construct efficient algorithms. These algorithms are simpler than previous ones in the literature, and for linear regression we improve the estimation error from $$O(\sqrt{\epsilon})$$ to the optimal rate of $$O(\epsilon)$$ for small $$\epsilon$$, assuming certified hypercontractivity. For mean estimation with near-identity covariance, we show that a simple gradient descent algorithm achieves breakdown point $1/3$ and iteration complexity $$\tilde{O}(d/\epsilon^2)$$. (A toy filtering sketch appears after this list.)
- Motivated by personalized healthcare and other applications involving sensitive data, we study online exploration in reinforcement learning with differential privacy (DP) constraints. Existing work on this problem established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP) but did not provide an algorithm with optimal regret. We close this gap for the JDP case by designing an $$\epsilon$$-JDP algorithm with a regret of $$\widetilde{O}(\sqrt{SAH^2T}+S^2AH^3/\epsilon)$$, which matches the information-theoretic lower bound of non-private learning for all choices of $$\epsilon> S^{1.5}A^{0.5} H^2/\sqrt{T}$$. In the above, $$S$$ and $$A$$ denote the number of states and actions, $$H$$ denotes the planning horizon, and $$T$$ is the number of steps. To the best of our knowledge, this is the first private RL algorithm that achieves privacy for free asymptotically as $$T\rightarrow \infty$$. Our techniques, which could be of independent interest, include privately releasing Bernstein-type exploration bonuses and an improved method for releasing visitation statistics. The same techniques also imply a slightly improved regret bound for the LDP case. (A toy sketch of noisy count release appears after this list.)
- The Prophet Inequality and Pandora's Box problems are fundamental stochastic problems with applications in Mechanism Design, Online Algorithms, Stochastic Optimization, Optimal Stopping, and Operations Research. A usual assumption in these works is that the probability distributions of the $$n$$ underlying random variables are given as input to the algorithm. Since in practice these distributions need to be learned under limited feedback, we initiate the study of such stochastic problems in the Multi-Armed Bandits model. In the Multi-Armed Bandits model, we interact with $$n$$ unknown distributions over $$T$$ rounds: in round $$t$$ we play a policy $$x(t)$$ and only receive the value of $$x(t)$$ as feedback. The goal is to minimize the regret, which is the difference over $$T$$ rounds between the total value of the optimal algorithm that knows the distributions and the total value of our algorithm that learns the distributions from the limited feedback. Our main results give near-optimal $$\widetilde{O}(\mathrm{poly}(n)\sqrt{T})$$ total regret algorithms for both Prophet Inequality and Pandora's Box. Our proofs proceed by maintaining confidence intervals on the unknown indices of the optimal policy. The exploration-exploitation tradeoff prevents us from directly refining these confidence intervals, so the main technique is to design a regret upper bound function that is learnable while playing low-regret Bandit policies. (A toy bandit-over-thresholds sketch appears after this list.)
- We develop a framework for designing simple and efficient policies for a family of online allocation and pricing problems that includes online packing, budget-constrained probing, dynamic pricing, and online contextual bandits with knapsacks. In each case, we evaluate the performance of our policies in terms of their regret (i.e., additive gap) relative to an offline controller that is endowed with more information than the online controller. Our framework is based on Bellman inequalities, which decompose the loss of an algorithm into two distinct sources of error: (1) errors arising from computational tractability issues, and (2) errors arising from estimation/prediction of random trajectories. Balancing these errors guides the choice of benchmarks and leads to policies that are both tractable and have strong performance guarantees. In particular, in all our examples, we demonstrate constant-regret policies that only require resolving a linear program in each period, followed by a simple greedy action-selection rule; thus, our policies are practical as well as provably near optimal. (A toy resolving sketch appears after this list.)
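For the robust estimation item above, here is a minimal filtering-style sketch: points are repeatedly downweighted along the top direction of the weighted covariance until its spectral norm is small. The thresholds, stopping rule, and corruption model are illustrative assumptions, not the exact procedure analyzed in that paper.

```python
import numpy as np

# Filtering-style robust mean estimation sketch: downweight points whose
# projection onto the top covariance direction is largest. Thresholds and
# the corruption model are assumptions for illustration only.

rng = np.random.default_rng(2)
d, n, eps = 10, 2000, 0.1
X = rng.normal(size=(n, d))
X[: int(eps * n)] += 5.0                        # corrupted fraction shifted away from the true mean (zero)

w = np.ones(n) / n
for _ in range(50):
    mu = w @ X
    cov = (w[:, None] * (X - mu)).T @ (X - mu)  # weighted covariance
    eigval, eigvec = np.linalg.eigh(cov)
    if eigval[-1] < 1.2:                        # spectral norm close to identity: stop
        break
    scores = ((X - mu) @ eigvec[:, -1]) ** 2    # outlier scores along the top direction
    w = w * (1.0 - scores / scores.max())       # soft downweighting (a quasi-gradient-style step)
    w = w / w.sum()

print("error of filtered mean:", round(float(np.linalg.norm(w @ X)), 3))
print("error of naive mean:   ", round(float(np.linalg.norm(X.mean(axis=0))), 3))
```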
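For the differentially private RL item above, a minimal sketch of the general flavor of privately releasing visitation statistics and deriving exploration bonuses from them. The Laplace noise calibration and the Hoeffding-style bonus are simplifying assumptions, not that paper's Bernstein-type release mechanism.

```python
import numpy as np

# Illustrative sketch only: privatize visitation counts with Laplace noise and
# build exploration bonuses from the noisy counts. Noise scale and bonus form
# are assumptions, not the paper's exact release mechanism.

rng = np.random.default_rng(3)
S, A, H, eps_dp = 5, 3, 10, 1.0

true_counts = rng.integers(1, 100, size=(S, A))
noisy_counts = true_counts + rng.laplace(scale=H / eps_dp, size=(S, A))
noisy_counts = np.maximum(noisy_counts, 1.0)     # keep released counts usable as denominators

# Hoeffding-style bonus from the privately released counts (a Bernstein-type
# bonus would additionally use privately released empirical variances).
bonus = H * np.sqrt(np.log(S * A * H) / noisy_counts)
print(np.round(bonus, 2))
```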
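For the Prophet Inequality / Pandora's Box item above, a toy reduction that runs UCB1 over a small grid of single-threshold stopping policies. The actual algorithm in that paper maintains confidence intervals on the indices of the optimal policy; the value distributions and threshold grid here are assumptions.

```python
import numpy as np

# Toy sketch: treat a grid of single-threshold prophet-inequality policies as
# bandit arms and run UCB1 over them; distributions and grid are assumptions.

rng = np.random.default_rng(4)
n, T = 5, 5000
means = rng.uniform(0.2, 1.0, size=n)            # unknown exponential-value distributions

def play_threshold(tau):
    # Accept the first value above tau; return the realized payoff of the policy.
    for m in means:
        v = rng.exponential(m)
        if v >= tau:
            return v
    return 0.0

thresholds = np.linspace(0.0, 2.0, 9)            # candidate policies ("arms")
counts = np.zeros(len(thresholds))
sums = np.zeros(len(thresholds))

for t in range(1, T + 1):
    ucb = np.where(counts > 0,
                   sums / np.maximum(counts, 1) + np.sqrt(2 * np.log(t) / np.maximum(counts, 1)),
                   np.inf)                       # pull each arm once before using UCB scores
    k = int(np.argmax(ucb))
    sums[k] += play_threshold(thresholds[k])
    counts[k] += 1

print("empirical value per threshold:", np.round(sums / np.maximum(counts, 1), 2))
```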
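For the online allocation item above, a minimal sketch of the resolve-then-greedy structure on a toy online packing instance: each period the fractional relaxation on the expected remaining demand is re-solved (a fractional knapsack, solved greedily by value density), and the arriving item is accepted only if its type is selected. The instance and acceptance rule are illustrative assumptions.

```python
import numpy as np

# Toy resolving sketch for online packing; instance and rule are assumptions.

rng = np.random.default_rng(5)
T, budget = 200, 60.0
values = np.array([4.0, 3.0, 1.0])
sizes = np.array([2.0, 1.5, 1.0])
probs = np.array([0.3, 0.3, 0.4])               # arrival distribution over item types

def resolve(remaining_budget, remaining_periods):
    # Fractional knapsack on expected remaining arrivals: accept fraction x[j] of type j.
    demand = probs * remaining_periods * sizes
    x = np.zeros(len(values))
    cap = remaining_budget
    for j in np.argsort(-values / sizes):        # fill by decreasing value density
        take = min(1.0, max(cap, 0.0) / demand[j]) if demand[j] > 0 else 0.0
        x[j] = take
        cap -= take * demand[j]
    return x

total = 0.0
for t in range(T):
    j = int(rng.choice(len(values), p=probs))
    x = resolve(budget, T - t)
    if budget >= sizes[j] and x[j] >= 1.0 - 1e-9:  # greedy action from the resolved relaxation
        budget -= sizes[j]
        total += values[j]

print("collected value:", total, "remaining budget:", round(budget, 2))
```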