Bandit Online Linear Optimization with Hints and Queries

Bhaskara, Aditya; Cutkosky, Ashok; Kumar, Ravi; Purohit, Manish

We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” to improve regret bounds for OLO problems in the full-information setting. We first show that, unlike in the full-information OLO setting, with bandit feedback one cannot improve the standard regret bounds of O(\sqrt{T}) by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show that O(\log(T)) regret is achievable. We then show how to make this result more robust, to the case where some of the query responses can be adversarial, by using a little feedback on the quality of the responses.

Award ID(s): 2211718

PAR ID: 10486613

Author(s) / Creator(s): Bhaskara, Aditya; Cutkosky, Ashok; Kumar, Ravi; Purohit, Manish

Publisher / Repository: ICML

Date Published: 2023-07-23

Journal Name: Proceedings of Machine Learning Research

ISSN: 2640-3498

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.
