Adversarial Bandits with Corruptions

Yang, Lin; Hajiesmaili, Mohammad H; Talebi, Mohammad Sadegh; Lui, John C; Wong, Wing Shing

This paper studies adversarial bandits with corruptions. In the basic adversarial bandit setting, the rewards of the arms are predetermined by an adversary who is oblivious to the learner’s policy. We consider an extended setting in which an attacker sits between the environment and the learner and is endowed with a limited budget to corrupt the reward of the selected arm. We have two main results. First, we derive a lower bound on the regret of any bandit algorithm that is aware of the attacker’s budget. For budget-agnostic algorithms, we establish an impossibility result: even when the attacker has a sublinear budget, i.e., a budget growing sublinearly with the time horizon T, such algorithms fail to achieve sublinear regret. Second, we propose ExpRb, a bandit algorithm that incorporates a biased estimator and a robustness parameter to deal with corruption. We characterize the regret of ExpRb as a function of the corruption budget and show that, for the case of a known corruption budget, the regret of ExpRb is tight.
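The corruption setting described in the abstract can be illustrated with a small simulation. The sketch below is not ExpRb, whose estimator and robustness parameter are not specified in this record; it runs the standard Exp3 algorithm against a hypothetical attacker that zeroes out the observed reward of the best arm until its budget is spent. The function name, the attacker strategy, and the parameter `gamma` are assumptions made for illustration only.

```python
import math
import random


def exp3_with_corruption(rewards, budget, gamma=0.1, seed=0):
    """Run standard Exp3 (NOT ExpRb) against a budget-limited attacker.

    rewards: list of per-round lists with rewards[t][i] in [0, 1], fixed in
             advance (oblivious adversary).
    budget:  total amount of corruption the hypothetical attacker may apply.
    Returns (regret measured on the true rewards, corruption actually spent).
    """
    rng = random.Random(seed)
    n_rounds, n_arms = len(rewards), len(rewards[0])
    weights = [1.0] * n_arms
    # Best fixed arm in hindsight, judged on the uncorrupted rewards.
    best_arm = max(range(n_arms), key=lambda i: sum(r[i] for r in rewards))
    spent = 0.0
    earned = 0.0
    for t in range(n_rounds):
        total = sum(weights)
        # Exp3 sampling distribution: exponential weights mixed with uniform.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        true_reward = rewards[t][arm]
        earned += true_reward
        # Assumed attacker: sits between environment and learner and lowers
        # the observed reward of the best arm while budget remains.
        observed = true_reward
        if arm == best_arm and spent < budget:
            cut = min(true_reward, budget - spent)
            observed = true_reward - cut
            spent += cut
        # Importance-weighted estimate built from the (possibly corrupted)
        # observation, followed by the usual Exp3 weight update.
        estimate = observed / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the largest weight is 1, purely for numerical stability.
        top = max(weights)
        weights = [w / top for w in weights]
    best_total = sum(r[best_arm] for r in rewards)
    return best_total - earned, spent


if __name__ == "__main__":
    horizon = 2000
    fixed_rewards = [[0.9, 0.1] for _ in range(horizon)]
    for b in (0, 50, 500):
        regret, used = exp3_with_corruption(fixed_rewards, budget=b)
        print(f"budget={b:4d}  corruption used={used:7.2f}  regret={regret:8.2f}")
```

Running the script with increasing budgets gives a rough feel for how corruption delays a budget-agnostic learner; it does not reproduce the paper's bounds.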

Award ID(s): 1908298

PAR ID: 10296412

Author(s) / Creator(s): Yang, Lin; Hajiesmaili, Mohammad H; Talebi, Mohammad Sadegh; Lui, John C; Wong, Wing Shing

Date Published: 2020-12-15

Journal Name: Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

Format(s): Medium: X

Sponsoring Org: National Science Foundation
