Smooth Contextual Bandits: Bridging the Parametric and Nondifferentiable Regret Regimes

Hu, Yichun; Kallus, Nathan; Mao, Xiaojie

doi:10.1287/opre.2021.2237

Citation Details

We study a nonparametric contextual bandit problem in which the expected reward functions belong to a Hölder class with smoothness parameter β. We show how this interpolates between two extremes that were previously studied in isolation: nondifferentiable bandits (β at most 1), for which rate-optimal regret is achieved by running separate noncontextual bandits in different context regions, and parametric-response bandits (infinite β), for which rate-optimal regret can be achieved with minimal or no exploration because of infinite extrapolatability. We develop a novel algorithm that carefully adjusts to all smoothness settings, and we prove its regret is rate-optimal by establishing matching upper and lower bounds, recovering the existing results at the two extremes. In this sense, our work bridges the gap between the existing literature on parametric and nondifferentiable contextual bandit problems and between bandit algorithms that exclusively use global or local information, shedding light on the crucial interplay of complexity and regret in contextual bandits.
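The nondifferentiable extreme mentioned in the abstract (separate noncontextual bandits in different context regions) can be illustrated with a minimal sketch. This is not the algorithm developed in the paper. It assumes a one-dimensional context on [0, 1), two arms, a fixed number of equal-width bins, and a standard UCB1 learner inside each bin. The class name `BinnedUCB`, the function `simulate`, the bin count, and the reward functions are all hypothetical choices made for illustration.

```python
import math
import random


class BinnedUCB:
    """One independent UCB1 learner per context bin on [0, 1)."""

    def __init__(self, n_bins, n_arms):
        self.n_bins = n_bins
        self.n_arms = n_arms
        # Per-bin pull counts and running mean rewards for each arm.
        self.counts = [[0] * n_arms for _ in range(n_bins)]
        self.means = [[0.0] * n_arms for _ in range(n_bins)]

    def bin_of(self, x):
        # Map a context to its bin index; clamp x = 1.0 into the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def select(self, x):
        b = self.bin_of(x)
        counts = self.counts[b]
        # Pull each arm once in this bin before using confidence bounds.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        total = sum(counts)
        scores = [
            self.means[b][a] + math.sqrt(2.0 * math.log(total) / counts[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, x, arm, reward):
        b = self.bin_of(x)
        self.counts[b][arm] += 1
        n = self.counts[b][arm]
        # Incremental update of the running mean.
        self.means[b][arm] += (reward - self.means[b][arm]) / n


def simulate(horizon=4000, n_bins=4, seed=0):
    """Run the binned policy on a toy problem (assumed, not from the paper)."""
    rng = random.Random(seed)
    policy = BinnedUCB(n_bins, 2)
    for _ in range(horizon):
        x = rng.random()
        arm = policy.select(x)
        # Hypothetical expected rewards: arm 0 pays 0.5, arm 1 pays x.
        mean = 0.5 if arm == 0 else x
        policy.update(x, arm, mean + rng.gauss(0.0, 0.1))
    return policy


if __name__ == "__main__":
    policy = simulate()
    for b, row in enumerate(policy.counts):
        print(f"bin {b}: pulls per arm = {row}")
```

Each bin learns only from its own observations, which is the purely local use of information that the abstract contrasts with the global extrapolation available in the parametric case. The fixed bin count here is a simplification chosen for the sketch.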

Award ID(s): 1846210

PAR ID: 10320785

Author(s) / Creator(s): Hu, Yichun; Kallus, Nathan; Mao, Xiaojie

Date Published: 2022-01-01

Journal Name: Operations Research

ISSN: 0030-364X

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1287/opre.2021.2237
