Multi-Feedback Bandit Learning with Probabilistic Contexts

Yang, Luting; Yang, Jianyi; Ren, Shaolei

doi:10.24963/ijcai.2020/427

Citation Details

The contextual bandit is a classic multi-armed bandit setting in which side information (i.e., context) is available before arm selection. A standard assumption is that the exact context is perfectly known prior to arm selection and that only a single feedback signal is returned. In this work, we focus on multi-feedback bandit learning with probabilistic contexts, where a bundle of contexts is revealed to the agent along with their corresponding probabilities at the beginning of each round. This models scenarios in which contexts are drawn from the probability output of a neural network and the reward function is jointly determined by multiple feedback signals. We propose a kernelized learning algorithm based on the upper confidence bound to choose the optimal arm in a reproducing kernel Hilbert space for each context bundle. Moreover, we theoretically establish an upper bound on the cumulative regret with respect to an oracle that knows the optimal arm given probabilistic contexts, and show that the bound grows sublinearly with time. Our simulation on machine learning model recommendation further validates the sublinearity of our cumulative regret and demonstrates that our algorithm outperforms the approach that selects arms based on the most probable context.
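The contrast drawn at the end of the abstract, between choosing an arm for the whole context bundle and choosing it for the most probable context only, can be illustrated with a small sketch. This is not the paper's kernelized algorithm: the rewards are assumed to be known rather than learned, and the function names and the toy numbers are hypothetical, chosen only to show how the two selection rules can disagree.

```python
# Illustrative sketch only: rewards are assumed known (no learning, no kernel, no UCB).
# rewards[c][a] is the (hypothetical) mean reward of arm a under context c.

def expected_reward_arm(probs, rewards):
    """Pick the arm with the highest reward averaged over the context bundle."""
    n_arms = len(rewards[0])
    expected = [
        sum(p * rewards[c][a] for c, p in enumerate(probs))
        for a in range(n_arms)
    ]
    best = max(range(n_arms), key=expected.__getitem__)
    return best, expected

def most_probable_context_arm(probs, rewards):
    """Pick the best arm for the single most probable context, ignoring the rest."""
    c_star = max(range(len(probs)), key=probs.__getitem__)
    return max(range(len(rewards[c_star])), key=rewards[c_star].__getitem__)

# Toy bundle: two candidate contexts with probabilities 0.6 and 0.4.
probs = [0.6, 0.4]
rewards = [
    [1.0, 0.9],  # context 0: arm 0 is slightly better
    [0.0, 0.8],  # context 1: arm 0 is much worse
]

bundle_arm, expected = expected_reward_arm(probs, rewards)
greedy_arm = most_probable_context_arm(probs, rewards)
print(bundle_arm, greedy_arm, expected)
```

In this toy case the most-probable-context rule picks arm 0, while averaging over the bundle favors arm 1, because arm 0 performs poorly under the less likely context.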

Award ID(s): 1910208 1610471 1551661

PAR ID: 10195214

Author(s) / Creator(s): Yang, Luting; Yang, Jianyi; Ren, Shaolei

Date Published: 2020-07-01

Journal Name: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence Main track

Page Range / eLocation ID: 3087 to 3093

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper: https://doi.org/10.24963/ijcai.2020/427
