Stochastic Top-$K$ Subset Bandits with Linear Space and Non-Linear Feedback

Agarwal, Mridul; Aggarwal, Vaneet; Quinn, Christopher J.; Umrawal, Abhishek K.

Citation Details

Many real-world problems like Social Influence Maximization face the dilemma of choosing the best $$K$$ out of $$N$$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $$K$$ out of $$N$$ arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $$K$$ arms. The direct use of multi-armed bandit requires choosing among $$N$$-choose-$$K$$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $$N$$. The proposed algorithm is a divide-and-conquer based strategy, that we call CMAB-SM. Further, the proposed algorithm achieves a \textit{regret bound} of $$\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $$T$$, which is \textit{sub-linear} in all parameters $$T$$, $$N$$, and $$K$$. more »

Award ID(s):: 1742847

PAR ID:: 10309321

Author(s) / Creator(s):: Agarwal, Mridul; Aggarwal, Vaneet; Quinn, Christopher J.; Umrawal, Abhishek K.

Editor(s):: Feldman, Vitaly; Ligett, Katrina; Sabato, Sivan

Date Published:: 2021-03-01

Journal Name:: International Conference on Algorithmic Learning Theory

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this