Title: An explore-then-commit algorithm for submodular maximization under full-bandit feedback
We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no information other than the reward of the selected action is observed at each time step $t$. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^{1/3} k^{4/3} T^{2/3} \log(T)^{1/2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
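For readers who want the shape of the procedure, here is a minimal Python sketch of the explore-then-commit greedy pattern the abstract describes. The reward oracle `play` and the per-phase sample budget `m` are illustrative assumptions; the paper's analysis fixes the exact exploration schedule that yields the stated regret bound.

```python
import math

def etcg(base_elements, k, horizon, play):
    """Explore-Then-Commit Greedy (minimal sketch).

    play(S) returns one stochastic reward sample for the action set S;
    under full-bandit feedback this scalar is all that is ever observed.
    """
    n = len(base_elements)
    # Per-(phase, element) exploration budget. The paper derives the exact
    # schedule from n, k, and T; this T^(2/3)-style split is a placeholder.
    m = max(1, int(horizon ** (2 / 3) / (n * k)))
    chosen, t = [], 0
    for _ in range(k):  # one greedy phase per slot of the cardinality budget
        best, best_mean = None, -math.inf
        for a in base_elements:
            if a in chosen:
                continue
            mean = sum(play(chosen + [a]) for _ in range(m)) / m
            t += m
            if mean > best_mean:
                best, best_mean = a, mean
        chosen.append(best)  # keep the empirically best augmentation
    while t < horizon:  # commit: exploit the selected set until time runs out
        play(chosen)
        t += 1
    return chosen
```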
Award ID(s):
2149617
NSF-PAR ID:
10397867
Editor(s):
Cussens, James; Zhang, Kun
Journal Name:
Uncertainty in Artificial Intelligence
Volume:
180
ISSN:
1525-3384
Page Range / eLocation ID:
1541-1551
Sponsoring Org:
National Science Foundation
More Like this
  1. We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem: the reward function is not necessarily monotone, and submodularity is assumed only in expectation. We propose the Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $1/2$-regret upper bound of $\tilde{\mathcal{O}}(n T^{2/3})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
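The abstract does not spell the algorithm out, so the following Python sketch is only a guess at its structure, modeled on the classical randomized double-greedy of Buchbinder et al., which attains the same 1/2 factor for offline unconstrained non-monotone submodular maximization. The sample budget `m` and the oracle `play` are assumptions for illustration.

```python
import random

def rgl_sketch(ground_set, horizon, play):
    """Bandit-style randomized double greedy (an illustrative guess at RGL).

    play(S) returns one noisy sample of f(S); only this scalar is observed.
    """
    n = len(ground_set)
    m = max(1, int(horizon ** (2 / 3) / max(n, 1)))  # samples per estimate (heuristic)

    def estimate(S):
        """Empirical mean of f(S) over m plays."""
        return sum(play(S) for _ in range(m)) / m

    X, Y = set(), set(ground_set)
    for i in ground_set:
        a = estimate(X | {i}) - estimate(X)   # estimated gain of adding i to X
        b = estimate(Y - {i}) - estimate(Y)   # estimated gain of removing i from Y
        a, b = max(a, 0.0), max(b, 0.0)
        p = a / (a + b) if a + b > 0 else 0.5
        if random.random() < p:
            X.add(i)
        else:
            Y.discard(i)
    return X  # X == Y here; the learner commits to it for the remaining rounds
```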
  2. Feldman, Vitaly ; Ligett, Katrina ; Sabato, Sivan (Ed.)
    Many real-world problems, like Social Influence Maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit that chooses $K$ out of $N$ arms at each time step, with the aim of achieving an efficient trade-off between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of a multi-armed bandit algorithm would require choosing among $\binom{N}{K}$ options, making the state space large. In this paper, we present a novel divide-and-conquer based algorithm, which we call CMAB-SM, that is computationally efficient and whose storage is linear in $N$. Further, the proposed algorithm achieves a regret bound of $\tilde{O}(K^{1/2} N^{1/3} T^{2/3})$ for a time horizon $T$, which is sub-linear in all parameters $T$, $N$, and $K$.
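The state-space claim above is easy to make concrete: treating every size-$K$ subset as its own arm blows up combinatorially, which is exactly what a storage footprint linear in $N$ avoids. A quick standard-library check (the values of N and K are illustrative):

```python
from math import comb

N, K = 100, 10
print(comb(N, K))  # 17310309456440 size-K subsets, i.e. "arms" for a naive MAB
print(N)           # 100: storage that instead scales linearly in N
```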
  3. Abstract

    Given a sequence $\{Z_d\}_{d\in \mathbb{N}}$ of smooth and compact hypersurfaces in ${\mathbb{R}}^{n-1}$, we prove that (up to extracting subsequences) there exists a regular definable hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^n$ such that each manifold $Z_d$ is diffeomorphic to a component of the zero set on $\Gamma$ of some polynomial of degree $d$. (This is in sharp contrast with the case when $\Gamma$ is semialgebraic, where for example the homological complexity of the zero set of a polynomial $p$ on $\Gamma$ is bounded by a polynomial in $\deg(p)$.) More precisely, given the above sequence of hypersurfaces, we construct a regular, compact, semianalytic hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^{n}$ containing a subset $D$ homeomorphic to a disk, and a family of polynomials $\{p_m\}_{m\in \mathbb{N}}$ of degree $\deg(p_m)=d_m$ such that $(D, Z(p_m)\cap D)\sim ({\mathbb{R}}^{n-1}, Z_{d_m})$, i.e., the zero set of $p_m$ in $D$ is isotopic to $Z_{d_m}$ in ${\mathbb{R}}^{n-1}$. This says that, up to extracting subsequences, the intersection of $\Gamma$ with a hypersurface of degree $d$ can be as complicated as we want. We call these ‘pathological examples’. In particular, we show that for every $0 \leq k \leq n-2$ and every sequence of natural numbers $a=\{a_d\}_{d\in \mathbb{N}}$ there is a regular, compact semianalytic hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^n$, a subsequence $\{a_{d_m}\}_{m\in \mathbb{N}}$, and homogeneous polynomials $\{p_{m}\}_{m\in \mathbb{N}}$ of degree $\deg(p_m)=d_m$ such that $$b_k(\Gamma\cap Z(p_m))\geq a_{d_m}. \tag{0.1}$$ (Here $b_k$ denotes the $k$th Betti number.) This generalizes a result of Gwoździewicz et al. [13]. On the other hand, for a given definable $\Gamma$ we show that the Fubini–Study measure, in the Gaussian probability space of polynomials of degree $d$, of the set $\Sigma_{d_m,a,\Gamma}$ of polynomials satisfying (0.1) is positive, but there exists a constant $c_\Gamma$ such that $$0<{\mathbb{P}}(\Sigma_{d_m, a, \Gamma})\leq \frac{c_{\Gamma} d_m^{\frac{n-1}{2}}}{a_{d_m}}.$$ This shows that the set of ‘pathological examples’ has ‘small’ measure (the faster $a$ grows, the smaller the measure, and pathologies are therefore rare). In fact we show that, given $\Gamma$, for most polynomials a Bézout-type bound holds for the intersection $\Gamma \cap Z(p)$: for every $0\leq k\leq n-2$ and $t>0$, $$\mathbb{P}\left(\{b_k(\Gamma\cap Z(p))\geq t d^{n-1}\}\right)\leq \frac{c_\Gamma}{t d^{\frac{n-1}{2}}}.$$
  4. In this paper, we study Federated Bandit, a decentralized multi-armed bandit problem with a set of $N$ agents who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions, selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluations of the true reward for each action taken. Learning only locally would lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution in which agents never share their local observations with a central entity and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm, $\texttt{Gossip\_UCB}$, which couples variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that $\texttt{Gossip\_UCB}$ successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret of order $O(\max\{\texttt{poly}(N,M)\log T,\ \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where $\lambda_2\in(0,1)$ is the second largest eigenvalue of the expected gossip matrix, which is a function of $G$. We then propose $\texttt{Fed\_UCB}$, a differentially private version of $\texttt{Gossip\_UCB}$ in which the agents preserve $\varepsilon$-differential privacy of their local data while achieving $O(\max\{\frac{\texttt{poly}(N,M)}{\varepsilon}\log^{2.5} T,\ \texttt{poly}(N,M)(\log_{\lambda_2^{-1}} N + \log T)\})$ regret.
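As a rough illustration of the coupling described above, here is a hypothetical Python sketch in which each agent runs UCB on its own running estimates and then averages those estimates with its graph neighbors every round. The uniform gossip weights and the `local_reward` oracle are assumptions for illustration; the paper's exact gossip matrix, regret analysis, and Fed_UCB privacy mechanism are not reproduced here.

```python
import math

def gossip_ucb_sketch(agents, arms, horizon, neighbors, local_reward):
    """Gossip + UCB coupling (illustrative only, not the paper's exact updates).

    neighbors[i] lists agent i's neighbors in the graph G; local_reward(i, a)
    draws agent i's local (possibly biased) reward sample for arm a.
    """
    mean = [[0.0] * arms for _ in range(agents)]   # per-agent arm estimates
    count = [[0] * arms for _ in range(agents)]    # per-agent play counts
    for t in range(1, horizon + 1):
        for i in range(agents):
            # Local UCB choice on the agent's current (gossiped) estimates.
            ucb = [
                mean[i][a] + math.sqrt(2 * math.log(t) / count[i][a])
                if count[i][a] > 0 else math.inf
                for a in range(arms)
            ]
            a = ucb.index(max(ucb))
            r = local_reward(i, a)
            count[i][a] += 1
            mean[i][a] += (r - mean[i][a]) / count[i][a]
        # Gossip step: average estimates with graph neighbors. Uniform weights
        # stand in for the paper's expected gossip matrix.
        new_mean = [row[:] for row in mean]
        for i in range(agents):
            group = [i] + list(neighbors[i])
            for a in range(arms):
                new_mean[i][a] = sum(mean[j][a] for j in group) / len(group)
        mean = new_mean
    return mean
```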