Cussens, James; Zhang, Kun
(Ed.)
We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of selected action at each time step $$t$$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG) and prove that it achieves a $(1-1/e)$-regret upper bound of $$\mathcal{O}(n^\frac{1}{3}k^\frac{4}{3}T^\frac{2}{3}\log(T)^\frac{1}{2})$$ for a horizon $$T$$, number of base elements $$n$$, and cardinality constraint $$k$$. We also show in experiments with synthetic and real-world data that the ETCG empirically outperforms other full-bandit methods.
more »
« less