 Editors:
 Cussens, James; Zhang, Kun
 Award ID(s):
 2149617
 Publication Date:
 NSFPAR ID:
 10397867
 Journal Name:
 Uncertainty in Artificial Intelligence
 Volume:
 180
 Page Range or eLocationID:
 15411551
 ISSN:
 15253384
 Sponsoring Org:
 National Science Foundation
More Like this

We investigate the problem of unconstrained combinatorial multiarmed bandits with fullbandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$regret upper bound of $\Tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other fullbandit variants in submodular and nonsubmodular settings.

Feldman, Vitaly ; Ligett, Katrina ; Sabato, Sivan (Ed.)Many realworld problems like Social Influence Maximization face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit which chooses $K$ out of $N$ arms at each time, with an aim to achieve an efficient tradeoff between exploration and exploitation. This is the first work for combinatorial bandits where the feedback received can be a nonlinear function of the chosen $K$ arms. The direct use of multiarmed bandit requires choosing among $N$choose$K$ options making the state space large. In this paper, we present a novel algorithm which is computationally efficient and the storage is linear in $N$. The proposed algorithm is a divideandconquer based strategy, that we call CMABSM. Further, the proposed algorithm achieves a \textit{regret bound} of $\tilde O(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is \textit{sublinear} in all parameters $T$, $N$, and $K$.

Abstract Given a sequence $\{Z_d\}_{d\in \mathbb{N}}$ of smooth and compact hypersurfaces in ${\mathbb{R}}^{n1}$, we prove that (up to extracting subsequences) there exists a regular definable hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^n$ such that each manifold $Z_d$ is diffeomorphic to a component of the zero set on $\Gamma$ of some polynomial of degree $d$. (This is in sharp contrast with the case when $\Gamma$ is semialgebraic, where for example the homological complexity of the zero set of a polynomial $p$ on $\Gamma$ is bounded by a polynomial in $\deg (p)$.) More precisely, given the above sequence of hypersurfaces, we construct a regular, compact, semianalytic hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^{n}$ containing a subset $D$ homeomorphic to a disk, and a family of polynomials $\{p_m\}_{m\in \mathbb{N}}$ of degree $\deg (p_m)=d_m$ such that $(D, Z(p_m)\cap D)\sim ({\mathbb{R}}^{n1}, Z_{d_m}),$ i.e. the zero set of $p_m$ in $D$ is isotopic to $Z_{d_m}$ in ${\mathbb{R}}^{n1}$. This says that, up to extracting subsequences, the intersection of $\Gamma$ with a hypersurface of degree $d$ can be as complicated as we want. We call these ‘pathological examples’. In particular, we show that for every $0 \leq k \leq n2$ and every sequence of natural numbers $a=\{a_d\}_{d\in \mathbb{N}}$ there is a regular, compact semianalyticmore »

We investigate the problem of unconstrained combinatorial multiarmed bandits with fullbandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a 1 2 regret upper bound of O˜(nT 2 3 ) for horizon T and number of arms n. We also show in experiments that RGL empirically outperforms other fullbandit variants in submodular and nonsubmodular settings.

In this paper, we study Federated Bandit, a decentralized MultiArmed Bandit problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to suboptimal actions while converging to a noregret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm \textttGossip\_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \textttGossip\_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of O(\max\ \textttpoly (N,M) łog T, \textttpoly (N,M)łog_łambda_2^1 N\ ) for all N agents, wheremore »