- Editors: Cussens, James; Zhang, Kun
- Award ID(s): 2149617
- Publication Date:
- NSF-PAR ID: 10397867
- Journal Name: Uncertainty in Artificial Intelligence
- Volume: 180
- Page Range or eLocation-ID: 1541-1551
- ISSN: 1525-3384
- Sponsoring Org: National Science Foundation
More Like this
- We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone and submodularity is assumed only in expectation. We propose the Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$-regret upper bound of $\Tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in both submodular and non-submodular settings.
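  For intuition, here is a minimal sketch of the randomized double-greedy pass that a learner like RGL can build on (the offline scheme of Buchbinder et al. for unconstrained submodular maximization, with exact marginal gains replaced by averaged noisy full-bandit queries). The `noisy_value` oracle, the `samples_per_query` budget, and all names are illustrative assumptions, not the paper's exact procedure or sample schedule:

  ```python
  import random

  def rgl_sketch(arms, noisy_value, samples_per_query=50):
      """One randomized double-greedy pass over the ground set, using
      averaged noisy full-bandit queries to estimate marginal gains.
      noisy_value(S) returns a stochastic reward for playing set S."""
      def estimate(S):
          # Average repeated plays of the same set to reduce noise.
          return sum(noisy_value(S) for _ in range(samples_per_query)) / samples_per_query

      X, Y = set(), set(arms)
      for i in arms:
          a = estimate(X | {i}) - estimate(X)   # gain of adding i to X
          b = estimate(Y - {i}) - estimate(Y)   # gain of removing i from Y
          a_plus, b_plus = max(a, 0.0), max(b, 0.0)
          p = 1.0 if a_plus + b_plus == 0 else a_plus / (a_plus + b_plus)
          if random.random() < p:
              X.add(i)        # keep arm i
          else:
              Y.discard(i)    # drop arm i
      return X  # X == Y after the full pass
  ```

  The paper's $\Tilde{\mathcal{O}}(n T^{\frac{2}{3}})$ bound comes from balancing this kind of per-query sampling against the horizon $T$; the sketch leaves that schedule as a fixed constant.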
- Feldman, Vitaly; Ligett, Katrina; Sabato, Sivan (Ed.) Many real-world problems, like Social Influence Maximization, face the dilemma of choosing the best $K$ out of $N$ options at a given time instant. This setup can be modeled as a combinatorial bandit that chooses $K$ out of $N$ arms at each time, with the aim of achieving an efficient trade-off between exploration and exploitation. This is the first work on combinatorial bandits where the feedback received can be a non-linear function of the chosen $K$ arms. The direct use of a multi-armed bandit requires choosing among $N$-choose-$K$ options, making the state space large. In this paper, we present a novel algorithm that is computationally efficient and whose storage is linear in $N$. The proposed algorithm is a divide-and-conquer-based strategy, which we call CMAB-SM. Further, the proposed algorithm achieves a *regret bound* of $\tilde{O}(K^{\frac{1}{2}}N^{\frac{1}{3}}T^{\frac{2}{3}})$ for a time horizon $T$, which is *sub-linear* in all parameters $T$, $N$, and $K$.
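  To make the storage claim concrete, a quick back-of-the-envelope comparison (plain Python; this relies only on the abstract's own point that a naive bandit tracks one statistic per $N$-choose-$K$ super-arm, while CMAB-SM keeps per-arm state):

  ```python
  import math

  # A naive MAB over K-subsets must maintain one reward estimate
  # per super-arm; a per-arm strategy keeps only N statistics.
  N, K = 100, 10
  print(f"N-choose-K super-arms: {math.comb(N, K):.3e}")  # ~1.731e+13
  print(f"per-arm statistics:    {N}")                    # linear in N
  ```

  At $N=100$, $K=10$ the naive state space already exceeds $10^{13}$ estimates, which is why reducing the problem to per-arm statistics matters.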
- Given a sequence $\{Z_d\}_{d\in \mathbb{N}}$ of smooth and compact hypersurfaces in ${\mathbb{R}}^{n-1}$, we prove that (up to extracting subsequences) there exists a regular definable hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^n$ such that each manifold $Z_d$ is diffeomorphic to a component of the zero set on $\Gamma$ of some polynomial of degree $d$. (This is in sharp contrast with the case when $\Gamma$ is semialgebraic, where for example the homological complexity of the zero set of a polynomial $p$ on $\Gamma$ is bounded by a polynomial in $\deg (p)$.) More precisely, given the above sequence of hypersurfaces, we construct a regular, compact, semianalytic hypersurface $\Gamma \subset {\mathbb{R}}\textrm{P}^{n}$ containing a subset $D$ homeomorphic to a disk, and a family of polynomials $\{p_m\}_{m\in \mathbb{N}}$ of degree $\deg (p_m)=d_m$ such that $(D, Z(p_m)\cap D)\sim ({\mathbb{R}}^{n-1}, Z_{d_m})$, i.e., the zero set of $p_m$ in $D$ is isotopic to $Z_{d_m}$ in ${\mathbb{R}}^{n-1}$. This says that, up to extracting subsequences, the intersection of $\Gamma$ with a hypersurface of degree $d$ can be as complicated as we want. We call these 'pathological examples'. In particular, we show that for every $0 \leq k \leq n-2$ and every sequence of natural numbers $a=\{a_d\}_{d\in \mathbb{N}}$ there is a regular, compact semianalytic […]
- In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of $N$ agents, who can only communicate their local data with neighbors described by a connected graph $G$. Each agent makes a sequence of decisions on selecting an arm from $M$ candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions, while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution in which agents never share their local observations with a central entity, and are allowed to share only a private copy of their own information with their neighbors. We first propose a decentralized bandit algorithm `Gossip_UCB`, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that `Gossip_UCB` successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret of order $O(\max\{\texttt{poly}(N,M)\log T, \texttt{poly}(N,M)\log_{\lambda_2^{-1}} N\})$ for all $N$ agents, where […]
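  A minimal sketch of the gossip-plus-UCB coupling described above (the doubly stochastic weight matrix `W`, the `pull` oracle, and the exploration constant `c` are assumptions for illustration; the paper's `Gossip_UCB` uses its own carefully constructed statistics to obtain the stated regret):

  ```python
  import numpy as np

  def gossip_ucb_sketch(W, pull, M, T, c=2.0):
      """Illustrative coupling of gossip averaging with UCB.
      W: N x N doubly stochastic matrix respecting graph G.
      pull(agent, arm): one stochastic local reward sample."""
      N = W.shape[0]
      est = np.zeros((N, M))   # gossip-averaged reward estimates
      cnt = np.zeros((N, M))   # local play counts
      for t in range(1, T + 1):
          # UCB index; unplayed arms get an effectively infinite bonus.
          bonus = np.sqrt(c * np.log(t + 1) / np.maximum(cnt, 1e-12))
          arm = np.argmax(est + bonus, axis=1)
          for i in range(N):
              a = arm[i]
              cnt[i, a] += 1
              # Blend this agent's own observation into the
              # gossiped estimate via a running average.
              est[i, a] += (pull(i, a) - est[i, a]) / cnt[i, a]
          est = W @ est        # neighbors average their estimates
      return np.argmax(est, axis=1)  # each agent's recommended arm
  ```

  The gossip step `est = W @ est` is what propagates information across $G$; its mixing speed is governed by the second largest eigenvalue of `W`, which is plausibly the $\lambda_2$ in the regret bound (the truncated abstract cuts off before defining it).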