skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Federated Bandit: A Gossiping Approach
In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm \textttGossip\_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \textttGossip\_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of O(\max\ \textttpoly (N,M) łog T, \textttpoly (N,M)łog_łambda_2^-1 N\ ) for all N agents, where łambda_2\in(0,1) is the second largest eigenvalue of the expected gossip matrix, which is a function of G. We then propose \textttFed\_UCB, a differentially private version of \textttGossip\_UCB, in which the agents preserve ε-differential privacy of their local data while achieving O(\max \\frac\textttpoly (N,M) ε łog^2.5 T, \textttpoly (N,M) (łog_łambda_2^-1 N + łog T) \ ) regret.  more » « less
Award ID(s):
2007951
PAR ID:
10290969
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Volume:
5
Issue:
1
ISSN:
2476-1249
Page Range / eLocation ID:
1 to 29
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) łog (T) / Δ ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m łl K, this improves over the single-agent baseline regret of O(Kłog(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n . In light of this negative result, we propose a new algorithm for which the i -th agent has regret O(( dmal (i) + K/n) łog(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i 's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal (i) malicious agents directly connected to i affect its long-term regret). 
    more » « less
  2. We consider the problem where N agents collaboratively interact with an instance of a stochastic K arm bandit problem for K N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of O√(K/N + N)T, communicates for O(log T) steps and broadcasts O(log K) bits in each communication step. We extend the work to sparse graphs with maximum degree KG and diameter D to propose LCC-UCB-GRAPH which enjoys a regret bound of O√(D(K/N + KG)DT). Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node. 
    more » « less
  3. Estimating the ε-approximate quantiles or ranks of a stream is a fundamental task in data monitoring. Given a stream x_1,..., x_n from a universe \mathcalU with total order, an additive-error quantile sketch \mathcalM allows us to approximate the rank of any query y\in \mathcalU up to additive ε n error. In 2001, Greenwald and Khanna gave a deterministic algorithm (GK sketch) that solves the ε-approximate quantiles estimation problem using O(ε^-1 łog(ε n)) space \citegreenwald2001space ; recently, this algorithm was shown to be optimal by Cormode and Vesleý in 2020 \citecormode2020tight. However, due to the intricacy of the GK sketch and its analysis, over-simplified versions of the algorithm are implemented in practical applications, often without any known theoretical guarantees. In fact, it has remained an open question whether the GK sketch can be simplified while maintaining the optimal space bound. In this paper, we resolve this open question by giving a simplified deterministic algorithm that stores at most (2 + o(1))ε^-1 łog (ε n) elements and solves the additive-error quantile estimation problem; as a side benefit, our algorithm achieves a smaller constant factor than the \frac11 2 ε^-1 łog(ε n) space bound in the original GK sketch~\citegreenwald2001space. Our algorithm features an easier analysis and still achieves the same optimal asymptotic space complexity as the original GK sketch. Lastly, our simplification enables an efficient data structure implementation, with a worst-case runtime of O(łog(1/ε) + łog łog (ε n)) per-element for the ordinary ε-approximate quantile estimation problem. Also, for the related weighted'' quantile estimation problem, we give efficient data structures for our simplified algorithm which guarantee a worst-case per-element runtime of O(łog(1/ε) + łog łog (ε W_n/w_\textrmmin )), achieving an improvement over the previous upper bound of \citeassadi2023generalizing. 
    more » « less
  4. We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the M agents has a unique reward distribution over K arms, and in T rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret. We study HMA2B in both centralized and decentralized setups. Our main centralized algorithm, GP-HCLA, which is an extension of HCLA, uses a central decision-maker for arm-pulling and hint queries, achieving O(M^4 K) regret with O(M K log T) adaptive hints. In decentralized setups, we propose two algorithms, HD-ETC and EBHD-ETC, that allow agents to choose actions independently through collision-based communication and query hints uniformly until stopping, yielding O(M^3 K^2) regret with O(M^3 K log T) hints, where the former requires knowledge of the minimum gap and the latter does not. Finally, we establish lower bounds to prove the optimality of our results and verify them through numerical simulations. 
    more » « less
  5. We consider a linear stochastic bandit problem involving M agents that can collaborate via a central server to minimize regret. A fraction α of these agents are adversarial and can act arbitrarily, leading to the following tension: while collaboration can potentially reduce regret, it can also disrupt the process of learning due to adversaries. In this work, we provide a fundamental understanding of this tension by designing new algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals. We also complement our algorithms with tight analyses. First, we develop a robust collaborative phased elimination algorithm that achieves O(alpha(1 + 1/sqrt(M))sqrt(dt)) regret for each good agent; here, d is the model-dimension and T is the horizon. For small α, our result thus reveals a clear benefit of collaboration despite adversaries. Using an informationtheoretic argument, we then prove a matching lower bound, thereby providing the first set of tight, near-optimal regret bounds for collaborative linear bandits with adversaries. Furthermore, by leveraging recent advances in high-dimensional robust statistics, we significantly extend our algorithmic ideas and results to (i) the generalized linear bandit model that allows for non-linear observation maps; and (ii) the contextual bandit setting that allows for time-varying feature vectors. 
    more » « less