skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Robust Multi-Agent Bandits Over Undirected Graphs
We consider a multi-agent multi-armed bandit setting in which n honest agents collaborate over a network to minimize regret but m malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur O((m + K/n) łog (T) / Δ ) regret in this setting, where K is the number of arms and Δ is the arm gap. For m łl K, this improves over the single-agent baseline regret of O(Kłog(T)/Δ). In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in K and n . In light of this negative result, we propose a new algorithm for which the i -th agent has regret O(( dmal (i) + K/n) łog(T)/Δ) on any connected and undirected graph, where dmal(i) is the number of i 's neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where dmal(i) = m), and show the effect of malicious agents is entirely local (in the sense that only the dmal (i) malicious agents directly connected to i affect its long-term regret).  more » « less
Award ID(s):
2207547 2106801 1934986
PAR ID:
10410558
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Volume:
6
Issue:
3
ISSN:
2476-1249
Page Range / eLocation ID:
1 to 57
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a set of N agents, who can only communicate their local data with neighbors described by a connected graph G. Each agent makes a sequence of decisions on selecting an arm from M candidates, yet they only have access to local and potentially biased feedback/evaluation of the true reward for each action taken. Learning only locally will lead agents to sub-optimal actions while converging to a no-regret strategy requires a collection of distributed data. Motivated by the proposal of federated learning, we aim for a solution with which agents will never share their local observations with a central entity, and will be allowed to only share a private copy of his/her own information with their neighbors. We first propose a decentralized bandit algorithm \textttGossip\_UCB, which is a coupling of variants of both the classical gossiping algorithm and the celebrated Upper Confidence Bound (UCB) bandit algorithm. We show that \textttGossip\_UCB successfully adapts local bandit learning into a global gossiping process for sharing information among connected agents, and achieves guaranteed regret at the order of O(\max\ \textttpoly (N,M) łog T, \textttpoly (N,M)łog_łambda_2^-1 N\ ) for all N agents, where łambda_2\in(0,1) is the second largest eigenvalue of the expected gossip matrix, which is a function of G. We then propose \textttFed\_UCB, a differentially private version of \textttGossip\_UCB, in which the agents preserve ε-differential privacy of their local data while achieving O(\max \\frac\textttpoly (N,M) ε łog^2.5 T, \textttpoly (N,M) (łog_łambda_2^-1 N + łog T) \ ) regret. 
    more » « less
  2. We consider the problem where N agents collaboratively interact with an instance of a stochastic K arm bandit problem for K N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch based algorithm where each agent communicates only after the end of the epoch and shares the index of the best arm it knows. With our algorithm, LCC-UCB, each agent enjoys a regret of O√(K/N + N)T, communicates for O(log T) steps and broadcasts O(log K) bits in each communication step. We extend the work to sparse graphs with maximum degree KG and diameter D to propose LCC-UCB-GRAPH which enjoys a regret bound of O√(D(K/N + KG)DT). Finally, we empirically show that the LCC-UCB and the LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node. 
    more » « less
  3. Gørtz, Inge Li; Farach-Colton, Martin; Puglisi, Simon J; Herman, Grzegorz (Ed.)
    We give the first almost-linear time algorithm for computing the maximal k-edge-connected subgraphs of an undirected unweighted graph for any constant k. More specifically, given an n-vertex m-edge graph G = (V,E) and a number k = log^o(1) n, we can deterministically compute in O(m+n^{1+o(1)}) time the unique vertex partition {V_1,… ,V_z} such that, for every i, V_i induces a k-edge-connected subgraph while every superset V'_i ⊃ V_{i} does not. Previous algorithms with linear time work only when k ≤ 2 [Tarjan SICOMP'72], otherwise they all require Ω(m+n√n) time even when k = 3 [Chechik et al. SODA'17; Forster et al. SODA'20]. Our algorithm also extends to the decremental graph setting; we can deterministically maintain the maximal k-edge-connected subgraphs of a graph undergoing edge deletions in m^{1+o(1)} total update time. Our key idea is a reduction to the dynamic algorithm supporting pairwise k-edge-connectivity queries [Jin and Sun FOCS'20]. 
    more » « less
  4. We study a hinted heterogeneous multi-agent multi-armed bandits problem (HMA2B), where agents can query low-cost observations (hints) in addition to pulling arms. In this framework, each of the M agents has a unique reward distribution over K arms, and in T rounds, they can observe the reward of the arm they pull only if no other agent pulls that arm. The goal is to maximize the total utility by querying the minimal necessary hints without pulling arms, achieving time-independent regret. We study HMA2B in both centralized and decentralized setups. Our main centralized algorithm, GP-HCLA, which is an extension of HCLA, uses a central decision-maker for arm-pulling and hint queries, achieving O(M^4 K) regret with O(M K log T) adaptive hints. In decentralized setups, we propose two algorithms, HD-ETC and EBHD-ETC, that allow agents to choose actions independently through collision-based communication and query hints uniformly until stopping, yielding O(M^3 K^2) regret with O(M^3 K log T) hints, where the former requires knowledge of the minimum gap and the latter does not. Finally, we establish lower bounds to prove the optimality of our results and verify them through numerical simulations. 
    more » « less
  5. We study the problem of online binary classification where strategic agents can manipulate their observable features in predefined ways, modeled by a manipulation graph, in order to receive a positive classification. We show this setting differs in fundamental ways from classic (non-strategic) online classification. For instance, whereas in the non-strategic case, a mistake bound of ln |H| is achievable via the halving algorithm when the target function belongs to a known class H, we show that no deterministic algorithm can achieve a mistake bound o(Δ) in the strategic setting, where Δ is the maximum degree of the manipulation graph (even when |H| = O(Δ)). We complement this with a general algorithm achieving mistake bound O(Δ ln |H|). We also extend this to the agnostic setting, and show that this algorithm achieves a Δ multiplicative regret (mistake bound of O(Δ · OPT + Δ · ln |H|)), and that no deterministic algorithm can achieve o(Δ) multiplicative regret. Next, we study two randomized models based on whether the random choices are made before or after agents respond, and show they exhibit fundamental differences. In the first, fractional model, at each round the learner deterministically chooses a probability distribution over classifiers inducing expected values on each vertex (probabilities of being classified as positive), which the strategic agents respond to. We show that any learner in this model has to suffer linear regret. On the other hand, in the second randomized algorithms model, while the adversary who selects the next agent must respond to the learner's probability distribution over classifiers, the agent then responds to the actual hypothesis classifier drawn from this distribution. Surprisingly, we show this model is more advantageous to the learner, and we design randomized algorithms that achieve sublinear regret bounds against both oblivious and adaptive adversaries. 
    more » « less