Title: An Asynchronous Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper studies a distributed reinforcement learning problem in which a network of agents aims to cooperatively maximize the globally averaged return through communication with only local neighbors. An asynchronous multi-agent actor-critic algorithm is proposed for possibly unidirectional communication relationships described by a directed graph. Each agent independently updates its variables at "event times" determined by its own clock. It is not assumed that the agents' clocks are synchronized or that the event times are evenly spaced. It is shown that the algorithm can solve the problem for any strongly connected graph in the presence of communication and computation delays.
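To make the communication pattern concrete, here is a minimal Python sketch of asynchronous event-time updates over a directed graph; the averaging rule, the ring graph, and the delay model are illustrative assumptions, not the paper's actual actor-critic recursions.

```python
# A minimal sketch (illustrative, not the paper's exact recursions) of the
# asynchronous event-time update pattern over a directed graph: agent i wakes
# at event times set by its own clock, reads possibly stale values from its
# in-neighbors, and performs a local averaging step.
import random

N = 4
in_neighbors = {i: [(i - 1) % N] for i in range(N)}  # directed ring: strongly connected
x = [float(i) for i in range(N)]                     # each agent's local estimate
mailbox = [{} for _ in range(N)]                     # last value heard from each in-neighbor

random.seed(0)
for _ in range(500):
    i = random.randrange(N)              # agent i's clock fires; clocks are unsynchronized
    for j in in_neighbors[i]:
        if random.random() < 0.8:        # message may be delayed or dropped this round
            mailbox[i][j] = x[j]
    vals = [x[i]] + list(mailbox[i].values())
    x[i] = sum(vals) / len(vals)         # local update using possibly stale information

print([round(v, 4) for v in x])          # estimates approach a common value
```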
Award ID(s):
1749937
PAR ID:
10132943
Author(s) / Creator(s):
Date Published:
Journal Name:
NeurIPS Optimization Foundations for Reinforcement Learning Workshop
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Intelligent utilization of resources and improved mission performance in an autonomous agent require consideration of both cyber and physical resources. The allocation of these resources becomes more complex when the system expands from one agent to multiple agents, and the control shifts from centralized to decentralized. Consensus is a distributed algorithm that lets multiple agents agree on a shared value, but it typically does not leverage mobility. We propose a coupled consensus control strategy that co-regulates computation, communication frequency, and connectivity of the agents to achieve faster convergence times at lower communication rates and computational costs. In this strategy, agents move towards a common location to increase connectivity. Simultaneously, the communication frequency is increased when the shared state error between an agent and its connected neighbors is high. When the shared state converges (i.e., consensus is reached), the agents withdraw to their initial positions and the communication frequency is decreased. Convergence properties are demonstrated under the proposed co-regulated control algorithm. We evaluated the proposed approach through a new set of cyber-physical, multi-agent metrics and demonstrated it in a simulation of unmanned aircraft systems measuring temperatures at multiple sites. The results demonstrate that, compared with fixed-rate and event-triggered consensus algorithms, our co-regulation scheme can achieve improved performance with fewer resources, while maintaining high reactivity to changes in the environment and system.
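As a rough illustration of the co-regulation loop described in item 1 (the gains, thresholds, and connectivity model here are invented for the sketch, not taken from the paper):

```python
# Hedged sketch: agents raise their communication rate when disagreement with
# the group is large and move toward the group mean to boost connectivity,
# then withdraw to their initial positions once consensus is reached.
import statistics

states = [0.0, 4.0, 9.0]          # shared state per agent
positions = [0.0, 5.0, 10.0]      # physical positions
period = [1.0, 1.0, 1.0]          # communication period (1 / frequency)
home = positions[:]               # initial positions to withdraw to

for _ in range(50):
    mean_state = statistics.mean(states)
    for i in range(len(states)):
        err = abs(states[i] - mean_state)
        period[i] = max(0.1, 1.0 / (1.0 + err))        # communicate faster when error is high
        states[i] += 0.3 * (mean_state - states[i])    # consensus update
        target = statistics.mean(positions) if err > 0.05 else home[i]
        positions[i] += 0.2 * (target - positions[i])  # approach group, or withdraw home

print([round(s, 3) for s in states], [round(p, 2) for p in period])
```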
  2. We consider the problem where N agents collaboratively interact with an instance of a stochastic K-armed bandit problem for K ≫ N. The agents aim to simultaneously minimize the cumulative regret over all the agents for a total of T time steps, the number of communication rounds, and the number of bits in each communication round. We present Limited Communication Collaboration - Upper Confidence Bound (LCC-UCB), a doubling-epoch-based algorithm in which each agent communicates only at the end of an epoch and shares the index of the best arm it knows. With LCC-UCB, each agent enjoys a regret of O(√((K/N + N)T)), communicates for O(log T) steps, and broadcasts O(log K) bits in each communication step. We extend the work to sparse graphs with maximum degree K_G and diameter D, proposing LCC-UCB-GRAPH, which enjoys a regret bound of O(√(D(K/N + K_G)DT)). Finally, we empirically show that the LCC-UCB and LCC-UCB-GRAPH algorithms perform well and outperform strategies that communicate through a central node.
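An illustrative sketch of the doubling-epoch, limited-communication pattern behind LCC-UCB follows; the arm partitioning, confidence widths, and epoch schedule are simplifying assumptions, not the paper's exact specification.

```python
# Agents run UCB1 on their own arm subsets during an epoch, then broadcast
# only the index of their empirically best arm; epochs double in length.
import math
import random

random.seed(1)
K, N = 12, 3
means = [random.random() for _ in range(K)]        # unknown Bernoulli arm means

def ucb_best(arms, steps):
    """Run UCB1 on a subset of arms; return the empirically best arm."""
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    for t in range(1, steps + 1):
        unplayed = [a for a in arms if counts[a] == 0]
        a = unplayed[0] if unplayed else max(
            arms, key=lambda b: sums[b] / counts[b]
            + math.sqrt(2 * math.log(t) / counts[b]))
        counts[a] += 1
        sums[a] += 1.0 if random.random() < means[a] else 0.0
    return max(arms, key=lambda b: sums[b] / counts[b])

subsets = [list(range(i, K, N)) for i in range(N)]  # agents start on disjoint arm sets
epoch = 32
for _ in range(4):
    best = [ucb_best(s, epoch) for s in subsets]    # no communication within an epoch
    shared = sorted(set(best))                      # each shared index costs O(log K) bits
    subsets = [sorted(set(s) | set(shared)) for s in subsets]
    epoch *= 2                                      # doubling epochs -> O(log T) rounds
print("shared candidates:", shared, "| true best arm:", max(range(K), key=means.__getitem__))
```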
  3. We consider the problem of collective exploration of a known n-node edge-weighted graph by k mobile agents that have limited energy but are capable of energy transfers. The agents are initially placed at an arbitrary subset of nodes in the graph, and each agent has an initial, possibly different, amount of energy. The goal of the exploration problem is for every edge in the graph to be traversed by at least one agent. The amount of energy used by an agent to travel distance x is proportional to x. In our model, the agents can share energy when co-located: when two agents meet, one can transfer part of its energy to the other. For an n-node path, we give an O(n+k)-time algorithm that either finds an exploration strategy or reports that one does not exist. For an n-node tree with l leaves, we give an O(n+lk^2) algorithm that finds an exploration strategy if one exists. Finally, for the general graph case, we show that the problem of deciding if exploration is possible by energy-sharing agents is NP-hard, even for 3-regular graphs. In addition, we show that it is always possible to find an exploration strategy if the total energy of the agents is at least twice the total weight of the edges; moreover, this bound is asymptotically optimal.
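The sufficiency condition at the end of item 3 reduces to a one-line check; a tiny sketch with invented numbers:

```python
# Sketch of the sufficiency test stated in item 3: exploration by energy-
# sharing agents is always possible when total energy is at least twice the
# total edge weight. Failing the test does not by itself prove infeasibility;
# the edge weights and energies below are made up for illustration.
def surely_explorable(edge_weights, agent_energies):
    return sum(agent_energies) >= 2 * sum(edge_weights)

edges = [1.0, 2.0, 1.5]                    # edge weights of the graph
energies = [4.0, 3.0, 3.0]                 # initial energy of each agent
print(surely_explorable(edges, energies))  # True: 10.0 >= 2 * 4.5 = 9.0
```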
  4. We study a multi-agent partially observable environment in which autonomous agents aim to coordinate their actions, while also learning the parameters of the unknown environment through repeated interactions. In particular, we focus on the role of communication in a multi-agent reinforcement learning problem. We consider a learning algorithm in which agents make decisions based on their own observations of the environment, as well as the observations of other agents, which are collected through communication between agents. We first identify two potential benefits of this type of information sharing when agents' observation quality is heterogeneous: (1) it can facilitate coordination among agents, and (2) it can enhance the learning of all participants, including the better informed agents. We show, however, that these benefits of communication depend in general on its timing, so that delayed information sharing may be preferred in certain scenarios.
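A toy sketch of the shared-observation mechanism in item 4 (the Gaussian noise model, inverse-variance fusion weights, and delay mechanism are illustrative assumptions; the paper's coordination and timing analysis is not reproduced here):

```python
# Two agents with heterogeneous observation quality estimate an unknown value;
# each may also fuse the other agent's observations, delivered after a delay.
import random
from collections import deque

random.seed(3)
SIGMA = [0.2, 1.0]                        # heterogeneous observation noise levels

def mean_sq_error(share, delay=5, T=20000):
    truth = 1.0
    inbox = [deque(), deque()]            # observations in flight toward each agent
    err = [0.0, 0.0]
    w = [1.0 / s**2 for s in SIGMA]       # inverse-variance fusion weights
    for _ in range(T):
        obs = [truth + random.gauss(0, SIGMA[i]) for i in (0, 1)]
        for i in (0, 1):
            est = obs[i]
            if share:
                inbox[1 - i].append(obs[i])        # broadcast own observation
                if len(inbox[i]) > delay:          # delivered after the delay
                    other = inbox[i].popleft()
                    est = (w[i] * est + w[1 - i] * other) / (w[i] + w[1 - i])
            err[i] += (est - truth) ** 2 / T
    return [round(e, 4) for e in err]

print("without sharing:", mean_sq_error(False))  # each agent on its own
print("with sharing:   ", mean_sq_error(True))   # both improve, even the accurate agent
```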
  5. This paper considers a planar multi-agent coordination problem. Unlike other related works, we explicitly consider a globally shared wireless communication channel where individual agents must choose both a frequency and a power level at which to transmit their messages. This problem is motivated by the pressing need for algorithms that are able to efficiently and reliably operate on overcrowded wireless networks or otherwise poor-performing RF environments. We develop a self-triggered coordination algorithm that guarantees convergence to the desired set of states with probability 1. The algorithm is developed using ideas from event/self-triggered coordination and allows agents to autonomously decide for themselves when to broadcast information, at which frequency and power, and how to move based on information received from other agents in the network. Simulations illustrate our results.
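An illustrative sketch of the self-triggered pattern in item 5 (the trigger threshold and channel model are hypothetical, and none of the paper's convergence guarantees are claimed for it): an agent broadcasts only when its state has drifted enough, and picks a frequency and transmit power based on sensed channel congestion.

```python
import random

random.seed(2)
FREQS = [2.412e9, 2.437e9, 2.462e9]       # candidate channels (Hz)

def maybe_broadcast(pos, last_sent, trigger=0.5):
    if abs(pos - last_sent) < trigger:    # self-trigger: nothing new to report
        return None, last_sent
    congestion = {f: random.random() for f in FREQS}  # sensed load per channel
    freq = min(congestion, key=congestion.get)        # pick least congested channel
    power = 1.0 + 4.0 * congestion[freq]              # more power on noisier channels
    return (freq, power, pos), pos

pos, goal, last_sent = 0.0, 10.0, float("-inf")
for step in range(25):
    pos += 0.1 * (goal - pos)             # move toward the coordination target
    msg, last_sent = maybe_broadcast(pos, last_sent)
    if msg:
        print(f"step {step:2d}: sent at {msg[0] / 1e9:.3f} GHz, power {msg[1]:.2f}")
```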