Title: Brief Announcement: Certified Multiplicative Weights Update: Verified Learning Without Regret
The Multiplicative Weights Update method (MWU) is a simple yet powerful algorithm for learning linear classifiers, for ensemble learning a la boosting, for approximately solving linear and semidefinite systems, for computing approximate solutions to multicommodity flow problems, and for online convex optimization, among other applications. In this brief announcement, we apply techniques from interactive theorem proving to define and prove correct the first formally verified implementation of MWU (specifically, we show that our MWU is no regret). Our primary application -- and one justification of the relevance of our work to the PODC community -- is to verified multi-agent systems, such as distributed multi-agent network flow and load balancing games, for which verified MWU provides a convenient method for distributed computation of approximate Coarse Correlated Equilibria.
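As a rough, unverified illustration of the algorithm being certified (a plain Python sketch under the standard experts setting with losses in [0, 1], not the paper's formally verified development; the function name `mwu` and the learning rate `eta` are hypothetical):

```python
import math

def mwu(n_experts, loss_rounds, eta=0.1):
    """Sketch of the Multiplicative Weights Update method over n_experts experts.

    loss_rounds: iterable of rounds, each a list of per-expert losses in [0, 1].
    Returns the final distribution over experts and the total expected loss incurred.
    """
    weights = [1.0] * n_experts
    dist = [1.0 / n_experts] * n_experts
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        dist = [w / z for w in weights]          # play the normalized weights
        total_loss += sum(p * l for p, l in zip(dist, losses))
        # Multiplicatively penalize each expert according to its observed loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return dist, total_loss
```

With `eta` on the order of sqrt(ln(n) / T) over T rounds, the standard analysis bounds the regret against the best fixed expert by O(sqrt(T ln n)); this is the "no regret" property referred to in the abstract, for which the paper provides a machine-checked proof over an executable implementation.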
Award ID(s):
1657358
PAR ID:
10050158
Date Published:
2017
Journal Name:
PODC '17: Proceedings of the ACM Symposium on Principles of Distributed Computing
Page Range / eLocation ID:
459 to 461
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies distributed Q-learning for the Linear Quadratic Regulator (LQR) in a multi-agent network. Existing results often assume that agents can observe the global system state, which may be infeasible in large-scale systems due to privacy concerns or communication constraints. In this work, we consider a setting with unknown system models and no centralized coordinator. We devise a state tracking (ST) based Q-learning algorithm to design optimal controllers for agents. Specifically, we assume that agents maintain local estimates of the global state based on their local information and communications with neighbors. At each step, every agent updates its local estimate of the global state (a consensus-style state-tracking step is sketched after this list), based on which it solves an approximate Q-factor locally through policy iteration. Assuming a decaying injected excitation noise during the policy evaluation, we prove that the local estimates converge to the true global state, and establish the convergence of the proposed distributed ST-based Q-learning algorithm. Experimental studies corroborate our theoretical results by showing that the proposed method achieves performance comparable to the centralized case.
  2. We study a distributed policy evaluation problem in which a group of agents with jointly observed states and private local actions and rewards collaborate to learn the value function of a given policy via local computation and communication. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. We develop and analyze a new distributed temporal-difference learning algorithm that minimizes the mean-square projected Bellman error (an illustrative primal-dual TD step is sketched after this list). Our approach is based on a stochastic primal-dual method, and we improve the best-known convergence rate from $O(1/\sqrt{T})$ to $O(1/T)$, where $T$ is the total number of iterations. Our analysis explicitly accounts for the Markovian nature of the sampling and addresses a broader class of problems than the commonly used i.i.d. sampling scenario.
  3. In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Unlike classical multi-agent systems, an LS-MAS brings new challenges to control design because of its large number of agents, which makes it more difficult to develop appropriate controls for complicated missions such as collective swarming. To address these challenges, a novel mixed game theory is developed with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. A cooperative game is then used to formulate the distributed swarm control for the multi-group leaders, and a Stackelberg game is utilized to couple the leaders and their large-scale followers effectively. Using the interaction between leaders and followers, a mean field game is used to propagate the collective swarm behavior from leaders to followers smoothly, without raising the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the intelligent optimal distributed swarm control for multi-group leader–follower systems. Specifically, a multi-agent actor–critic algorithm is first developed to obtain the distributed optimal swarm control for the multi-group leaders. Furthermore, an actor–critic–mass method is designed to find the decentralized swarm control for the large-scale followers. Finally, a series of numerical simulations and a Lyapunov stability proof of the closed-loop system demonstrate the performance of the developed scheme.
  4. We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose momentum-based decentralized policy gradient tracking (MDPGT), where a new momentum-based variance-reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates (a sketch of such a step appears after this list). MDPGT provably achieves the best available sample complexity of $O(N^{-1}\epsilon^{-3})$ for converging to an $\epsilon$-stationary point of the global average of $N$ local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single trajectory, the sample complexity matches that obtained by existing decentralized policy gradient methods. We further validate the theoretical claim for the Gaussian policy function. When the required error tolerance $\epsilon$ is small enough, MDPGT leads to a linear speed-up, which has previously been established in decentralized stochastic optimization, but not for reinforcement learning. Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
  5. This paper describes Distributed MASON, a distributed version of the MASON agent-based simulation tool. Distributed MASON is architected to take advantage of well-known principles from Parallel and Discrete Event Simulation, such as the use of Logical Processes (LPs) as a method for obtaining scalable and high-performing simulation systems. We first explain data management and sharing between LPs and describe our approach to load balancing. We then present both a local greedy approach and a global hierarchical approach. Finally, we present the results of our implementation of Distributed MASON on an instance in the Amazon Cloud, using several standard multi-agent models. The results indicate that our design is highly scalable and achieves our expected levels of speed-up.
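For item 1 above, a minimal sketch of what a consensus-style state-tracking step could look like; this is an assumption about the general shape of such an update, not the paper's exact rule, and the names (`W`, `obs_mask`, `state_tracking_step`) are hypothetical.

```python
import numpy as np

def state_tracking_step(estimates, W, local_obs, obs_mask):
    """One illustrative state-tracking update.

    estimates: (n_agents, n_state) array; row i is agent i's estimate of the global state.
    W:         (n_agents, n_agents) row-stochastic weight matrix respecting the communication graph.
    local_obs: (n_agents, n_state) array of locally measured state entries (arbitrary elsewhere).
    obs_mask:  (n_agents, n_state) boolean array, True where an agent observes that entry directly.
    """
    mixed = W @ estimates                        # average the neighbors' current estimates
    # Keep an agent's own measurement wherever it observes the state directly.
    return np.where(obs_mask, local_obs, mixed)
```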
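For item 2, an illustrative primal-dual temporal-difference step (a GTD2-style saddle-point update with neighbor averaging); this is a generic sketch of the class of methods the abstract names, with hypothetical parameter names, not the paper's algorithm.

```python
import numpy as np

def primal_dual_td_step(theta, w, W, phi, phi_next, rewards, gamma, alpha, beta):
    """One sketch step of distributed primal-dual TD learning with linear features.

    theta, w: (n_agents, d) primal and dual variables, one row per agent.
    W:        (n_agents, n_agents) row-stochastic mixing matrix over the communication graph.
    phi, phi_next: (d,) feature vectors of the jointly observed current and next states.
    rewards:  (n_agents,) private local rewards; gamma: discount; alpha, beta: step sizes.
    """
    theta_mix, w_mix = W @ theta, W @ w                          # consensus averaging
    new_theta, new_w = np.empty_like(theta), np.empty_like(w)
    for i in range(theta.shape[0]):
        delta = rewards[i] + gamma * phi_next @ theta_mix[i] - phi @ theta_mix[i]  # local TD error
        # Dual ascent toward the TD error, primal step along the corrected gradient direction.
        new_w[i] = w_mix[i] + beta * (delta - phi @ w_mix[i]) * phi
        new_theta[i] = theta_mix[i] + alpha * (phi - gamma * phi_next) * (phi @ w_mix[i])
    return new_theta, new_w
```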
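For item 4, a sketch in the spirit of a momentum-based decentralized gradient-tracking update; the importance-sampling correction mentioned in the abstract is omitted, the update is not MDPGT's exact rule, and all names here are hypothetical.

```python
import numpy as np

def momentum_tracking_step(theta, tracker, prev_surrogate, W, grads, prev_grads, alpha, beta):
    """One illustrative momentum-based policy gradient tracking step.

    theta:          (n_agents, d) local policy parameters.
    tracker:        (n_agents, d) gradient-tracking variables.
    prev_surrogate: (n_agents, d) previous local gradient surrogates.
    W:              (n_agents, n_agents) row-stochastic mixing matrix.
    grads, prev_grads: (n_agents, d) stochastic policy gradients at the current and previous parameters.
    alpha: step size; beta: momentum coefficient in (0, 1].
    """
    # Momentum-based variance reduction: blend a fresh estimate with a recursive correction.
    surrogate = beta * grads + (1.0 - beta) * (prev_surrogate + grads - prev_grads)
    # Gradient tracking: mix with neighbors and add the change in local surrogates.
    tracker = W @ tracker + surrogate - prev_surrogate
    # Ascent step on the local parameters, also mixed over neighbors.
    theta = W @ theta + alpha * tracker
    return theta, tracker, surrogate
```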