Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

Zhang, Kaiqing; Kakade, Sham; Basar, Tamer; Yang, Lin

Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and the planning phases, and avoids the non-stationarity problem that arises when all agents improve their policies simultaneously using samples. Though intuitive and widely used, model-based MARL algorithms have received relatively little attention in terms of sample complexity. In this paper, we aim to address the fundamental open question about the sample complexity of model-based MARL. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model of state transition. We show that model-based MARL achieves a near-optimal sample complexity for finding the Nash equilibrium (NE) value up to some additive error. We also show that this method is near-minimax optimal with a tight dependence on the horizon and the number of states. Our results justify the efficiency of this simple model-based approach in the multi-agent RL setting.
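The decoupling of learning and planning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it is a toy example under stated assumptions: each player has exactly two actions (so every stage game is 2x2 and has a closed-form value), rewards are known and lie in [0, 1], and the planner is plain Shapley value iteration. All names (`matrix_game_value`, `estimate_model`, `shapley_value_iteration`, `run_demo`) and all parameter values are hypothetical.

```python
import random

N_ACTIONS = 2  # assumption: two actions per player, so each stage game is 2x2


def matrix_game_value(M):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin >= minimax:  # a pure-strategy saddle point exists
        return maximin
    # No saddle point: both players mix, and the value has a closed form.
    return (a * d - b * c) / (a + d - b - c)


def estimate_model(sample_next_state, n_states, n_samples):
    """Learning phase: query the generative model n_samples times per
    (state, action, action) triple and return empirical transition frequencies."""
    P_hat = [[[[0.0] * n_states for _ in range(N_ACTIONS)]
              for _ in range(N_ACTIONS)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(N_ACTIONS):
            for b in range(N_ACTIONS):
                for _ in range(n_samples):
                    s_next = sample_next_state(s, a, b)
                    P_hat[s][a][b][s_next] += 1.0 / n_samples
    return P_hat


def shapley_value_iteration(P, r, gamma, n_iters=200):
    """Planning phase: iterate V(s) <- value of the stage game with payoffs
    r(s,a,b) + gamma * sum_s' P(s'|s,a,b) V(s')."""
    V = [0.0] * len(P)
    for _ in range(n_iters):
        V = [
            matrix_game_value([
                [r[s][a][b] + gamma * sum(p * v for p, v in zip(P[s][a][b], V))
                 for b in range(N_ACTIONS)]
                for a in range(N_ACTIONS)
            ])
            for s in range(len(P))
        ]
    return V


def run_demo(seed=0, n_states=3, n_samples=2000, gamma=0.5):
    """Build a random toy game, then compare planning on the true model
    with planning on the empirical model."""
    rng = random.Random(seed)
    # Known rewards in [0, 1] (assumption: only transitions are unknown).
    r = [[[rng.random() for _ in range(N_ACTIONS)]
          for _ in range(N_ACTIONS)] for _ in range(n_states)]
    # Hidden true transition model, used only to answer generative-model queries.
    P_true = []
    for _ in range(n_states):
        per_state = []
        for _ in range(N_ACTIONS):
            row = []
            for _ in range(N_ACTIONS):
                w = [rng.random() + 0.1 for _ in range(n_states)]
                total = sum(w)
                row.append([x / total for x in w])
            per_state.append(row)
        P_true.append(per_state)

    def sample_next_state(s, a, b):
        return rng.choices(range(n_states), weights=P_true[s][a][b])[0]

    P_hat = estimate_model(sample_next_state, n_states, n_samples)
    V_true = shapley_value_iteration(P_true, r, gamma)
    V_hat = shapley_value_iteration(P_hat, r, gamma)
    return V_true, V_hat


if __name__ == "__main__":
    V_true, V_hat = run_demo()
    gap = max(abs(x - y) for x, y in zip(V_true, V_hat))
    print("max |V_true - V_hat| =", gap)
```

Note that samples are drawn only in `estimate_model`; the planner never touches the generative model, so no agent faces a moving target while the other improves its policy. For games with more than two actions per player, the stage-game value would need a linear-program solver in place of the 2x2 closed form.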

Award ID(s): 1703574

PAR ID: 10276099

Author(s) / Creator(s): Zhang, Kaiqing; Kakade, Sham; Basar, Tamer; Yang, Lin

Date Published: 2020-01-01

Journal Name: Advances in Neural Information Processing Systems

Volume: 33

ISSN: 1049-5258

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.