Fast multi-agent temporal-difference learning via homotopy stochastic primal-dual method

Ding, D.; Wei, X.; Yang, Z.; Wang, Z.; Jovanovic, M. R.


We study a distributed policy evaluation problem in which a group of agents with jointly observed states and private local actions and rewards collaborate to learn the value function of a given policy via local computation and communication. This problem arises in various large-scale multi-agent systems, including power grids, intelligent transportation systems, wireless sensor networks, and multi-agent robotics. We develop and analyze a new distributed temporal-difference learning algorithm that minimizes the mean-square projected Bellman error. Our approach is based on a stochastic primal-dual method, and we improve the best-known convergence rate from $$O(1/\sqrt{T})$$ to $$O(1/T)$$, where $$T$$ is the total number of iterations. Our analysis explicitly takes into account the Markovian nature of the sampling and addresses a broader class of problems than the commonly used i.i.d. sampling scenario.
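The mean-square projected Bellman error (MSPBE) objective named in the abstract can be made concrete with a short worked formula. What follows is a sketch of the standard single-agent form under linear value-function approximation $$V_\theta(s) = \phi(s)^\top \theta$$, not necessarily the exact formulation used in the paper. The symbols $$\phi$$, $$\theta$$, $$w$$, $$\gamma$$, $$A$$, $$b$$, and $$C$$ are assumptions introduced here for illustration, and $$C$$ is assumed invertible.

$$
A = \mathbb{E}\big[\phi(s)\,(\phi(s) - \gamma\,\phi(s'))^\top\big], \qquad
b = \mathbb{E}\big[r\,\phi(s)\big], \qquad
C = \mathbb{E}\big[\phi(s)\,\phi(s)^\top\big]
$$

$$
\mathrm{MSPBE}(\theta)
= \tfrac{1}{2}\,(A\theta - b)^\top C^{-1} (A\theta - b)
= \max_{w}\; \Big( w^\top (b - A\theta) - \tfrac{1}{2}\, w^\top C\, w \Big)
$$

The inner maximum is attained at $$w = C^{-1}(b - A\theta)$$, and substituting it back recovers the quadratic form on the left. Minimizing over $$\theta$$ therefore becomes a saddle-point problem in $$(\theta, w)$$, which is the kind of problem a stochastic primal-dual method addresses by taking sampled gradient descent steps in $$\theta$$ and ascent steps in $$w$$. In the multi-agent setting, one would expect $$b$$ to be built from the agents' private local rewards, but that detail is an assumption here and is not stated in the abstract.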

Award ID(s): 1708906; 1809833

PAR ID: 10128755

Author(s) / Creator(s): Ding, D.; Wei, X.; Yang, Z.; Wang, Z.; Jovanovic, M. R.

Date Published: 2019-12-01

Journal Name: Optimization Foundations for Reinforcement Learning Workshop, 33rd Conference on Neural Information Processing Systems

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.
