Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

Zhang, S; Liu, B; Yao, H; Whiteson, S.

Citation Details

We present the first provably convergent two-timescale off-policy actor-critic algorithm (COF-PAC) with function approximation. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of the key ideas of Gradient Temporal Difference Learning and Emphatic Temporal Difference Learning. With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

Award ID(s): 1910794

PAR ID: 10169403

Author(s) / Creator(s): Zhang, S; Liu, B; Yao, H; Whiteson, S.

Date Published: 2020-07-01

Journal Name: International Conference on Machine Learning

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
