Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning

Devraj, Adithya M.; Meyn, Sean P.

doi:10.1109/TAC.2021.3133184

Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted-cost, infinite-horizon setting, all of the known bounds can become arbitrarily large as the discount factor approaches unity. These results seem to imply that a very large number of samples is required to achieve an epsilon-optimal policy. The objective of the present work is to introduce a new class of algorithms whose sample complexity is uniformly bounded over all discount factors. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that these prior bounds concern value function approximation and not policy approximation. We show that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that goes to infinity as the discount factor approaches 1, an essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors.
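The distinction between value function approximation and policy approximation can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the abstract does not state the relative Q-learning recursion, so the sketch assumes a relative Bellman equation of the form H = c + γ P min H − δ⟨μ, H⟩, where the weight δ > 0 and the probability mass function μ over state-action pairs are hypothetical choices made here. It uses a small random MDP and exact value iteration rather than a stochastic learning rule. Under that assumption, H differs from the optimal Q-function only by a constant, so the greedy policy is unchanged while the magnitude of H stays moderate even when the discount factor is close to 1.

```python
import numpy as np


def q_value_iteration(P, c, gamma, iters=5000):
    """Compute the optimal Q-function of a small discounted-cost MDP.

    P[u, x, y] is the probability of moving from state x to y under action u;
    c[x, u] is the one-step cost. Costs are minimized.
    """
    Q = np.zeros_like(c)
    for _ in range(iters):
        V = Q.min(axis=1)                             # V(y) = min_u Q(y, u)
        Q = c + gamma * np.einsum("uxy,y->xu", P, V)  # Bellman update
    return Q


def relative_shift(Q, mu, gamma, delta):
    """Constant k such that H = Q - k solves the ASSUMED relative equation."""
    return delta * np.sum(mu * Q) / (1.0 + delta - gamma)


def relative_residual(H, P, c, gamma, mu, delta):
    """Max violation of H = c + gamma * P min H - delta * <mu, H> (assumed form)."""
    V = H.min(axis=1)
    TH = c + gamma * np.einsum("uxy,y->xu", P, V) - delta * np.sum(mu * H)
    return float(np.max(np.abs(TH - H)))


def random_mdp(n_x=5, n_u=3, seed=0):
    """Hypothetical toy MDP: random transition kernel and costs in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_u, n_x, n_x))
    P /= P.sum(axis=2, keepdims=True)  # normalize each row to a distribution
    c = rng.random((n_x, n_u))
    return P, c


if __name__ == "__main__":
    P, c = random_mdp()
    gamma, delta = 0.99, 1.0                 # illustrative values only
    mu = np.full(c.shape, 1.0 / c.size)      # uniform pmf, an assumption
    Q = q_value_iteration(P, c, gamma)
    H = Q - relative_shift(Q, mu, gamma, delta)
    print("max |Q|:", np.max(np.abs(Q)))
    print("max |H|:", np.max(np.abs(H)))
    print("same greedy policy:",
          np.array_equal(Q.argmin(axis=1), H.argmin(axis=1)))
    print("residual of assumed relative equation:",
          relative_residual(H, P, c, gamma, mu, delta))
```

The sketch only checks a fixed-point identity; it says nothing about the asymptotic covariance results claimed in the abstract, which concern the stochastic algorithms themselves.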

Award ID(s): 1935389

PAR ID: 10347485

Author(s) / Creator(s): Devraj, Adithya M.; Meyn, Sean P.

Date Published: 2021-12-07

Journal Name: IEEE Transactions on Automatic Control

ISSN: 0018-9286

Page Range / eLocation ID: 1 to 1

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1109/TAC.2021.3133184