Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer convergence theory and of exploiting greater a priori knowledge of policy or value function structure. This paper explores algorithm design in the continuous-time domain, with a finite-horizon optimal control objective. The main contributions are (i) the new Q-ODE, a model-free characterization of the Hamilton-Jacobi-Bellman equation; (ii) a formulation of Convex Q-learning that avoids approximations appearing in prior work, in which the Bellman error used in the algorithm is defined by filtered measurements, as is necessary in the presence of measurement noise; (iii) a proof that Convex Q-learning with linear function approximation is a convex program, whose constraint region is bounded subject to an exploration condition on the training input; and (iv) an illustration of the theory in application to resource allocation for distributed energy resources, for which the theory is ideally suited.
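For orientation, a minimal sketch in notation of my own choosing (not taken from the paper): for deterministic dynamics $\dot x = f(x,u)$ with running cost $c$, the finite-horizon HJB equation and the associated continuous-time Q-function take the standard form

```latex
0 = \min_{u}\bigl\{\, c(x,u) + \partial_t J^{\star}(x,t) + \nabla_x J^{\star}(x,t)^{\top} f(x,u) \,\bigr\},
\qquad
Q^{\star}(x,u,t) := c(x,u) + \partial_t J^{\star}(x,t) + \nabla_x J^{\star}(x,t)^{\top} f(x,u),
```

so the HJB equation reads $\min_u Q^{\star}(x,u,t) = 0$ for all $(x,t)$; the Q-ODE of contribution (i) is a characterization of this identity that does not require knowledge of the model $f$.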
The Projected Bellman Equation in Reinforcement Learning
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. In the original tabular formulation, the goal is to compute a solution to the discounted-cost optimality equation exactly, and thereby obtain the optimal policy for a Markov Decision Process. The goal today is more modest: obtain an approximate solution within a prescribed function class. The standard algorithms are based on the same architecture formulated in the 1980s, with the goal of finding a value function approximation that solves the so-called projected Bellman equation. Although reinforcement learning has been an active research area for over four decades, there is little theory giving conditions for convergence of these Q-learning algorithms, or even for the existence of a solution to this equation. The purpose of this paper is to show that a solution to the projected Bellman equation does exist, provided the function class is linear and the input used for training is a form of epsilon-greedy policy with sufficiently small epsilon. Moreover, under these conditions the Q-learning algorithm is shown to be stable, in the sense of bounded parameter estimates. Convergence remains one of many open topics for research.
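In symbols, a hedged sketch (notation is mine, not necessarily the paper's): with a linear function class $Q^{\theta} = \theta^{\top}\psi$ and eligibility vector $\zeta_n = \psi(X_n,U_n)$, the projected Bellman equation asks for a parameter $\theta^{\star}$ satisfying

```latex
\mathbb{E}\Bigl[\Bigl( c(X_n,U_n) + \gamma \min_{u} Q^{\theta^{\star}}(X_{n+1},u) - Q^{\theta^{\star}}(X_n,U_n) \Bigr)\, \zeta_n \Bigr] = 0,
\qquad Q^{\theta}(x,u) = \theta^{\top}\psi(x,u),
```

with the expectation taken in steady state under the training input. The existence and stability results described above concern this kind of root-finding problem.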
- Award ID(s):
- 2306023
- PAR ID:
- 10521316
- Editor(s):
- Astolfi, Alessandro
- Publisher / Repository:
- IEEE Transactions on Automatic Control
- Date Published:
- Journal Name:
- IEEE Transactions on Automatic Control
- ISSN:
- 0018-9286
- Page Range / eLocation ID:
- 1 to 14
- Subject(s) / Keyword(s):
- Reinforcement learning; stochastic approximation; stochastic control
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman equation often encounters challenges due to the limited scope of available data. This limitation stems from the Bellman equation itself, which cannot accurately predict the value of unvisited states. To address this issue, we introduce a solution that bridges continuous- and discrete-time RL methods, capitalizing on the advantages of both. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function's first derivative aligns with the local characteristics of states and actions, as defined by the Hamilton-Jacobi-Bellman equation in continuous-time RL. We provide practical algorithms for both deterministic and stochastic policy gradient methods. Experiments on the D4RL dataset show that incorporating this first-order information significantly improves policy performance on offline RL problems.
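A hypothetical sketch of this idea (not the authors' code; names such as `hjb_weight` and the one-step dynamics estimate are my own): fit the critic to discrete-time Bellman targets while penalizing disagreement between the critic's state-gradient and a first-order local model of the transition, in the spirit of the HJB constraint described above.

```python
# Illustrative sketch only: discrete-time Bellman loss plus a first-order
# (HJB-style) gradient-alignment penalty. `hjb_weight` and the crude
# finite-difference dynamics estimate `f_local` are assumptions for exposition.

import torch

def critic_loss(V, s, r, s_next, gamma=0.99, hjb_weight=1.0):
    """V: nn.Module mapping states -> values; s, r, s_next: batched tensors."""
    # Standard discrete-time Bellman residual on visited transitions.
    with torch.no_grad():
        target = r + gamma * V(s_next).squeeze(-1)
    bellman = ((V(s).squeeze(-1) - target) ** 2).mean()

    # First-order term: match dV/ds to a local model of the dynamics, so the
    # fitted value function extrapolates sensibly near the data.
    s = s.clone().requires_grad_(True)
    dV_ds = torch.autograd.grad(V(s).sum(), s, create_graph=True)[0]
    f_local = s_next - s.detach()          # crude one-step dynamics estimate
    hjb_residual = (V(s_next).squeeze(-1).detach()
                    - V(s).squeeze(-1)
                    - (dV_ds * f_local).sum(-1))
    return bellman + hjb_weight * (hjb_residual ** 2).mean()
```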
Robust Markov decision processes (MDPs) aim to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. Existing studies have mostly focused on robust MDPs under the discounted-reward criterion, leaving those under the average-reward criterion largely unexplored. In this paper, we develop the first comprehensive and systematic study of robust average-reward MDPs, where the goal is to optimize the long-term average performance under the worst case. Our contributions are fourfold: (1) we prove the uniform convergence of the robust discounted value function to the robust average-reward function as the discount factor γ goes to 1; (2) we derive the robust average-reward Bellman equation, characterize the structure of its solution set, and prove the equivalence between solving the robust Bellman equation and finding the optimal robust policy; (3) we design robust dynamic programming algorithms, and theoretically characterize their convergence to the optimal policy; and (4) we design two model-free algorithms utilizing the multi-level Monte Carlo approach, and prove their asymptotic convergence.
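In symbols, a sketch in notation of my own choosing (the paper's formulation may differ in detail): writing $r$ for the reward, $g$ for the robust average reward (gain), $v$ for the bias function, and $\mathcal{P}_{s,a}$ for the uncertainty set of transition kernels at $(s,a)$, the robust average-reward Bellman equation of contribution (2) takes the form

```latex
v(s) = \max_{a \in \mathcal{A}} \Bigl\{\, r(s,a) - g + \min_{p \in \mathcal{P}_{s,a}} \sum_{s'} p(s')\, v(s') \,\Bigr\},
\qquad s \in \mathcal{S},
```

where the inner minimum selects the worst-case transition kernel from the uncertainty set.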
We study representation learning for offline reinforcement learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations satisfying these conditions are given, with results being mostly theoretical in nature. In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. With this learned representation, we perform OPE using Least Squares Policy Evaluation (LSPE) with linear functions in our learned representation. We present an end-to-end theoretical analysis, showing that our two-stage algorithm enjoys polynomial sample complexity provided some representation in the rich class considered is linear Bellman complete. Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the DeepMind Control Suite. We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL (e.g., CURL, SPR). BCRL achieves competitive OPE error with the state-of-the-art method Fitted Q-Evaluation (FQE), and beats FQE when evaluating beyond the initial state distribution. Our ablations show that both the linear Bellman completeness and coverage components of our method are crucial.
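For readers unfamiliar with LSPE, here is a minimal sketch (my own illustration, not the BCRL code) of the evaluation stage with a fixed linear representation: each sweep regresses one-step Bellman targets onto the features, reusing the same least-squares projection matrix.

```python
# Minimal LSPE sketch with linear features (illustrative, not the BCRL code).
# phi: (n, d) features of (s, a);  phi_next: (n, d) features of (s', pi(s'));
# r: (n,) rewards;  gamma: discount.  Returns weights w with Q(s, a) ~ phi @ w.
import numpy as np

def lspe(phi, r, phi_next, gamma=0.99, iters=100, ridge=1e-6):
    d = phi.shape[1]
    # Precompute the regression matrix once; only the target changes per sweep,
    # which is what makes LSPE cheap per iteration.
    A = np.linalg.inv(phi.T @ phi + ridge * np.eye(d)) @ phi.T
    w = np.zeros(d)
    for _ in range(iters):
        target = r + gamma * (phi_next @ w)   # one-step Bellman target
        w = A @ target                        # least-squares projection
    return w
```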
Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted-cost, infinite-horizon setting, all of the known bounds can be arbitrarily large as the discount factor approaches unity. These results seem to imply that a very large number of samples is required to achieve an epsilon-optimal policy. The objective of the present work is to introduce a new class of algorithms whose sample complexity is uniformly bounded over all discount factors. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that these prior bounds concern value function approximation and not policy approximation. We show that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that diverges as the discount factor approaches 1, an essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors.
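To make the comparison concrete, a hedged sketch in my own notation: standard tabular Q-learning for a discounted cost $c$ with step sizes $a_n$ performs the first recursion below, and relative Q-learning subtracts a scalar anchor inside the temporal difference, shown here as the estimate at a fixed reference pair $(x^{\bullet},u^{\bullet})$, a representative choice rather than the paper's exact recursion. Subtracting a scalar shifts the fixed point by a constant, so the greedy policy is preserved while the variance is tamed.

```latex
Q_{n+1}(X_n,U_n) = Q_n(X_n,U_n)
  + a_{n+1}\bigl[\, c(X_n,U_n) + \gamma \min_{u} Q_n(X_{n+1},u) - Q_n(X_n,U_n) \,\bigr]

\widetilde{Q}_{n+1}(X_n,U_n) = \widetilde{Q}_n(X_n,U_n)
  + a_{n+1}\bigl[\, c(X_n,U_n) + \gamma \min_{u} \widetilde{Q}_n(X_{n+1},u)
    - \widetilde{Q}_n(x^{\bullet},u^{\bullet}) - \widetilde{Q}_n(X_n,U_n) \,\bigr]
```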