Zap Q-Learning - A User's Guide

Devraj, Adithya M.; Busic, Ana; Meyn, Sean

doi:10.1109/INDIANCC.2019.8715554

The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work in that it treats per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing existence and uniqueness of a solution to Bellman's equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes the weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges almost surely and that its limit has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path-dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space.
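
The abstract does not state the update rule, so the following is only a minimal sketch of a stochastic approximation recursion of this kind. It assumes a temporal-difference style update for optimal stopping in which the continuation value Q(x) = g(x) + discount * E[max(G(X'), Q(X')) | X = x] is approximated by dot(features(x), theta). The function names (td_optimal_stopping, exact_continuation_value), the step-size schedule 1/(t+1)^0.6, and the toy random-walk chain are hypothetical illustrations, not the authors' code.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def td_optimal_stopping(
    step: Callable[[int, random.Random], int],
    features: Callable[[int], Vector],
    stage_reward: Callable[[int], float],
    terminal_reward: Callable[[int], float],
    discount: float,
    x0: int,
    num_steps: int,
    seed: int = 0,
) -> Vector:
    """Tune theta so that dot(features(x), theta) approximates the continuation
    value Q(x) = g(x) + discount * E[max(G(X'), Q(X')) | X = x]."""
    rng = random.Random(seed)
    theta = [0.0] * len(features(x0))
    x = x0
    for t in range(num_steps):
        x_next = step(x, rng)  # simulate one transition of the Markov chain
        phi = features(x)
        # Value after the transition: stop (terminal reward) or continue (current estimate).
        v_next = max(terminal_reward(x_next), dot(features(x_next), theta))
        td_error = stage_reward(x) + discount * v_next - dot(phi, theta)
        gain = 1.0 / (t + 1) ** 0.6  # assumed Robbins-Monro step-size schedule
        theta = [w + gain * td_error * p for w, p in zip(theta, phi)]
        x = x_next
    return theta


# Hypothetical toy problem: reflecting random walk on {0, ..., N-1},
# no per-stage reward, terminal reward G(x) = x, discount 0.9.
N = 5


def reflect_walk(x: int, rng: random.Random) -> int:
    """Move left or right with probability 1/2, staying put at the boundaries."""
    return min(N - 1, max(0, x + rng.choice((-1, 1))))


def one_hot(x: int) -> Vector:
    """Indicator basis functions: one feature per state."""
    return [1.0 if i == x else 0.0 for i in range(N)]


def exact_continuation_value(discount: float, iters: int = 2000) -> Vector:
    """Value iteration for Q on the toy chain, used only as a reference."""
    q = [0.0] * N
    for _ in range(iters):
        v = [max(float(i), q[i]) for i in range(N)]
        q = [discount * 0.5 * (v[max(0, i - 1)] + v[min(N - 1, i + 1)]) for i in range(N)]
    return q


if __name__ == "__main__":
    theta = td_optimal_stopping(
        reflect_walk, one_hot, lambda x: 0.0, lambda x: float(x), 0.9, 0, 200_000
    )
    print([round(w, 2) for w in theta])
    print([round(q, 2) for q in exact_continuation_value(0.9)])
```

With one-hot features the linear architecture can represent Q exactly, so theta should approach the value-iteration reference; the induced rule stops at x when terminal_reward(x) >= dot(features(x), theta). With fewer basis functions than states, the limit is only an approximation of Q.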

Award ID(s): 1646229

PAR ID: 10211835

Author(s) / Creator(s): Devraj, Adithya M.; Busic, Ana; Meyn, Sean

Date Published: 2019-01-01

Journal Name: Proc. of the Fifth Indian Control Conference

Page Range / eLocation ID: 10–15

Format(s):: Medium: X

Sponsoring Org: National Science Foundation

Free publicly accessible full text (accepted manuscript, conference paper): https://doi.org/10.1109/INDIANCC.2019.8715554

More Like this