We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average reward criterion. The ERVL algorithm relies on nearest-neighbor function approximation and minibatch samples for value function updates. It is universal (it works for any MDP), computationally simple, and yet provides arbitrarily good approximation with high probability in finite time. To our knowledge, this is the first algorithm for non-parametric (and continuous state space) MDPs with the average reward criterion that has these provable properties. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
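A minimal sketch of the two ingredients named above, nearest-neighbor function approximation and minibatch sampling, applied to relative value iteration on a toy one-dimensional MDP. The dynamics, reward, and all names below are our own illustration, not the paper's algorithm or its benchmark:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D MDP on [0, 1] with two actions (illustrative, not the paper's benchmark).
def sample_next_states(s, a, n):
    drift = 0.1 if a == 1 else -0.1
    return np.clip(s + drift + 0.05 * rng.standard_normal(n), 0.0, 1.0)

def reward(s, a):
    return -abs(s - 0.5)  # reward peaks at the middle of the state space

anchors = np.linspace(0.0, 1.0, 21)  # states at which V is stored
V = np.zeros_like(anchors)

def V_nn(states):
    """1-nearest-neighbor estimate of V at arbitrary query states."""
    idx = np.abs(np.asarray(states)[:, None] - anchors[None, :]).argmin(axis=1)
    return V[idx]

batch = 32
for _ in range(200):
    Q = np.empty((len(anchors), 2))
    for a in (0, 1):
        # Minibatch of simulated next states from each anchor state
        nxt = np.stack([sample_next_states(s, a, batch) for s in anchors])
        Q[:, a] = (np.array([reward(s, a) for s in anchors])
                   + V_nn(nxt.ravel()).reshape(-1, batch).mean(axis=1))
    T_V = Q.max(axis=1)
    V = T_V - T_V[0]  # relative values: pin the reference state's value to zero
```

Subtracting the updated value at a fixed reference state is what makes this a relative value update, keeping the iterates bounded under the average-reward criterion.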
An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions
It has long been a challenging problem to design algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal and can provide arbitrarily good approximation for any MDP. In this paper, we propose an empirical value learning algorithm for average-reward MDPs with continuous states and actions that combines empirical value iteration with nonparametric function approximation and kernel density estimation of the transition probability distribution. We view each iteration as the application of a random operator and argue convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.
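As a hedged illustration of the kernel-density-estimation ingredient: given a batch of sampled next states, one can estimate the transition density and use it to evaluate the expected next-state value. The toy value function and all names below are our own, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)

def kde_density(samples, grid, bandwidth=0.05):
    """Gaussian-kernel density estimate of the sampled next-state distribution."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

# Sampled next states after some (state, action) pair; true mean 0.5, std 0.05.
next_states = 0.5 + 0.05 * rng.standard_normal(500)

grid = np.linspace(0.0, 1.0, 201)
dx = grid[1] - grid[0]
V = 1.0 - np.abs(grid - 0.5)  # a stand-in value function on the grid

p_hat = kde_density(next_states, grid)
expected_V = float((p_hat * V).sum() * dx)  # approximates E[V(s')] under the KDE
```

The estimated density stands in for the unknown transition kernel, so the Bellman expectation becomes a numerical integral against `p_hat` instead of an intractable exact expectation.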
- Award ID(s):
- 1817212
- PAR ID:
- 10128111
- Date Published:
- Journal Name:
- 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- Page Range / eLocation ID:
- 734 to 740
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We propose a new simple and natural algorithm for learning the optimal Q-value function of a discounted-cost Markov decision process (MDP) when the transition kernels are unknown. Unlike the classical learning algorithms for MDPs, such as Q-learning and actor-critic algorithms, this algorithm does not depend on a stochastic approximation-based method. We show that our algorithm, which we call the empirical Q-value iteration algorithm, converges to the optimal Q-value function. We also give a rate of convergence or a nonasymptotic sample complexity bound and show that an asynchronous (or online) version of the algorithm will also work. Preliminary experimental results suggest a faster rate of convergence to a ballpark estimate for our algorithm compared with stochastic approximation-based algorithms.
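The empirical Q-value iteration idea described above can be sketched as follows: replace the exact Bellman expectation with an average over a fresh batch of simulated next states at each iteration. The toy MDP below is our own example, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 3-state, 2-action discounted MDP (illustrative, not from the paper).
n_s, n_a, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # true kernel, hidden from the learner
R = rng.uniform(0.0, 1.0, size=(n_s, n_a))

# Empirical Q-value iteration: expectations replaced by minibatch averages.
Q = np.zeros((n_s, n_a))
for _ in range(300):
    Q_new = np.empty_like(Q)
    for s in range(n_s):
        for a in range(n_a):
            nxt = rng.choice(n_s, size=256, p=P[s, a])  # simulated next states
            Q_new[s, a] = R[s, a] + gamma * Q[nxt].max(axis=1).mean()
    Q = Q_new

# Exact Q-value iteration with the true kernel, for comparison.
Q_exact = np.zeros((n_s, n_a))
for _ in range(300):
    Q_exact = R + gamma * P @ Q_exact.max(axis=1)
```

Because each iteration draws a fresh batch rather than mixing new samples into a running average, no stochastic-approximation step sizes are needed; the batch size controls the per-iteration noise.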
-
In this paper, we propose an approximate relative value learning (ARVL) algorithm for non-parametric MDPs with continuous state space and finite actions and average reward criterion. It is a sampling based algorithm combined with kernel density estimation and function approximation via nearest neighbors. The theoretical analysis is done via a random contraction operator framework and stochastic dominance argument. This is the first such algorithm for continuous state space MDPs with average reward criteria with these provable properties which does not require any discretization of state space as far as we know. We then evaluate the proposed algorithm on a benchmark problem numerically.
-
Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision problems with partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which limits their scalability. This paper describes new, efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the ordinary Bellman operator's linear complexity. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with a robust value iteration.
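The inner problem behind the robust Bellman operator with an (unweighted) L1 ambiguity set admits a simple greedy solution, which is what makes fast robust updates possible. Below is a hedged sketch of that standard greedy method in our own minimal form, not the paper's partial policy iteration or its weighted-norm algorithms:

```python
import numpy as np

def worst_case_value(p_bar, v, kappa):
    """min of p @ v over {p in simplex : ||p - p_bar||_1 <= kappa}.

    Greedy: move up to kappa/2 probability mass onto the lowest-value
    state, taking it from the highest-value states first.
    """
    p = np.asarray(p_bar, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    i_min = int(np.argmin(v))
    budget = min(kappa / 2.0, 1.0 - p[i_min])  # mass we are allowed to move
    p[i_min] += budget
    for i in np.argsort(v)[::-1]:              # states in decreasing value order
        if budget <= 0 or i == i_min:
            continue
        take = min(budget, p[i])
        p[i] -= take
        budget -= take
    return float(p @ v)
```

For example, with nominal distribution [0.25, 0.25, 0.25, 0.25], values [1, 2, 3, 4], and budget kappa = 0.2, the greedy shift moves 0.1 of mass from the best state to the worst, lowering the expected value from 2.5 to 2.2. After sorting, the whole operation is linear in the number of states, which is the kind of per-state cost that quasi-linear robust Bellman updates rely on.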
-
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs and perform a similar role to that of classical value functions in fully-observable MDPs. We derive a new off-policy Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is close to the true policy value under Bellman completeness, as long as futures and histories contain sufficient information about latent states.