An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space

Sharma, Hiteshi; Jain, Rahul; Gupta, Abhishek

doi:10.23919/ECC.2019.8795982

We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average reward criterion. The ERVL algorithm relies on nearest-neighbor function approximation and on minibatch samples for the value function update. It is universal (it works for any MDP), computationally simple, and yet provides an arbitrarily good approximation with high probability in finite time. As far as we know, this is the first algorithm with these provable properties for non-parametric, continuous-state-space MDPs under the average reward criterion. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
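The abstract names three ingredients: relative value iteration for the average reward criterion, nearest-neighbor function approximation, and minibatch sampling of next states. The following is a minimal sketch of how these could fit together, not the authors' exact procedure. The replacement model (`step`, `reward`), the choice of the first anchor as reference state, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def step(x, a, rng, m):
    """Hypothetical replacement model: draw m next states from wear level x."""
    if a == 0:  # keep: wear grows by a random amount, capped at 1
        return np.minimum(1.0, x + rng.uniform(0.0, 0.2, size=m))
    return rng.uniform(0.0, 0.1, size=m)  # replace: nearly new machine

def reward(x, a):
    """Assumed rewards: operating cost x when keeping, fixed cost when replacing."""
    return -x if a == 0 else -0.5

def nn_value(anchors, values, queries):
    """1-nearest-neighbor lookup of the value function at the query states."""
    idx = np.abs(queries[:, None] - anchors[None, :]).argmin(axis=1)
    return values[idx]

def ervl_sketch(n_anchors=50, m=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = np.sort(rng.uniform(0.0, 1.0, n_anchors))  # sampled states
    v = np.zeros(n_anchors)
    gain = 0.0
    for _ in range(iters):
        q = np.empty((n_anchors, 2))
        for i, x in enumerate(anchors):
            for a in (0, 1):
                nxt = step(x, a, rng, m)  # minibatch of next states
                q[i, a] = reward(x, a) + nn_value(anchors, v, nxt).mean()
        tv = q.max(axis=1)  # empirical Bellman update on the anchors
        gain = tv[0]        # offset at the reference anchor (assumed choice)
        v = tv - gain       # relative value: pinned to 0 at the reference
    policy = q.argmax(axis=1)  # greedy action per anchor from the last sweep
    return anchors, v, gain, policy
```

Calling `ervl_sketch()` returns the sampled anchor states, the relative value estimate on them, the offset subtracted at the reference anchor (which plays the role of an average reward estimate in this sketch), and a greedy keep/replace action per anchor.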

Award ID(s): 1810447, 1817212

PAR ID: 10128112

Author(s) / Creator(s): Sharma, Hiteshi; Jain, Rahul; Gupta, Abhishek

Date Published: 2019-06-01

Journal Name: 2019 18th European Control Conference (ECC)

Page Range / eLocation ID: 1368 to 1373

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.23919/ECC.2019.8795982
