Q-learning with nearest neighbors

Shah, Devavrat; Xie, Qiaomin

Citation Details

We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and an unknown transition kernel, when only a single sample path of the system under an arbitrary policy is available. We study the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using nearest neighbor regression. As the main contribution, we provide a tight finite-sample analysis of its convergence rate. In particular, for MDPs with a d-dimensional state space and discount factor γ ∈ (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function with nearly optimal sample complexity.
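To make the idea in the abstract concrete, here is a minimal sketch of Q-learning combined with 1-nearest-neighbor regression on a one-dimensional state space. It is an illustration under stated assumptions, not the authors' NNQL procedure: the anchor points `centers`, the toy environment `toy_step`, the uniformly random behavior policy, the online (per-sample) update, and the 1/n step size are all hypothetical choices made here for brevity.

```python
import numpy as np


def nearest_center(centers, x):
    """Index of the anchor point closest to state x (1-nearest-neighbor)."""
    return int(np.argmin(np.abs(centers - x)))


def nn_q_learning(step, centers, n_actions, gamma, n_steps, x0=0.5, seed=0):
    """Online Q-learning along a single sample path.

    Q-values are stored only at the anchor points; the Q-value of an
    arbitrary continuous state is read off its nearest anchor point.
    This is a simplified sketch, not the algorithm analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((len(centers), n_actions))
    counts = np.zeros_like(q)
    x = x0
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))      # assumed: uniformly random policy
        r, x_next = step(x, a, rng)           # one transition of the sample path
        i = nearest_center(centers, x)        # anchor for the current state
        j = nearest_center(centers, x_next)   # anchor for the next state
        target = r + gamma * q[j].max()       # bootstrapped Bellman target
        counts[i, a] += 1
        alpha = 1.0 / counts[i, a]            # assumed: 1/n step size
        q[i, a] += alpha * (target - q[i, a])
        x = x_next
    return q


def toy_step(x, a, rng):
    """Hypothetical toy MDP on [0, 1]: action 1 drifts right, action 0 left.

    The reward equals the next state, so rewards lie in [0, 1].
    """
    drift = 0.1 if a == 1 else -0.1
    noise = 0.02 * rng.standard_normal()
    x_next = float(np.clip(x + drift + noise, 0.0, 1.0))
    return x_next, x_next
```

For example, calling `nn_q_learning(toy_step, np.linspace(0.0, 1.0, 11), n_actions=2, gamma=0.9, n_steps=20000)` returns an 11 x 2 table of Q-value estimates, one row per anchor point. Because rewards lie in [0, 1], every entry stays within [0, 1/(1 - γ)].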

Award ID(s): 1523546 1740751 1462158

PAR ID: 10078952

Author(s) / Creator(s): Shah, Devavrat; Xie, Qiaomin

Date Published: 2018-10-01

Journal Name: Nips

ISSN: 1365-8875

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.
