

Title: Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal
This work considers the sample and computational complexity of obtaining an $\epsilon$-optimal policy in a discounted Markov Decision Process (MDP), given only access to a generative model. In this model, the learner accesses the underlying transition model via a sampling oracle that provides a sample of the next state, when given any state-action pair as input. We are interested in a basic and unresolved question in model-based planning: is the naïve “plug-in” approach — where we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP — non-asymptotically, minimax optimal? Our main result answers this question positively. With regard to computation, our result provides a simpler approach to minimax optimal planning: in comparison to prior model-free results, we show that using \emph{any} high-accuracy, black-box planning oracle in the empirical model suffices to obtain the minimax error rate. The key proof technique uses a leave-one-out analysis, in a novel “absorbing MDP” construction, to decouple the statistical dependency issues that arise in the analysis of model-based planning; this construction may be helpful more generally.
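A minimal sketch of the plug-in approach described above, assuming a generative-model oracle sample_next_state(s, a) (a hypothetical callable returning a sampled next-state index) and a known reward table; the black-box planner here is plain value iteration, though the result allows any high-accuracy planner:

```python
import numpy as np

def plug_in_policy(sample_next_state, rewards, S, A, gamma, n, tol=1e-8):
    """Certainty-equivalence planning: build the MLE transition model from a
    generative model, then plan in the resulting empirical MDP."""
    # Step 1: maximum-likelihood estimate P_hat(s' | s, a) from n samples each.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(n):
                P_hat[s, a, sample_next_state(s, a)] += 1.0
    P_hat /= n

    # Step 2: any black-box planner works; here, value iteration to accuracy tol.
    V = np.zeros(S)
    while True:
        Q = rewards + gamma * (P_hat @ V)   # (S, A) Bellman backup in the empirical MDP
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1)         # greedy policy w.r.t. the empirical Q
        V = V_new
```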
Award ID(s):
1703574
NSF-PAR ID:
10177060
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
125
ISSN:
2640-3498
Page Range / eLocation ID:
67-83
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Model-based reinforcement learning (RL), which finds an optimal policy using an empirical model, has long been recognized as one of the cornerstones of RL. It is especially suitable for multi-agent RL (MARL), as it naturally decouples the learning and planning phases and avoids the non-stationarity problem that arises when all agents improve their policies simultaneously using samples. Though model-based MARL is intuitive and widely used, the sample complexity of its algorithms has received comparatively little investigation. In this paper, we aim to address this fundamental open question about the sample complexity of model-based MARL. We study arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model of state transitions. We show that model-based MARL achieves a near-optimal sample complexity for finding the Nash equilibrium (NE) \emph{value} up to some additive error. We also show that this method is near-minimax optimal, with a tight dependence on the horizon and the number of states. Our results justify the efficiency of this simple model-based approach in the multi-agent RL setting.
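A minimal sketch of this model-based approach for the two-player zero-sum setting, assuming the empirical model P_hat[s, a, b, s'] has already been estimated from the generative model as in the plug-in method above; planning is Shapley-style value iteration with a per-state matrix-game LP (all names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game max_x min_y x^T M y, solved as an LP."""
    n_a, n_b = M.shape
    c = np.zeros(n_a + 1)
    c[-1] = -1.0                                   # maximize v == minimize -v
    A_ub = np.hstack([-M.T, np.ones((n_b, 1))])    # v <= x^T M e_b for every column b
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])  # x is a distribution
    bounds = [(0, None)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_b),
                  A_eq=A_eq, b_eq=np.array([1.0]), bounds=bounds)
    return res.x[-1]

def empirical_game_values(P_hat, R, gamma, n_iter=500):
    """Shapley value iteration in the empirical zero-sum Markov game.
    P_hat[s, a, b, s'] is the MLE transition model; R[s, a, b] the reward."""
    S = P_hat.shape[0]
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = R + gamma * (P_hat @ V)                # (S, n_a, n_b) stage games
        V = np.array([matrix_game_value(Q[s]) for s in range(S)])
    return V
```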
  2. We study model-free reinforcement learning (RL) algorithms for the infinite-horizon average-reward Markov decision process (MDP), which is more appropriate for applications involving continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, theoretical understanding of model-free RL algorithms is relatively inadequate in the average-reward setting. In this paper, we consider both the online setting and the setting with access to a simulator. We develop computationally efficient model-free algorithms that achieve sharper guarantees on regret/sample complexity than existing results. In the online setting, we design an algorithm, UCB-AVG, based on an optimistic variant of variance-reduced Q-learning. We show that UCB-AVG achieves a regret bound of $\widetilde{O}(S^5A^2\,sp(h^*)\sqrt{T})$ after $T$ steps, where $S\times A$ is the size of the state-action space and $sp(h^*)$ is the span of the optimal bias function. Our result provides the first computationally efficient model-free algorithm that achieves the optimal dependence on $T$ (up to log factors) for weakly communicating MDPs, which is necessary for low regret. In contrast, prior results are either suboptimal in $T$ or require strong assumptions of ergodicity or uniform mixing of MDPs. In the simulator setting, we adapt the idea of UCB-AVG to develop a model-free algorithm that finds an $\epsilon$-optimal policy with sample complexity $\widetilde{O}(SA\,sp^2(h^*)\epsilon^{-2} + S^2A\,sp(h^*)\epsilon^{-1})$. This sample complexity is near-optimal for weakly communicating MDPs, in view of the minimax lower bound $\Omega(SA\,sp(h^*)\epsilon^{-2})$. Existing work mainly focuses on ergodic MDPs, and the results typically depend on $t_{mix}$, the worst-case mixing time induced by a policy. We remark that the diameter $D$ and mixing time $t_{mix}$ are both lower bounded by $sp(h^*)$, and that $t_{mix}$ can be arbitrarily large for certain MDPs. On the technical side, our approach integrates two key ideas: learning a $\gamma$-discounted MDP as an approximation, and leveraging a reference-advantage decomposition for variance reduction in optimistic Q-learning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is that we maintain estimates of the value differences between state pairs to provide a sharper bound on the variance of the reference advantage. We also crucially rely on a careful choice of the discount factor $\gamma$ to balance the approximation error due to discounting against the statistical learning error, and we are able to maintain a good-quality reference value function with $O(SA)$ space complexity.
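A minimal sketch of the discounting reduction described above, with solve_discounted standing in for the variance-reduced Q-learning subroutine; the constant C in the discount-factor choice is a hypothetical placeholder, not the paper's tuned value:

```python
def average_reward_via_discounting(solve_discounted, epsilon, span_h_star, C=1.0):
    """Reduce average-reward planning to discounted planning.

    The approximation error of the reduction scales like (1 - gamma) * sp(h*),
    so gamma is chosen just large enough that this error stays below epsilon,
    while keeping the effective horizon 1/(1 - gamma) as small as possible.
    """
    gamma = 1.0 - epsilon / (C * span_h_star)   # C is a placeholder constant
    return solve_discounted(gamma)              # e.g., variance-reduced Q-learning
```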
  3. The practicality of reinforcement learning algorithms has been limited by poor scaling with the problem size, as the sample complexity of learning an $\epsilon$-optimal policy is $\Omega(|S||A|H/\epsilon^2)$ over worst-case instances of an MDP with state space $S$, action space $A$, and horizon $H$. We consider a class of MDPs for which the associated optimal $Q^*$ function is low rank, where the latent features are unknown. While one would hope to achieve linear sample complexity in $|S|$ and $|A|$ due to the low-rank structure, we show that, without imposing further assumptions beyond low rank of $Q^*$, if one is constrained to estimate the $Q$ function using only observations from a subset of entries, there is a worst-case instance in which one must incur a sample complexity exponential in the horizon $H$ to learn a near-optimal policy. We subsequently show that, under stronger low-rank structural assumptions, given access to a generative model, Low Rank Monte Carlo Policy Iteration (LR-MCPI) and Low Rank Empirical Value Iteration (LR-EVI) achieve the desired sample complexity of $\widetilde{O}((|S|+|A|)\,\mathrm{poly}(d,H)/\epsilon^2)$ for a rank-$d$ setting, which is minimax optimal with respect to the scaling of $|S|$, $|A|$, and $\epsilon$. In contrast to the literature on linear and low-rank MDPs, we do not require a known feature mapping, our algorithm is computationally simple, and our results hold for long time horizons. Our results provide insights on the minimal low-rank structural assumptions required of the MDP with respect to the transition kernel versus the optimal action-value function.
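A minimal sketch of the rank-$d$ completion step at the heart of such methods, using a simple rescale-and-truncate SVD heuristic in place of the paper's more careful anchor-based estimator; Q_obs holds noisy Q-estimates on the observed entries and mask marks which $(s, a)$ entries were sampled:

```python
import numpy as np

def rank_d_complete(Q_obs, mask, d):
    """Fill in a partially observed Q matrix under a rank-d assumption.

    Rescales the zero-filled matrix by the inverse sampling rate (an unbiased
    proxy for the full matrix) and truncates its SVD to rank d.
    """
    p = mask.mean()                         # fraction of observed (s, a) entries
    Y = np.where(mask, Q_obs, 0.0) / p      # inverse-propensity rescaling
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :d] * s[:d]) @ Vt[:d, :]   # best rank-d approximation of Y
```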
  4. In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that, for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables the P-MDP to serve as a good surrogate for purposes of policy evaluation and learning, and to overcome common pitfalls of model-based RL, such as model exploitation. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results on widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., model learning, planning) to translate directly into improvements for offline RL.
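A minimal sketch of the P-MDP construction in step (a), assuming an ensemble of learned dynamics models whose disagreement gates known versus unknown state-action pairs; the names, the disagreement rule, and the HALT encoding are illustrative rather than MOReL's exact construction:

```python
import numpy as np

def make_pmdp_step(models, reward_fn, threshold, kappa):
    """Pessimistic-MDP dynamics: ensemble disagreement gates known vs. unknown."""
    def step(s, a):
        if isinstance(s, str) and s == "HALT":
            return "HALT", -kappa                 # HALT is absorbing, with penalty
        preds = [m(s, a) for m in models]         # ensemble next-state predictions
        disagreement = max(np.linalg.norm(p - q)  # max pairwise disagreement
                           for p in preds for q in preds)
        if disagreement > threshold:              # unknown state-action pair
            return "HALT", -kappa                 # pessimism: absorb and penalize
        return preds[0], reward_fn(s, a)          # known: trust the learned model
    return step
```

Any policy that strays into the unknown region is thus charged the penalty $-\kappa$ forever after, which is what makes the P-MDP's value approximately lower-bound the true value.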
  5. In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test of an agent's ability to make effective use of historical data, which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP using the offline dataset; (b) learning a near-optimal policy in this pessimistic MDP. The design of the pessimistic MDP is such that, for any policy, the performance in the real environment is approximately lower-bounded by the performance in the pessimistic MDP. This enables the pessimistic MDP to serve as a good surrogate for purposes of policy evaluation and learning. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Empirically, MOReL matches or exceeds state-of-the-art results on widely used offline RL benchmarks. Overall, the modular design of MOReL enables advances in its components (e.g., model learning, planning) to translate into improvements in offline RL.