On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

Karabag, Mustafa O; Topcu, Ufuk

doi:10.1609/aaai.v37i7.25989

Citation Details

On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples and is helpful for the settings in which collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may have interdependencies. Under no assumption of independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation that requires partial or uniform coverage. We extend this result to the off-policy optimization under uniform coverage. As a comparison to the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for almost deterministic dynamics that are prevalent in reinforcement learning. more »

Award ID(s):: 1652113

PAR ID:: 10508416

Author(s) / Creator(s):: Karabag, Mustafa O; Topcu, Ufuk

Publisher / Repository:: AAAI Conference on Artificial Intelligence

Date Published:: 2023-06-27

Journal Name:: Proceedings of the AAAI Conference on Artificial Intelligence

Volume:: 37

Issue:: 7

ISSN:: 2159-5399

Page Range / eLocation ID:: 8195 to 8202

Subject(s) / Keyword(s):: ML: Reinforcement Learning Theory, ML: Reinforcement Learning Algorithms, PRS: Planning Under Uncertainty, PRS: Planning With Markov Models (MDPs, POMDPs)

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
https://doi.org/10.1609/aaai.v37i7.25989

More Like this