This content will become publicly available on March 30, 2026

Title: When is Agnostic Reinforcement Learning Statistically Tractable?
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class Pi, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an epsilon-suboptimal policy with respect to Pi? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set Pi and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class Pi. However, for online RL, the situation is more subtle. We show there exists a policy class Pi with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments for reachable-state identification and policy evaluation in reward-free exploration.  more » « less
Award ID(s):
1934843
PAR ID:
10563091
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S
Publisher / Repository:
Advances in Neural Information Processing Systems 36
Date Published:
Volume:
36
ISBN:
9781713899921
Page Range / eLocation ID:
27820 - 27879
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. This serves as an extreme test of an agent's ability to make effective use of historical data, which is known to be critical for efficient RL. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP using the offline dataset; (b) learning a near-optimal policy in this pessimistic MDP. The design of the pessimistic MDP is such that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the pessimistic MDP. This enables the pessimistic MDP to serve as a good surrogate for purposes of policy evaluation and learning. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Empirically, MOReL matches or exceeds state-of-the-art results on widely used offline RL benchmarks. Overall, the modular design of MOReL enables translating advances in its components (e.g., model learning, planning, etc.) to improvements in offline RL.
  2. In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies offline would greatly expand where RL can be applied, its data efficiency, and its experimental velocity. Prior work in offline RL has been confined almost exclusively to model-free RL approaches. In this work, we present MOReL, an algorithmic framework for model-based offline RL. This framework consists of two steps: (a) learning a pessimistic MDP (P-MDP) using the offline dataset; (b) learning a near-optimal policy in this P-MDP. The learned P-MDP has the property that for any policy, the performance in the real environment is approximately lower-bounded by the performance in the P-MDP. This enables it to serve as a good surrogate for purposes of policy evaluation and learning, and to overcome common pitfalls of model-based RL such as model exploitation. Theoretically, we show that MOReL is minimax optimal (up to log factors) for offline RL. Through experiments, we show that MOReL matches or exceeds state-of-the-art results on widely studied offline RL benchmarks. Moreover, the modular design of MOReL enables future advances in its components (e.g., model learning, planning, etc.) to translate directly into improvements for offline RL. (A minimal sketch of one such pessimistic-MDP construction appears after this list.)
  3. In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable: there are algorithms that learn a near-optimal policy with high probability using a finite amount of samples and computation. In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards, such as objectives specified as linear temporal logic formulas. However, questions about the PAC-learnability of these new objectives have remained open. This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability in two analysis settings. In particular, for the analysis that considers only sample complexity, we prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. Further, for the analysis that considers computational complexity, we prove that if an objective is computable, then it is PAC-learnable. In other words, if a procedure computes successive approximations of the objective's value, then the objective is PAC-learnable. We give three applications of our condition to objectives from the literature with previously unknown PAC-learnability and prove that these objectives are PAC-learnable. Overall, our results help verify the PAC-learnability of existing objectives. Also, since some studied objectives that are not uniformly continuous have been shown not to be PAC-learnable, our results could guide the design of new PAC-learnable objectives. (The standard form of the PAC criterion used here is written out after this list.)
  4. Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Ed.)
    In a recent article, Alon, Hanneke, Holzman, and Moran (FOCS '21) introduced a unifying framework to study the learnability of classes of partial concepts. One of the central questions studied in their work is whether the learnability of a partial concept class is always inherited from the learnability of some "extension" of it to a total concept class. They showed this is not the case for PAC learning but left the problem open for the stronger notion of online learnability. We resolve this problem by constructing a class of partial concepts that is online learnable, but no extension of it to a class of total concepts is online learnable (or even PAC learnable). 
  5.
    We prove that every concept class with finite Littlestone dimension can be learned by an (approximate) differentially-private algorithm. This answers an open question of Alon et al. (STOC 2019) who proved the converse statement (this question was also asked by Neel et al. (FOCS 2019)). Together these two results yield an equivalence between online learnability and private PAC learnability. We introduce a new notion of algorithmic stability called “global stability” which is essential to our proof and may be of independent interest. We also discuss an application of our results to boosting the privacy and accuracy parameters of differentially-private learners. 
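Items 1 and 2 describe MOReL's pessimistic MDP, whose defining property is that every policy's value in the P-MDP approximately lower-bounds its value in the real environment. Below is a minimal sketch of one way such a model can be built from logged data in the tabular case: transitions the dataset covers poorly are redirected to a low-reward absorbing state. The visit threshold, the penalty reward, and the interface are illustrative assumptions, not claimed to be MOReL's exact construction.

```python
# A minimal sketch of building a pessimistic MDP from an offline dataset in the
# tabular case: (state, action) pairs whose transitions cannot be estimated
# reliably are redirected to a low-reward absorbing state, so any policy's value
# in this model approximately lower-bounds its value in the true MDP. The count
# threshold and penalty reward are illustrative assumptions.
from collections import defaultdict

ABSORBING = "HALT"          # absorbing state; self-loops with the penalty reward when planning
MIN_VISITS = 10             # below this count, a (state, action) pair is treated as "unknown"
PENALTY_REWARD = 0.0        # assumes true rewards lie in [0, 1]

def build_pessimistic_mdp(dataset):
    """dataset: iterable of (state, action, reward, next_state) transitions."""
    counts = defaultdict(lambda: defaultdict(int))   # (s, a) -> next_state -> count
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for s, a, r, s_next in dataset:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1

    transitions, rewards = {}, {}
    for (s, a), n in visits.items():
        if n < MIN_VISITS:
            # Pessimism: send poorly covered pairs to the absorbing penalty state.
            transitions[(s, a)] = {ABSORBING: 1.0}
            rewards[(s, a)] = PENALTY_REWARD
        else:
            # Empirical estimates where the data supports them.
            transitions[(s, a)] = {s2: c / n for s2, c in counts[(s, a)].items()}
            rewards[(s, a)] = reward_sum[(s, a)] / n
    return transitions, rewards
```

Planning in this model (and treating any (state, action) pair absent from the dataset the same way as an under-visited one) optimizes a value that is pessimistic, i.e., approximately a lower bound on the true value, which is the surrogate property both MOReL abstracts rely on.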
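For reference, the PAC-learnability criterion invoked in item 3 (and, in its agnostic form, in the main abstract) can be stated as follows; the notation is a standard rendering of ours, not necessarily the cited papers' exact definitions.

```latex
% Standard PAC criterion for an objective J over policies (illustrative notation).
% J is PAC-learnable if for every epsilon > 0 and delta in (0, 1) there is an
% algorithm that, using a finite number m(epsilon, delta) of samples, returns a
% policy \hat{\pi} such that
\[
  \Pr\!\left[ J(\hat{\pi}) \;\ge\; \sup_{\pi} J(\pi) - \epsilon \right] \;\ge\; 1 - \delta .
\]
% In the agnostic setting of the main abstract, the supremum is taken over the
% given policy class \Pi, and the question studied is how m(epsilon, delta)
% scales and whether it can be bounded via the spanning capacity of \Pi.
```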