NSF PAR Search | NSF Public Access Repository


When is Agnostic Reinforcement Learning Statistically Tractable?

Jia, Zeyu; Li, Gene; Rakhlin, Alexander; Sekhari, Ayush; Srebro, Nathan (March 2025, Advances in Neural Information Processing Systems 36)
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Eds.)
We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a potentially large state and action space) are required to learn an $\epsilon$-suboptimal policy with respect to $\Pi$? Towards that end, we introduce a new complexity measure, called the spanning capacity, that depends solely on the set $\Pi$ and is independent of the MDP dynamics. With a generative model, we show that the spanning capacity characterizes PAC learnability for every policy class $\Pi$. However, for online RL, the situation is more subtle. We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn. This reveals a surprising separation for agnostic learnability between generative access and online access models (as well as between deterministic/stochastic MDPs under online access). On the positive side, we identify an additional sunflower structure which in conjunction with bounded spanning capacity enables statistically efficient online RL via a new algorithm called POPLER, which takes inspiration from classical importance sampling methods as well as recent developments for reachable-state identification and policy evaluation in reward-free exploration.
Free, publicly-accessible full text available March 30, 2026
Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

Chen, Fan; Foster, Dylan; Han, Yanjun; Qian, Jian; Rakhlin, Alexander; Xu, Yunbei (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
On the Performance of Empirical Risk Minimization with Smoothed Data

Block, Adam; Rakhlin, Alexander; Shetty, Abhishek (July 2024, Conference on Learning Theory)

Full Text Available
The Non-linear $F$-Design and Applications to Interactive Learning

Agarwal, Alekh; Qian, Jian; Rakhlin, Alexander; Zhang, Tong (April 2024, Forty-first International Conference on Machine Learning)

Full Text Available
Convergence of Adam under relaxed assumptions

Li, Haochuan; Rakhlin, Alexander; Jadbabaie, Ali (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
When is Agnostic Reinforcement Learning Statistically Tractable?

Jia, Zeyu; Li, Gene; Rakhlin, Alexander; Sekhari, Ayush; Srebro, Nati (January 2024, Advances in Neural Information Processing Systems 36)

Full Text Available
Convex and non-convex optimization under generalized smoothness

Li, Haochuan; Qian, Jian; Tian, Yi; Rakhlin, Alexander; Jadbabaie, Ali (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Model-free reinforcement learning with the decision-estimation coefficient

Foster, Dylan J; Golowich, Noah; Qian, Jian; Rakhlin, Alexander; Sekhari, Ayush (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
Oracle-efficient smoothed online learning for piecewise continuous decision making

Block, Adam; Simchowitz, Max; Rakhlin, Alexander (July 2023, Conference on Learning Theory)

Full Text Available
Deep learning: a statistical viewpoint

Bartlett, Peter L.; Montanari, Andrea; Rakhlin, Alexander (July 2021, Acta Numerica)

Full Text Available
