NSF PAR Search | NSF Public Access Repository

Settling the Sample Complexity of Online Reinforcement Learning

https://doi.org/10.1145/3733592

Zhang, Zihan; Chen, Yuxin; Lee, Jason; Du, Simon S (May 2025, Journal of the ACM)

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the optimality of these results is only guaranteed in a “large-sample” regime, imposing enormous burn-in cost in order for their algorithms to operate optimally. How to achieve minimax-optimal regret without incurring any burn-in cost has been an open problem in RL theory. We settle this problem for finite-horizon inhomogeneous Markov decision processes. Specifically, we prove that a modified version of MVP (Monotonic Value Propagation), an optimistic model-based algorithm proposed by Zhang et al. [82], achieves a regret on the order of (modulo log factors) \begin{equation*} \min \big\lbrace \sqrt{SAH^3K}, \, HK \big\rbrace, \end{equation*} where \(S\) is the number of states, \(A\) is the number of actions, \(H\) is the horizon length, and \(K\) is the total number of episodes. This regret matches the minimax lower bound for the entire range of sample size \(K \ge 1\), essentially eliminating any burn-in requirement. It also translates to a PAC sample complexity (i.e., the number of episodes needed to yield ε-accuracy) of \(\frac{SAH^3}{\varepsilon^2}\) up to log factors, which is minimax-optimal for the full ε-range. Further, we extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances. The key technical innovation lies in a novel analysis paradigm (based on a new concept called “profiles”) to decouple complicated statistical dependency across the sample trajectories, a long-standing challenge facing the analysis of online RL in the sample-starved regime.
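To make the two regimes of the regret bound concrete, here is a minimal sketch that evaluates the order of the bound and of the PAC sample complexity while ignoring constants and log factors. The function names and the problem sizes are illustrative assumptions, not taken from the paper. Setting the two terms equal gives a crossover at K = SAH: below it the trivial term HK is the smaller one, above it the square-root term takes over.

```python
import math


def regret_order(S, A, H, K):
    """Order of min{sqrt(S*A*H^3*K), H*K}; constants and log factors are dropped."""
    return min(math.sqrt(S * A * H**3 * K), H * K)


def pac_episodes_order(S, A, H, eps):
    """Order of S*A*H^3 / eps^2; constants and log factors are dropped."""
    return S * A * H**3 / eps**2


# Illustrative (assumed) problem sizes.
S, A, H = 10, 5, 20

# sqrt(S*A*H^3*K) = H*K exactly when K = S*A*H.
crossover = S * A * H

for K in (1, crossover, 100 * crossover):
    print(f"K={K}: regret order {regret_order(S, A, H, K):.1f}")

print(f"episodes for eps=0.1: order {pac_episodes_order(S, A, H, 0.1):.3g}")
```

These numbers only indicate scaling; the actual bounds in the paper carry constants and logarithmic factors that this sketch omits.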
Free, publicly-accessible full text available May 2, 2026
Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

https://doi.org/10.1214/24-AOS2456

Zhou, Yuchen; Chen, Yuxin (February 2025, The Annals of Statistics)

Free, publicly-accessible full text available February 1, 2026
Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps

https://doi.org/10.1109/TIT.2024.3514795

Li, Gen; Cai, Changxiao; Poor, H Vincent; Chen, Yuxin (February 2025, IEEE Transactions on Information Theory)

Free, publicly-accessible full text available February 1, 2026
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Yang, Tong; Cen, Shicong; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (December 2024, 38th Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available December 10, 2025
Model-Based Reinforcement Learning for Offline Zero-Sum Markov Games

https://doi.org/10.1287/opre.2022.0342

Yan, Yuling; Li, Gen; Chen, Yuxin; Fan, Jianqing (November 2024, Operations Research)

This paper makes progress toward learning Nash equilibria in two-player, zero-sum Markov games from offline data. Despite a large number of prior works tackling this problem, the state-of-the-art results suffer from the curse of multiple agents, in the sense that their sample complexity bounds scale linearly with the total number of joint actions. The current paper proposes a new model-based algorithm that provably finds an approximate Nash equilibrium with a sample complexity that scales linearly with the total number of individual actions. This work also develops a matching minimax lower bound, demonstrating the minimax optimality of the proposed algorithm for a broad regime of interest. An appealing feature of the result is its algorithmic simplicity, which shows that sophisticated variance reduction and sample splitting are unnecessary for achieving sample optimality.
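The gap between the two scalings can be illustrated with a small counting sketch. It assumes, purely for illustration, that one player has `A` actions and the other has `B` actions (names not taken from the abstract): the number of joint actions is then the product `A * B`, while the total number of individual actions is the sum `A + B`.

```python
def joint_action_count(A, B):
    """Number of joint action pairs (a, b) for the two players."""
    return A * B


def individual_action_count(A, B):
    """Total number of individual actions across the two players."""
    return A + B


# Illustrative (assumed) action-set sizes.
for A, B in [(2, 2), (10, 10), (100, 100)]:
    print(A, B, joint_action_count(A, B), individual_action_count(A, B))
```

With 100 actions per player the joint count is 10,000 and the individual count is 200, which is the difference a bound linear in individual actions avoids paying.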
Free, publicly-accessible full text available November 1, 2025
Settling the Sample Complexity of Online Reinforcement Learning

Zhang, Zihan; Chen, Yuxin; Lee, Jason; Du, Simon (July 2024, Conference on Learning Theory)
Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

Li, Gen; Yan, Yuling; Chen, Yuxin; Fan, Jianqing (July 2024, Conference on Learning Theory)
Optimal Multi-Distribution Learning

Zhang, Zihan; Zhan, Wenhao; Chen, Yuxin; Du, Simon; Lee, Jason (July 2024, Conference on Learning Theory)
Towards Non-Asymptotic Convergence for Diffusion-Based Generative Models

Li, Gen; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (May 2024, International Conference on Learning Representations)
Accelerating convergence of score-based diffusion models, provably

Li, Gen; Huang, Yu; Efimov, Timofey; Wei, Yuting; Chi, Yuejie; Chen, Yuxin (July 2024, International Conference on Machine Learning)
