NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

https://doi.org/10.1287/opre.2023.2451

Li, Gen; Wei, Yuting; Chi, Yuejie; Chen, Yuxin (January 2024, Operations Research)

This paper studies a central issue in modern reinforcement learning, the sample efficiency, and makes progress toward solving an idealistic scenario that assumes access to a generative model or a simulator. Despite a large number of prior works tackling this problem, a complete picture of the trade-offs between sample complexity and statistical accuracy has yet to be determined. In particular, all prior results suffer from a severe sample size barrier in the sense that their claimed statistical guarantees hold only when the sample size exceeds some enormous threshold. The current paper overcomes this barrier and fully settles this problem; more specifically, we establish the minimax optimality of the model-based approach for any given target accuracy level. To the best of our knowledge, this work delivers the first minimax-optimal guarantees that accommodate the entire range of sample sizes (beyond which finding a meaningful policy is information theoretically infeasible).
more » « less
Full Text Available
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

https://doi.org/10.1287/opre.2023.2450

Li, Gen; Cai, Changxiao; Chen, Yuxin; Wei, Yuting; Chi, Yuejie (January 2024, Operations Research)

This paper investigates a model-free algorithm of broad interest in reinforcement learning, namely, Q-learning. Whereas substantial progress had been made toward understanding the sample efficiency of Q-learning in recent years, it remained largely unclear whether Q-learning is sample-optimal and how to sharpen the sample complexity analysis of Q-learning. In this paper, we settle these questions: (1) When there is only a single action, we show that Q-learning (or, equivalently, TD learning) is provably minimax optimal. (2) When there are at least two actions, our theory unveils the strict suboptimality of Q-learning and rigorizes the negative impact of overestimation in Q-learning. Our theory accommodates both the synchronous case (i.e., the case in which independent samples are drawn) and the asynchronous case (i.e., the case in which one only has access to a single Markovian trajectory).
more » « less
Full Text Available
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing

Xu, Xingyu; Shen, Yandi; Chi, Yuejie; Ma, Cong (July 2023, Proceedings of the 40th International Conference on Machine Learning)

Full Text Available
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

https://doi.org/10.1109/TIT.2021.3120096

Li, Gen; Wei, Yuting; Chi, Yuejie; Gu, Yuantao; Chen, Yuxin (January 2022, IEEE Transactions on Information Theory)

Full Text Available
Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

https://doi.org/10.1287/opre.2021.2151

Cen, Shicong; Cheng, Chen; Chen, Yuxin; Wei, Yuting; Chi, Yuejie (January 2022, Operations Research)

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization—an algorithmic scheme that encourages exploration—and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly—even quadratically, once it enters a local region around the optimal policy—when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Our convergence results accommodate a wide range of learning rates and shed light upon the role of entropy regularization in enabling fast convergence.
more » « less
Full Text Available
Scaling and Scalability: Provable Nonconvex Low-Rank Tensor Estimation from Incomplete Measurements

Tong, Tian; Ma, Cong; Prater-Bennette, Ashley; Tripp, Erin; Chi, Yuejie (January 2022, Journal of machine learning research)

Full Text Available
Manifold Gradient Descent Solves Multi-Channel Sparse Blind Deconvolution Provably and Efficiently

https://doi.org/10.1109/TIT.2021.3075148

Shi, Laixi; Chi, Yuejie (July 2021, IEEE Transactions on Information Theory)
null (Ed.)
Full Text Available
Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees

https://doi.org/10.1214/20-AOS1986

Cai, Changxiao; Li, Gen; Chi, Yuejie; Poor, H. Vincent; Chen, Yuxin (April 2021, The Annals of Statistics)
null (Ed.)
Full Text Available
Nonconvex Matrix Factorization From Rank-One Measurements

https://doi.org/10.1109/TIT.2021.3050427

Li, Yuanxin; Ma, Cong; Chen, Yuxin; Chi, Yuejie (March 2021, IEEE Transactions on Information Theory)
null (Ed.)
Full Text Available
Accelerating Ill-Conditioned Low-Rank Matrix Estimation via Scaled Gradient Descent

Tong, Tian; Ma, Cong; Chi, Yuejie (January 2021, Journal of machine learning research)
null (Ed.)
Full Text Available

« Prev Next »

Search for: All records