NSF PAR Search | NSF Public Access Repository

Softmax policy gradient methods can take exponential time to converge

https://doi.org/10.1007/s10107-022-01920-6

Li, Gen; Wei, Yuting; Chi, Yuejie; Chen, Yuxin (January 2023, Mathematical Programming)

Abstract: The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take $\frac{1}{\eta} |\mathcal{S}|^{2^{\Omega(\frac{1}{1-\gamma})}}$ iterations to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.
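For readers who want to see the update the abstract studies, here is a minimal sketch of exact-gradient softmax PG on a tabular MDP. It assumes the standard softmax policy-gradient expression $\partial V^{\pi}(\rho)/\partial\theta(s,a) = \frac{1}{1-\gamma} d_{\rho}^{\pi}(s)\,\pi(a|s)\,A^{\pi}(s,a)$, computes values and discounted visitation exactly by linear solves, and starts from the uniform policy ($\theta = 0$). The helper names (`softmax_pg`, `evaluate`, `solve`), the two-state toy MDP used to exercise it, and the stepsize are hypothetical illustrations; this is not the paper's three-action hard instance and does not reproduce the exponential lower bound.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(n):
            if i != col:
                f = M[i][col] / M[col][col]
                for c in range(col, n + 1):
                    M[i][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def evaluate(theta, P, r, gamma):
    """Exact policy evaluation. P[s][a][t] is a transition probability, r[s][a] a reward.

    Returns pi, V^pi, Q^pi and the matrix I - gamma * P_pi.
    """
    n = len(P)
    pi = [softmax(theta[s]) for s in range(n)]
    P_pi = [[sum(pi[s][a] * P[s][a][t] for a in range(len(pi[s]))) for t in range(n)]
            for s in range(n)]
    r_pi = [sum(pi[s][a] * r[s][a] for a in range(len(pi[s]))) for s in range(n)]
    A = [[(1.0 if s == t else 0.0) - gamma * P_pi[s][t] for t in range(n)] for s in range(n)]
    V = solve(A, r_pi)  # V = (I - gamma P_pi)^{-1} r_pi
    Q = [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
          for a in range(len(pi[s]))] for s in range(n)]
    return pi, V, Q, A


def softmax_pg(P, r, rho, gamma, eta, iters):
    """Exact-gradient softmax PG (gradient ascent on V^pi(rho)) from theta = 0."""
    n = len(P)
    theta = [[0.0] * len(P[s]) for s in range(n)]
    for _ in range(iters):
        pi, V, Q, A = evaluate(theta, P, r, gamma)
        # Discounted visitation d solves (I - gamma P_pi)^T d = (1 - gamma) rho.
        A_T = [[A[t][s] for t in range(n)] for s in range(n)]
        d = solve(A_T, [(1.0 - gamma) * x for x in rho])
        # pi, V, Q, d were computed before this loop, so in-place updates are safe.
        for s in range(n):
            for a in range(len(theta[s])):
                advantage = Q[s][a] - V[s]
                theta[s][a] += eta * d[s] * pi[s][a] * advantage / (1.0 - gamma)
    return theta
```

Because the gradient at state $s$ is scaled by $d_{\rho}^{\pi}(s)$, a state that the current policy rarely visits receives small updates, so its logits move slowly even though the gradient is exact.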
Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps

https://doi.org/10.1109/TIT.2024.3514795

Li, Gen; Cai, Changxiao; Poor, H. Vincent; Chen, Yuxin (February 2025, IEEE Transactions on Information Theory)

Free, publicly-accessible full text available February 1, 2026
Deflated HeteroPCA: Overcoming the curse of ill-conditioning in heteroskedastic PCA

https://doi.org/10.1214/24-AOS2456

Zhou, Yuchen; Chen, Yuxin (February 2025, The Annals of Statistics)

Free, publicly-accessible full text available February 1, 2026
Inference for heteroskedastic PCA with missing data

https://doi.org/10.1214/24-AOS2366

Yan, Yuling; Chen, Yuxin; Fan, Jianqing (April 2024, The Annals of Statistics)

Full Text Available
Settling the sample complexity of model-based offline reinforcement learning

https://doi.org/10.1214/23-AOS2342

Li, Gen; Shi, Laixi; Chen, Yuxin; Chi, Yuejie; Wei, Yuting (February 2024, The Annals of Statistics)

Full Text Available
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

https://doi.org/10.1287/opre.2023.2450

Li, Gen; Cai, Changxiao; Chen, Yuxin; Wei, Yuting; Chi, Yuejie (January 2024, Operations Research)

This paper investigates a model-free algorithm of broad interest in reinforcement learning, namely, Q-learning. Whereas substantial progress had been made toward understanding the sample efficiency of Q-learning in recent years, it remained largely unclear whether Q-learning is sample-optimal and how to sharpen the sample complexity analysis of Q-learning. In this paper, we settle these questions: (1) When there is only a single action, we show that Q-learning (or, equivalently, TD learning) is provably minimax optimal. (2) When there are at least two actions, our theory unveils the strict suboptimality of Q-learning and rigorizes the negative impact of overestimation in Q-learning. Our theory accommodates both the synchronous case (i.e., the case in which independent samples are drawn) and the asynchronous case (i.e., the case in which one only has access to a single Markovian trajectory).
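The synchronous setting described above (fresh independent samples for every state-action pair at each iteration) can be sketched as follows. This is a minimal Python sketch under stated assumptions: a generative model that returns one sampled next state per $(s, a)$ per iteration, deterministic rewards, and the rescaled linear step size $\eta_t = 1/(1 + (1-\gamma)t)$, which is one common choice and is an assumption here rather than the paper's prescription. The function name `synchronous_q_learning` is hypothetical. With a single action per state the max is trivial and the update reduces to TD learning of the value function, matching the abstract's remark.

```python
import random


def synchronous_q_learning(P, r, gamma, iters, seed=0):
    """Synchronous Q-learning with a generative model (illustrative sketch).

    P[s][a] is a list of next-state probabilities and r[s][a] a deterministic
    reward. Each iteration draws one fresh next state for every (s, a) pair.
    """
    rng = random.Random(seed)
    n_s = len(P)
    Q = [[0.0] * len(P[s]) for s in range(n_s)]
    for t in range(iters):
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)  # rescaled linear step size (assumed)
        V = [max(q) for q in Q]  # greedy values of the previous iterate
        new_Q = [row[:] for row in Q]
        for s in range(n_s):
            for a in range(len(P[s])):
                s_next = rng.choices(range(n_s), weights=P[s][a])[0]
                target = r[s][a] + gamma * V[s_next]
                new_Q[s][a] = (1.0 - eta) * Q[s][a] + eta * target
        Q = new_Q
    return Q
```

The max inside the target is where overestimation enters: it is taken over noisy estimates, so its expectation can exceed the max of the true values. With deterministic transitions the iteration behaves like damped value iteration, which makes a quick sanity check easy.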
Full Text Available
The Efficacy of Pessimism in Asynchronous Q-Learning

https://doi.org/10.1109/TIT.2023.3299840

Yan, Yuling; Li, Gen; Chen, Yuxin; Fan, Jianqing (November 2023, IEEE Transactions on Information Theory)

Nonconvex Low-Rank Tensor Completion from Noisy Data

https://doi.org/10.1287/opre.2021.2106

Cai, Changxiao; Li, Gen; Poor, H. Vincent; Chen, Yuxin (March 2022, Operations Research)

We study a noisy tensor completion problem of broad practical interest, namely, the reconstruction of a low-rank tensor from highly incomplete and randomly corrupted observations of its entries. Whereas a variety of prior work has been dedicated to this problem, prior algorithms either are computationally too expensive for large-scale applications or come with suboptimal statistical guarantees. Focusing on “incoherent” and well-conditioned tensors of a constant canonical polyadic rank, we propose a two-stage nonconvex algorithm, (vanilla) gradient descent following a rough initialization, that achieves the best of both worlds. Specifically, the proposed nonconvex algorithm faithfully completes the tensor and retrieves all individual tensor factors within nearly linear time, while at the same time enjoying near-optimal statistical guarantees (i.e., minimal sample complexity and optimal estimation accuracy). The estimation errors are evenly spread out across all entries, thus achieving optimal $\ell_{\infty}$ statistical accuracy. We also discuss how to extend our approach to accommodate asymmetric tensors. The insight conveyed through our analysis of nonconvex optimization might have implications for other tensor estimation problems.
Full Text Available
Tackling Small Eigen-Gaps: Fine-Grained Eigenvector Estimation and Inference Under Heteroscedastic Noise

https://doi.org/10.1109/TIT.2021.3111828

Cheng, Chen; Wei, Yuting; Chen, Yuxin (November 2021, IEEE Transactions on Information Theory)
Full Text Available
Bridging convex and nonconvex optimization in robust PCA: Noise, outliers and missing data

https://doi.org/10.1214/21-AOS2066

Chen, Yuxin; Fan, Jianqing; Ma, Cong; Yan, Yuling (October 2021, The Annals of Statistics)
Full Text Available