NSF PAR Search | NSF Public Access Repository

Softmax policy gradient methods can take exponential time to converge

https://doi.org/10.1007/s10107-022-01920-6

Li, Gen; Wei, Yuting; Chi, Yuejie; Chen, Yuxin (January 2023, Mathematical Programming)

Abstract: The softmax policy gradient (PG) method, which performs gradient ascent under softmax policy parameterization, is arguably one of the de facto implementations of policy optimization in modern reinforcement learning. For $\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs), remarkable progress has recently been achieved towards establishing global convergence of softmax PG methods in finding a near-optimal policy. However, prior results fall short of delineating clear dependencies of convergence rates on salient parameters such as the cardinality of the state space $\mathcal{S}$ and the effective horizon $\frac{1}{1-\gamma}$, both of which could be excessively large. In this paper, we deliver a pessimistic message regarding the iteration complexity of softmax PG methods, despite assuming access to exact gradient computation. Specifically, we demonstrate that the softmax PG method with stepsize $\eta$ can take $\frac{1}{\eta} |\mathcal{S}|^{2^{\Omega\left(\frac{1}{1-\gamma}\right)}}$ iterations to converge, even in the presence of a benign policy initialization and an initial state distribution amenable to exploration (so that the distribution mismatch coefficient is not exceedingly large). This is accomplished by characterizing the algorithmic dynamics over a carefully-constructed MDP containing only three actions. Our exponential lower bound hints at the necessity of carefully adjusting update rules or enforcing proper regularization in accelerating PG methods.
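As a rough illustration of the method the abstract describes, the sketch below runs softmax PG with exact gradients on a tiny tabular MDP. It is not the paper's hard instance: the two-state, three-action MDP, the reward values, the step size, and all function names (softmax_pg, evaluate, visitation) are hypothetical choices made only for this example. The update uses the standard exact-gradient expression for the softmax parameterization, (1/(1-gamma)) * d(s) * pi(a|s) * A(s,a), where d is the discounted state visitation distribution and A is the advantage.

```python
import math

def softmax(row):
    """Numerically stable softmax over one state's action preferences."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def evaluate(P, r, pi, gamma, sweeps=100):
    """Approximate V^pi by repeated Bellman backups (error shrinks like gamma**sweeps)."""
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            sum(
                pi[s][a] * (r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n)))
                for a in range(len(pi[s]))
            )
            for s in range(n)
        ]
    return V

def visitation(P, pi, rho, gamma, steps=100):
    """Discounted state visitation d = (1 - gamma) * sum_t gamma^t * Pr(s_t = s), truncated."""
    n = len(P)
    d = [0.0] * n
    cur = list(rho)
    weight = 1.0 - gamma
    for _ in range(steps):
        for s in range(n):
            d[s] += weight * cur[s]
        nxt = [0.0] * n
        for s in range(n):
            for a in range(len(pi[s])):
                for t in range(n):
                    nxt[t] += cur[s] * pi[s][a] * P[s][a][t]
        cur = nxt
        weight *= gamma
    return d

def softmax_pg(P, r, rho, gamma, eta, iters):
    """Gradient ascent on V^pi(rho) over softmax logits theta, starting from a uniform policy."""
    n, k = len(P), len(P[0])
    theta = [[0.0] * k for _ in range(n)]
    history = []
    for _ in range(iters):
        pi = [softmax(row) for row in theta]
        V = evaluate(P, r, pi, gamma)
        history.append(sum(rho[s] * V[s] for s in range(n)))
        d = visitation(P, pi, rho, gamma)
        for s in range(n):
            for a in range(k):
                q = r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
                # Exact softmax PG component for (s, a).
                theta[s][a] += eta / (1.0 - gamma) * d[s] * pi[s][a] * (q - V[s])
    return theta, history

# Hypothetical toy MDP (assumption, not from the paper): P[s][a] is a next-state distribution.
P = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
]
r = [[0.0, 0.2, 0.1], [0.0, 1.0, 0.5]]  # rewards in [0, 1]
rho = [0.5, 0.5]                        # uniform initial state distribution
GAMMA, ETA = 0.5, 0.01                  # small step, within the usual smoothness bound

theta, history = softmax_pg(P, r, rho, GAMMA, ETA, iters=200)
final_pi = [softmax(row) for row in theta]
print("value at start:", history[0], "value at end:", history[-1])
print("final policy:", final_pi)
```

With a step size this small the value rises steadily but slowly, which is the regime the lower bound concerns; the sketch makes no attempt to reproduce the exponential dependence itself.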
Robust Gymnasium: A Unified Modular Benchmark for Robust Reinforcement Learning

Gu, Shangding; Shi, Laixi; Wen, Muning; Jin, Ming; Mazumdar, Eric; Chi, Yuejie; Wierman, Adam; Spanos, Costas (April 2025, The Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Cen, S; Mei, J; Goshvadi, K; Dai, H; Yang, T; Yang, S; Schuurmans, D; Chi, Y; Dai, B (April 2025, The Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available April 24, 2026
The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond

Woo, Jiin; Joshi, Gauri; Chi, Yuejie (January 2025, Journal of Machine Learning Research)

Free, publicly-accessible full text available January 1, 2026
Hybrid reinforcement learning breaks sample size barriers in linear MDPs

Tan, Kevin; Fan, Wei; Wei, Yuting (December 2024, Neural Information Processing Systems)

Free, publicly-accessible full text available December 31, 2025
The Sample-Communication Complexity Trade-off in Federated Q-Learning

Salgia, Sudeep; Chi, Yuejie (December 2024, 38th Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available December 10, 2025
Federated Natural Policy Gradient and Actor Critic Methods for Multi-task Reinforcement Learning

Yang, Tong; Cen, Shicong; Wei, Yuting; Chen, Yuxin; Chi, Yuejie (December 2024, 38th Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available December 10, 2025
Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices

Woo, Jiin; Shi, Laixi; Joshi, Gauri; Chi, Yuejie (July 2024, Proceedings of the 41st International Conference on Machine Learning)

Full Text Available
Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Shi, Laixi; Mazumdar, Eric; Chi, Yuejie; Wierman, Adam (July 2024, Proceedings of the 41st International Conference on Machine Learning)

Full Text Available
