Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity

Wang, Qiuhao; Zha, Yuqi; Ho, Chin Pang; Petrik, Marek

This content will become publicly available on July 18, 2026

Robust Markov Decision Processes (MDPs) offer a promising framework for computing reliable policies under model uncertainty. While policy gradient methods have gained increasing popularity for robust discounted MDPs, their application to the average-reward criterion remains largely unexplored. This paper proposes Robust Projected Policy Gradient (RP2G), the first generic policy gradient method for robust average-reward MDPs (RAMDPs) that applies beyond the typical rectangularity assumption on transition ambiguity. Unlike existing robust policy gradient algorithms, RP2G incorporates an adaptive decreasing tolerance mechanism for efficient policy updates at each iteration. We also present a comprehensive convergence analysis of RP2G for solving ergodic tabular RAMDPs. Furthermore, we present the first study of the inner worst-case transition evaluation problem in RAMDPs and propose two gradient-based algorithms, tailored to rectangular and general ambiguity sets respectively, each with provable convergence guarantees. Numerical experiments confirm the global convergence of the new algorithm and demonstrate its superior performance.
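To make the robust average-reward objective concrete, here is a minimal Python/NumPy sketch of projected policy gradient ascent for a tabular RAMDP. It assumes an ergodic model and a hypothetical finite ambiguity set of K candidate transition kernels, which in general is not rectangular. This is not RP2G as published: the adaptive decreasing tolerance mechanism and the authors' inner worst-case algorithms are not reproduced, and the inner problem is instead solved exactly by enumerating the kernels. The helper names project_simplex, gain_and_q, and robust_projected_pg are hypothetical. The update uses the standard average-reward policy gradient for the direct (tabular) parameterization, where the derivative of the gain rho with respect to pi(a|s) is d(s) q(s,a), with d the stationary distribution and q the differential action value under the currently worst kernel.

```python
import numpy as np


def project_simplex(y):
    """Euclidean projection of vector y onto the probability simplex (sort-based)."""
    u = np.sort(y)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, y.size + 1)
    # largest k such that u_k - (sum_{j<=k} u_j - 1) / k > 0
    n_pos = k[u - (css - 1.0) / k > 0][-1]
    tau = (css[n_pos - 1] - 1.0) / n_pos
    return np.maximum(y - tau, 0.0)


def gain_and_q(P, r, pi):
    """Gain rho, stationary distribution d, and differential Q-values q for
    policy pi under kernel P (shape S x A x S). Assumes the chain is ergodic."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state transition matrix under pi
    r_pi = np.sum(pi * r, axis=1)            # expected one-step reward under pi
    I = np.eye(S)
    # stationary distribution: d^T (I - P_pi) = 0 and sum(d) = 1
    A_d = np.vstack([(I - P_pi).T, np.ones(S)])
    b_d = np.append(np.zeros(S), 1.0)
    d = np.linalg.lstsq(A_d, b_d, rcond=None)[0]
    rho = float(d @ r_pi)
    # bias h: (I - P_pi) h = r_pi - rho, normalized so that d @ h = 0
    A_h = np.vstack([I - P_pi, d])
    b_h = np.append(r_pi - rho, 0.0)
    h = np.linalg.lstsq(A_h, b_h, rcond=None)[0]
    q = r - rho + P @ h                      # q(s,a) = r(s,a) - rho + sum_t P(t|s,a) h(t)
    return rho, d, q


def robust_projected_pg(kernels, r, steps=200, eta=0.5):
    """Projected (sub)gradient ascent on min_k rho_k(pi) over a finite set of kernels."""
    S, A = r.shape
    pi = np.full((S, A), 1.0 / A)            # start from the uniform policy
    for _ in range(steps):
        # inner problem: exact worst case by enumeration (not the paper's method)
        evals = [gain_and_q(P, r, pi) for P in kernels]
        _, d, q = min(evals, key=lambda e: e[0])
        grad = d[:, None] * q                # d rho / d pi(a|s) under the worst kernel
        pi = np.array([project_simplex(row) for row in pi + eta * grad])
    robust_gain = min(gain_and_q(P, r, pi)[0] for P in kernels)
    return pi, robust_gain


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, K = 3, 2, 4
    # strictly positive kernels, so every policy induces an ergodic chain
    kernels = [rng.dirichlet(np.ones(S), size=(S, A)) for _ in range(K)]
    r = rng.uniform(size=(S, A))
    pi, robust_gain = robust_projected_pg(kernels, r)
    print(np.round(pi, 3), round(robust_gain, 4))
```

Because Euclidean projection onto the simplex is unchanged when the same constant is added to every coordinate, the -rho term in q does not affect the update. Taking the gradient of the minimizing kernel is a Danskin-style (sub)gradient of the worst-case gain; the step size eta and the iteration count are illustrative choices, not values from the paper.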

Award ID(s): 2144601

PAR ID: 10620827

Author(s) / Creator(s): Wang, Qiuhao; Zha, Yuqi; Ho, Chin Pang; Petrik, Marek

Publisher / Repository: International Conference on Machine Learning

Date Published: 2025-07-18

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.