Robust Average-Reward Reinforcement Learning

Wang, Yue; Velasquez, Alvaro; Atia, George; Prater-Bennette, Ashley; Zou, Shaofeng

doi:10.1613/jair.1.15451

Citation Details

Robust Markov decision processes (MDPs) aim to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. Existing studies have mostly focused on robust MDPs under the discounted-reward criterion, leaving those under the average-reward criterion largely unexplored. In this paper, we develop the first comprehensive and systematic study of robust average-reward MDPs, where the goal is to optimize the long-term average performance under the worst case. Our contributions are fourfold: (1) we prove the uniform convergence of the robust discounted value function to the robust average-reward function as the discount factor γ goes to 1; (2) we derive the robust average-reward Bellman equation, characterize the structure of its solution set, and prove the equivalence between solving the robust Bellman equation and finding the optimal robust policy; (3) we design robust dynamic programming algorithms, and theoretically characterize their convergence to the optimal policy; and (4) we design two model-free algorithms utilizing the multi-level Monte-Carlo approach, and prove their asymptotic convergence…
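To make the robust average-reward Bellman equation in contribution (2) concrete, the following is a minimal sketch in the spirit of relative value iteration. It is not the authors' algorithm. It assumes a tiny tabular MDP, an (s,a)-rectangular uncertainty set given by a finite list of candidate transition kernels, and a chain on which the iteration converges. The function name, the array layout, and the toy numbers are all hypothetical choices made for illustration.

```python
import numpy as np


def robust_relative_value_iteration(rewards, kernels, ref_state=0,
                                    tol=1e-10, max_iter=10_000):
    """Sketch: iterate h(s) + g = max_a [ r(s,a) + min_k kernels[k,s,a,:] @ h ].

    rewards: array of shape (S, A).
    kernels: array of shape (K, S, A, S); K candidate transition kernels.
             Assumption: the adversary may pick the worst candidate row
             independently for every (s, a) pair ((s,a)-rectangularity).
    Returns (gain, bias, greedy_policy).
    """
    n_states, _ = rewards.shape
    bias = np.zeros(n_states)
    gain = 0.0
    q = rewards.copy()
    for _ in range(max_iter):
        # Worst-case expected next bias for each (s, a): shape (S, A).
        worst_next = np.min(kernels @ bias, axis=0)
        q = rewards + worst_next
        values = q.max(axis=1)
        # Subtract the value at a reference state to keep iterates bounded.
        gain = values[ref_state]
        new_bias = values - gain
        converged = np.max(np.abs(new_bias - bias)) < tol
        bias = new_bias
        if converged:
            break
    return gain, bias, q.argmax(axis=1)


def make_kernel(p_good_action0, p_good_action1):
    """Toy 2-state kernel: each action reaches state 1 with a fixed probability."""
    kernel = np.zeros((2, 2, 2))
    for s in range(2):
        kernel[s, 0] = [1 - p_good_action0, p_good_action0]
        kernel[s, 1] = [1 - p_good_action1, p_good_action1]
    return kernel


# State 1 pays reward 1, state 0 pays 0 (hypothetical toy numbers).
rewards = np.array([[0.0, 0.0], [1.0, 1.0]])
# Action 0 reaches state 1 w.p. 0.9 nominally but only 0.5 in the worst case;
# action 1 reaches it w.p. 0.6 under both candidate kernels.
kernels = np.stack([make_kernel(0.9, 0.6), make_kernel(0.5, 0.6)])

robust_gain, robust_bias, robust_policy = robust_relative_value_iteration(
    rewards, kernels)
nominal_gain, _, nominal_policy = robust_relative_value_iteration(
    rewards, kernels[:1])
```

On this toy instance, planning against the first kernel alone selects action 0 with average reward 0.9, while the robust iteration selects action 1 with worst-case average reward 0.6, since action 0 drops to 0.5 under the second kernel. The reference-state subtraction is one common normalization; other choices would serve equally well in a sketch like this.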

Award ID(s): 2229873, 2106339

PAR ID: 10573457

Author(s) / Creator(s): Wang, Yue; Velasquez, Alvaro; Atia, George; Prater-Bennette, Ashley; Zou, Shaofeng

Publisher / Repository: Association for the Advancement of Artificial Intelligence

Date Published: 2024-05-10

Journal Name: Journal of Artificial Intelligence Research

Volume: 80

ISSN: 1076-9757

Page Range / eLocation ID: 719 to 803

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1613/jair.1.15451
