In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average reward as the discount factor goes to 1, and moreover that when the discount factor is sufficiently close to 1, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward MDP. We further design a robust dynamic programming approach and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly, without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
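The robust dynamic programming approach mentioned above can be sketched for the discounted case. The snippet below is a minimal illustration, assuming an (s, a)-rectangular contamination uncertainty set (one of the standard choices, under which the worst case simply shifts the contaminating mass onto the lowest-value state); the function name and array layout are my own, not the paper's.

```python
import numpy as np

def robust_value_iteration(P, r, gamma=0.9, radius=0.2, tol=1e-8, max_iter=10_000):
    """Robust value iteration under an (s, a)-rectangular contamination set
    {(1 - R) * p + R * q : q any distribution} around the nominal kernel.
    The worst case puts the contaminating mass R on the lowest-value state.

    P: nominal kernel, shape (S, A, S); r: rewards, shape (S, A).
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Worst-case expected next value for every (s, a) pair.
        worst = (1.0 - radius) * (P @ V) + radius * V.min()
        Q = r + gamma * worst
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)
```

With `radius=0`, this reduces to ordinary value iteration on the nominal model; a positive radius can only lower the value, consistent with the worst-case interpretation.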
Model-Free Robust Average-Reward Reinforcement Learning
Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs. In this paper, we focus on robust average-reward MDPs under the model-free setting. We first theoretically characterize the structure of solutions to the robust average-reward Bellman equation, which is essential for our later convergence analysis. We then design two model-free algorithms, robust relative value iteration (RVI) TD and robust RVI Q-learning, and theoretically prove their convergence to the optimal solution. We provide several widely used uncertainty sets as examples, including those defined by the contamination model, total variation, Chi-squared divergence, Kullback-Leibler (KL) divergence, and Wasserstein distance.
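A minimal tabular sketch of a robust RVI Q-learning-style update under the contamination model (one of the uncertainty sets the abstract lists). The `sample_step` simulator interface, step size, and reference pair are illustrative assumptions, not the paper's exact algorithm; the key model-free feature is that only samples from the nominal kernel are needed, while the worst case over the uncertainty set has a closed form.

```python
import numpy as np

def robust_rvi_q_learning(sample_step, S, A, radius=0.1, n_steps=50_000,
                          alpha=0.05, ref=(0, 0), seed=0):
    """Tabular robust RVI Q-learning sketch for the contamination model.
    `sample_step(s, a, rng) -> (reward, next_state)` is a hypothetical
    simulator interface; only nominal-kernel samples are used.
    Q[ref] acts as the running estimate of the optimal robust average reward.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((S, A))
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(A))                  # uniform exploration
        reward, s_next = sample_step(s, a, rng)
        v = Q.max(axis=1)                         # greedy state values
        # Contamination worst case: mass `radius` on the lowest-value state.
        worst = (1.0 - radius) * v[s_next] + radius * v.min()
        Q[s, a] += alpha * (reward + worst - Q[ref] - Q[s, a])
        s = s_next
    return Q, Q[ref]
```

Subtracting `Q[ref]` in the update plays the role of the relative-value offset: it keeps the iterates bounded while its value converges to the optimal robust average reward.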
 Award ID(s):
 2229873
 NSFPAR ID:
 10540763
 Editor(s):
 Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan
 Publisher / Repository:
 Proceedings of Machine Learning Research
 Date Published:
 Volume:
 202
 ISSN:
 2640-3498
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision problems with partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which limits their scalability. This paper describes new, efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the ordinary Bellman operator's linear complexity. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with a robust value iteration.
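The fast robust Bellman computation rests on the fact that the inner minimization over an L1 ball of distributions has a greedy solution rather than requiring a linear program. The sketch below covers only the unweighted special case (the weighted version in the paper is more involved); names are illustrative.

```python
import numpy as np

def l1_worst_case(p, v, kappa):
    """Worst-case expectation min { q . v : ||q - p||_1 <= kappa, q a distribution }.
    Greedy O(S log S) solution for the *unweighted* L1 ball: move up to
    kappa / 2 probability mass onto the lowest-value state, taking it
    from the highest-value states first.
    """
    lo = int(np.argmin(v))
    q = np.asarray(p, dtype=float).copy()
    budget = min(kappa / 2.0, 1.0 - q[lo])   # q[lo] cannot exceed 1
    for s in np.argsort(v)[::-1]:            # highest values first
        if budget <= 0 or s == lo:
            continue
        take = min(q[s], budget)
        q[s] -= take
        q[lo] += take
        budget -= take
    return float(q @ v)
```

The sort dominates the cost, which is where the quasi-linear (O(S log S)) per-state-action complexity comes from, versus solving an LP per state-action pair.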

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
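Since the approach is model-based with an unknown nominal model, its first ingredient is an empirical estimate of the nominal kernel built from sampled transitions, which robust planning then uses in place of the true model. A minimal count-based sketch (the interface and fallback for unvisited pairs are my own assumptions):

```python
import numpy as np

def empirical_kernel(transitions, S, A):
    """Maximum-likelihood estimate of the nominal transition kernel from
    logged (s, a, s') triples. Unvisited (s, a) pairs fall back to a
    uniform distribution so the estimate is always a valid kernel.
    """
    counts = np.zeros((S, A, S))
    for s, a, s_next in transitions:
        counts[s, a, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 1.0 / S)
```

The sample-complexity question the abstract addresses is, roughly, how many such triples are needed before robust planning on this estimate yields an ε-optimal robust policy.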

The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy-gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective balancing expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses that can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
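One common way to instantiate a soft-robust objective over a distribution of reward hypotheses is a convex blend of the expected return and its conditional value at risk (CVaR); the sketch below is illustrative of that general idea and is not necessarily PG-BROIL's exact objective.

```python
import numpy as np

def soft_robust_objective(returns, lam=0.5, alpha=0.1):
    """Soft-robust score over a distribution of reward hypotheses:
    lam * mean return + (1 - lam) * CVaR_alpha, where CVaR_alpha is the
    mean of the worst alpha-fraction of returns. `returns[i]` is the
    policy's return under the i-th sampled reward hypothesis.
    """
    returns = np.asarray(returns, dtype=float)
    k = max(1, int(np.ceil(alpha * len(returns))))   # size of the worst tail
    worst = np.sort(returns)[:k]
    return lam * returns.mean() + (1 - lam) * worst.mean()
```

Sweeping `lam` from 1 to 0 interpolates between purely risk-neutral (expected-performance) and strongly risk-averse behavior, matching the family of behaviors the abstract describes.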

We consider model-free reinforcement learning for infinite-horizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor Q-Learning (NNQL) algorithm, which learns the optimal Q-function using a nearest-neighbor regression method. As the main contribution, we provide a tight finite-sample analysis of the convergence rate. In particular, for MDPs with a d-dimensional state space and discount factor in (0, 1), given an arbitrary sample path with "covering time" L, we establish that the algorithm is guaranteed to output an ε-accurate estimate of the optimal Q-function with nearly optimal sample complexity.
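A minimal sketch of the nearest-neighbor Q-learning idea on a 1-D state space: map each continuous state to its nearest anchor point and run tabular Q-learning on the induced indices, all from a single sample path. The anchor set and `sample_step` simulator interface are illustrative assumptions, not NNQL's exact construction.

```python
import numpy as np

def nnql(sample_step, centers, A, gamma=0.9, n_steps=20_000, alpha=0.1, seed=0):
    """Nearest-neighbor Q-learning sketch on a 1-D continuous state space.
    `centers` is a fixed array of anchor states; each observed state is
    mapped to its nearest anchor, and tabular Q-learning runs on the
    induced indices along one sample path.
    Hypothetical interface: sample_step(x, a, rng) -> (reward, next_x).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((len(centers), A))
    nearest = lambda x: int(np.argmin(np.abs(centers - x)))
    x = centers[0]
    for _ in range(n_steps):
        i = nearest(x)
        a = int(rng.integers(A))                  # uniform exploration
        reward, x_next = sample_step(x, a, rng)
        j = nearest(x_next)
        Q[i, a] += alpha * (reward + gamma * Q[j].max() - Q[i, a])
        x = x_next
    return Q
```

The "covering time" in the abstract corresponds here to how long the single path takes to visit the neighborhood of every anchor-action pair, which governs how fast every entry of `Q` gets updated.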