Sample Complexity of Robust Reinforcement Learning with a Generative Model

Panaganti, Kishan; Kalathil, Dileep

Citation Details

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mis- matches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems. more »

Award ID(s):: 2045783 1850206

NSF-PAR ID:: 10327542

Author(s) / Creator(s):: Panaganti, Kishan; Kalathil, Dileep

Date Published:: 2022-03-01

Journal Name:: International Conference on Artificial Intelligence and Statistics (AISTATS)

Page Range / eLocation ID:: 9582--9602

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this