NSF PAR Search | NSF Public Access Repository

Model-Free Robust φ-Divergence Reinforcement Learning Using Both Offline and Online Data

Panaganti, Kishan; Wierman, Adam; Mazumdar, Eric (July 2024, Proceedings of the 41st International Conference on Machine Learning)

The robust φ-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust φ-regularized fitted Q-iteration for learning an ε-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of φ-divergences achieving robust optimal policies in high-dimensional systems with arbitrarily large state spaces and general function approximation. Second, we introduce the hybrid robust φ-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-free algorithm called Hybrid robust Total-variation-regularized Q-iteration. To the best of our knowledge, we provide the first improved out-of-data-distribution assumption in large-scale problems with arbitrarily large state spaces and general function approximation under the hybrid robust φ-regularized reinforcement learning framework.
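To make the idea of a divergence-regularized robust backup concrete, here is a minimal sketch for one member of the φ-divergence family, the KL divergence. It is not the paper's fitted Q-iteration algorithm. It assumes a finite next-state distribution `p0`, a value vector `v`, and a regularization weight `lam` (all hypothetical names), and relies on the standard identity min over P of E_P[v] + lam * KL(P || p0) = -lam * log E_p0[exp(-v / lam)]. Whether the paper places the regularizer in exactly this form is an assumption.

```python
import math

def kl_regularized_backup(p0, v, lam):
    """Closed form of min_P { E_P[v] + lam * KL(P || p0) }.

    p0: nominal next-state probabilities, v: next-state values,
    lam: regularization weight (hypothetical name, must be > 0).
    """
    shift = min(v)  # subtract the minimum so the exponentials cannot overflow
    z = sum(p * math.exp(-(x - shift) / lam) for p, x in zip(p0, v))
    return shift - lam * math.log(z)

def kl_worst_case_model(p0, v, lam):
    """Minimizing distribution: p0 reweighted by exp(-v / lam), then normalized."""
    shift = min(v)
    w = [p * math.exp(-(x - shift) / lam) for p, x in zip(p0, v)]
    total = sum(w)
    return [x / total for x in w]

# Example: the adversarial model shifts probability toward low-value states.
p0 = [0.2, 0.5, 0.3]
v = [1.0, 0.0, 2.0]
lam = 0.7
value = kl_regularized_backup(p0, v, lam)
model = kl_worst_case_model(p0, v, lam)
```

A large `lam` makes deviating from the nominal model expensive, so the backup approaches the nominal expectation; a small `lam` pushes it toward the minimum of `v`.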
Full Text Available
Distributionally Robust Behavioral Cloning for Robust Imitation Learning

https://doi.org/10.1109/CDC49753.2023.10383976

Panaganti, Kishan; Xu, Zaiyan; Kalathil, Dileep; Ghavamzadeh, Mohammad (December 2023, IEEE)
Robust Reinforcement Learning using Offline Data

Panaganti, Kishan; Xu, Zaiyan; Kalathil, Dileep; Ghavamzadeh, Mohammad (December 2022, Advances in neural information processing systems)

Full Text Available
Sample Complexity of Robust Reinforcement Learning with a Generative Model

Panaganti, Kishan; Kalathil, Dileep (March 2022, International Conference on Artificial Intelligence and Statistics (AISTATS))

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an ε-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
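The max-min formulation above can be illustrated with a small tabular sketch of robust dynamic programming under a total variation uncertainty set. This is not the paper's algorithm: it assumes the nominal model `P0` is known (the paper estimates it from samples), uses an uncertainty set of radius `rho` around each nominal transition row, and all function and parameter names are hypothetical.

```python
def worst_case_tv(p0, v, rho):
    """min of sum_s p[s] * v[s] over distributions p with TV(p, p0) <= rho.

    TV is taken as half the L1 distance, so moving mass m between two
    states costs exactly m of the budget. The minimizer moves mass from
    the highest-value states onto the lowest-value state.
    """
    p = list(p0)
    lo = min(range(len(v)), key=lambda s: v[s])
    budget = min(rho, 1.0 - p[lo])  # cannot move more mass than exists elsewhere
    for s in sorted(range(len(v)), key=lambda s: v[s], reverse=True):
        if s == lo or budget <= 0.0:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[lo] += move
        budget -= move
    return sum(ps * vs for ps, vs in zip(p, v))

def robust_value_iteration(P0, R, gamma, rho, iters=200):
    """Iterate V(s) = max_a [ R[s][a] + gamma * worst-case expected V ].

    P0[s][a] is the nominal next-state distribution, R[s][a] the reward.
    Returns the robust value estimate and a greedy robust policy.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    for _ in range(iters):
        Q = [[R[s][a] + gamma * worst_case_tv(P0[s][a], V, rho)
              for a in range(n_actions)] for s in range(n_states)]
        V = [max(row) for row in Q]
    policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
    return V, policy
```

Setting `rho=0.0` recovers ordinary value iteration on the nominal model, which gives a simple way to compare robust and non-robust values on the same problem.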
Full Text Available
Bounded Regret for Finitely Parameterized Multi-Armed Bandits

https://doi.org/10.1109/LCSYS.2020.3008798

Panaganti, Kishan; Kalathil, Dileep (July 2021, IEEE Control Systems Letters)
Full Text Available
Robust Reinforcement Learning using Least Squares Policy Iteration with Provable Performance Guarantees

Panaganti, Kishan; Kalathil, Dileep (July 2021, International Conference on Machine Learning (ICML))

This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Processes (RMDPs) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, which is a multi-step online model-free learning algorithm for policy evaluation. We prove the convergence of this algorithm using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy. We also give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of our RLSPI algorithm on some standard benchmark problems.
Full Text Available
