NSF PAR Search | NSF Public Access Repository

Solving multi-model MDPs by coordinate ascent and dynamic programming

Xihong Su, Marek Petrik (January 2023, Conference on Uncertainty in Artificial Intelligence)

The multi-model Markov decision process (MMDP) is a promising framework for computing policies that are robust to parameter uncertainty in MDPs. MMDPs aim to find a policy that maximizes the expected return over a distribution of MDP models. Because MMDPs are NP-hard to solve, most methods resort to approximations. In this paper, we derive the policy gradient of MMDPs and propose CADP, which combines a coordinate ascent method and a dynamic programming algorithm for solving MMDPs. The main innovation of CADP compared with earlier algorithms is its coordinate ascent perspective: it adjusts model weights iteratively, which guarantees monotone policy improvement up to a local maximum. A theoretical analysis of CADP proves that it never performs worse than previous dynamic programming algorithms such as WSU. Our numerical results indicate that CADP substantially outperforms existing methods on several benchmark problems.
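A minimal sketch of the MMDP objective described above: the return of a single policy averaged over a weighted set of models. This is only the evaluation step, not CADP itself, and it assumes a finite horizon without discounting, a reward table shared by all models, and a time-dependent deterministic Markov policy. The function name `mmdp_return` and the array layouts are illustrative choices, not taken from the paper.

```python
import numpy as np

def mmdp_return(P, r, weights, policy, p0):
    """Weighted expected return of one policy across the models of an MMDP.

    P[m]    : array (A, S, S), P[m][a, s, s2] = Pr(s2 | s, a) under model m
    r       : array (S, A), rewards shared by all models (simplifying assumption)
    weights : prior probability of each model
    policy  : int array (T, S), policy[t, s] = action taken at step t in state s
    p0      : array (S,), initial state distribution
    """
    T, S = policy.shape
    states = np.arange(S)
    total = 0.0
    for P_m, w in zip(P, weights):
        v = np.zeros(S)                       # terminal value after step T - 1
        for t in reversed(range(T)):
            a = policy[t]                     # action chosen in each state at step t
            # Backward evaluation: v(s) = r(s, a) + sum_s2 P_m(s2 | s, a) v(s2)
            v = r[states, a] + P_m[a, states, :] @ v
        total += w * (p0 @ v)                 # weight this model's return by its prior
    return total

# Toy usage (hypothetical data): one action, two states, two equally likely models.
P = [np.array([[[1.0, 0.0], [0.0, 1.0]]]),   # model 0: stay in the current state
     np.array([[[0.0, 1.0], [1.0, 0.0]]])]   # model 1: swap states
r = np.array([[1.0], [0.0]])                 # reward 1 only in state 0
policy = np.zeros((2, 2), dtype=int)         # horizon T = 2, always action 0
print(mmdp_return(P, r, [0.5, 0.5], policy, np.array([1.0, 0.0])))  # 1.5
```

Under model 0 the agent collects the reward in both steps (return 2), under model 1 only in the first step (return 1), so the weighted objective is 1.5. Methods such as CADP search over policies to maximize this quantity; the sketch only scores a fixed one.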
Entropic Risk Optimization in Discounted MDPs

Jia Lin Hau, Marek Petrik, Mohammad Ghavamzadeh (January 2023, International Conference on Artificial Intelligence and Statistics)

Policy Gradient in Robust MDPs with Global Convergence Guarantee

Qiuhao Wang, Chin Pang (January 2023, International Conference on Machine Learning)

Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods to RMDPs has been challenging. As a result, the applicability of RMDPs to large, practical domains remains limited. This paper proposes Double-Loop Robust Policy Gradient (DRPG), the first generic policy gradient method for RMDPs. In contrast with prior robust policy gradient algorithms, DRPG monotonically reduces approximation errors to guarantee convergence to a globally optimal policy in tabular RMDPs. We introduce a novel parametric transition kernel and solve the inner-loop problem with a gradient-based method. Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties.
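A rough sketch of the double-loop structure, assuming the ambiguity set is a small finite list of transition kernels so that the inner loop can simply enumerate them and keep the worst one. The paper instead uses a parametric transition kernel and a gradient-based inner loop, and this sketch makes no claim to its global convergence guarantee. The outer loop is ordinary gradient ascent on a tabular softmax policy, using the exact policy gradient under the current worst-case kernel. All names (`robust_pg`, `evaluate`, `softmax_policy`) and the step size are hypothetical.

```python
import numpy as np

def softmax_policy(theta):
    """Row-wise softmax: theta has shape (S, A); returns pi(a | s)."""
    z = theta - theta.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def evaluate(P, r, pi, gamma):
    """Exact discounted policy evaluation; P has shape (S, A, S), r has shape (S, A)."""
    S = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)             # expected one-step reward under pi
    v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return v, P_pi

def robust_pg(kernels, r, rho, gamma=0.9, step=0.5, iters=200):
    """Double-loop sketch: inner loop picks the worst kernel, outer loop takes a PG step."""
    S, A = r.shape
    theta = np.zeros((S, A))                 # start from the uniform policy
    best_theta, best_val = theta.copy(), -np.inf
    for _ in range(iters):
        pi = softmax_policy(theta)
        # Inner loop (simplified): enumerate the finite set of kernels, keep the worst one.
        results = []
        for P in kernels:
            v, P_pi = evaluate(P, r, pi, gamma)
            results.append((rho @ v, P, v, P_pi))
        J, P, v, P_pi = min(results, key=lambda item: item[0])
        if J > best_val:                     # remember the best robust value seen so far
            best_theta, best_val = theta.copy(), J
        # Outer loop: exact softmax policy gradient evaluated under the worst-case kernel.
        q = r + gamma * (P @ v)              # q(s, a), shape (S, A)
        adv = q - v[:, None]                 # advantage A(s, a)
        occ = np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho)  # sum_t gamma^t Pr(s_t = s)
        theta = theta + step * occ[:, None] * pi * adv
    return best_theta, best_val

# Toy problem (hypothetical): two states, two actions, transitions independent of the action.
r = np.array([[1.0, 0.0], [0.0, 0.0]])
P1 = np.tile([0.5, 0.5], (2, 2, 1))
P2 = np.tile([0.2, 0.8], (2, 2, 1))
rho = np.array([0.5, 0.5])
theta, val = robust_pg([P1, P2], r, rho)
print(softmax_policy(theta)[0], val)
```

Because the toy transitions ignore the action, every kernel rewards raising the probability of action 0 in state 0, so the robust value climbs from the uniform starting policy. Keeping the best iterate rather than the last one is a defensive choice: a fixed step on the pointwise minimum of several smooth objectives need not increase it monotonically in general.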
Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

Elita Lobo, Harvineet Singh (January 2022, Uncertainty in Artificial Intelligence)

Optimizing Percentile Criterion Using Robust MDPs

Bahram Behzadian, Reazul Hasan (January 2021, Proceedings of Machine Learning Research)

Fast Algorithms for L∞-Constrained S-Rectangular Robust MDPs

Bahram Behzadian, Marek Petrik (January 2021, Advances in Neural Information Processing Systems)

Robust Markov decision processes (RMDPs) are a useful building block of robust reinforcement learning algorithms but can be hard to solve. This paper proposes a fast, exact algorithm for computing the Bellman operator for S-rectangular robust Markov decision processes with L∞-constrained rectangular ambiguity sets. The algorithm combines a novel homotopy continuation method with a bisection method to solve the S-rectangular inner optimization problem in quasi-linear time in the number of states and actions, improving on the cubic time required by leading general linear programming methods. Our experimental results confirm the practical viability of our method and show that it outperforms a leading commercial optimization package by several orders of magnitude.
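For intuition about the inner problem, here is a sketch of the simpler sa-rectangular case: the worst-case next-state distribution within an L∞ ball around a nominal distribution. This is not the paper's homotopy and bisection algorithm for S-rectangular sets; it is a greedy fill that solves this particular linear program exactly, and its O(n log n) cost comes from a single sort. The function name and parameters (`p_bar`, `v`, `kappa`) are illustrative.

```python
import numpy as np

def worst_case_linf(p_bar, v, kappa):
    """Solve min_p p @ v over the simplex with max_i |p_i - p_bar_i| <= kappa.

    Assumes p_bar is itself a probability distribution, so the problem is feasible.
    """
    p_bar = np.asarray(p_bar, dtype=float)
    lo = np.maximum(p_bar - kappa, 0.0)   # per-state lower bounds
    hi = np.minimum(p_bar + kappa, 1.0)   # per-state upper bounds
    p = lo.copy()
    remaining = 1.0 - p.sum()             # mass still to place so that p sums to 1
    for i in np.argsort(v):               # cheapest (lowest-value) states first
        if remaining <= 0.0:
            break
        add = min(hi[i] - lo[i], remaining)
        p[i] += add
        remaining -= add
    return p

p = worst_case_linf([0.5, 0.3, 0.2], np.array([1.0, 2.0, 3.0]), kappa=0.1)
print(p, p @ np.array([1.0, 2.0, 3.0]))
```

In the example, the adversary shifts 0.1 of probability from the highest-value state to the lowest, giving roughly [0.6, 0.3, 0.1]. An sa-rectangular robust Bellman update would then use r(s, a) + gamma * worst_case_linf(P[s, a], v, kappa) @ v for each state-action pair.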
Policy Gradient Bayesian Robust Optimization for Imitation Learning

Zaynah Javed, Daniel S. (January 2021, International Conference on Machine Learning)

Robust Behavior Cloning with Adversarial Demonstration Detection

Mostafa Hussein, Brendan Crowe (January 2021, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems)

Partial Policy Iteration for L1-Robust Markov Decision Processes

Chin Pang Ho, Marek Petrik (January 2021, Journal of Machine Learning Research)

Robust Markov decision processes (MDPs) compute reliable solutions for dynamic decision problems with partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which limits their scalability. This paper describes new, efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the ordinary Bellman operator's linear complexity. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with a robust value iteration.
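The quasi-linear Bellman update can be illustrated on its simplest special case: an sa-rectangular ambiguity set given by an unweighted L1 ball. There, the worst case moves as much probability as the budget allows (half of kappa, since every unit moved counts twice in the L1 distance) onto the lowest-value state and takes it from the highest-value states. This is a sketch under those assumptions; the paper's methods also handle weighted L1 norms and s-rectangular sets, which this code does not, and the name `worst_case_l1` is hypothetical.

```python
import numpy as np

def worst_case_l1(p_bar, v, kappa):
    """Solve min_p p @ v over the simplex with ||p - p_bar||_1 <= kappa (unweighted)."""
    p = np.asarray(p_bar, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    i_min = int(np.argmin(v))
    # Shifting eps of mass costs 2 * eps of L1 budget (added once, removed once).
    eps = min(kappa / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    # Take the shifted mass from the highest-value states first.
    for j in np.argsort(v)[::-1]:
        if eps <= 0.0:
            break
        if j == i_min:
            continue
        take = min(p[j], eps)
        p[j] -= take
        eps -= take
    return p

v = np.array([1.0, 2.0, 3.0])
p = worst_case_l1([0.2, 0.3, 0.5], v, kappa=0.4)
print(p, p @ v)
```

With a budget of 0.4, the adversary moves 0.2 of probability from the value-3 state to the value-1 state, giving roughly [0.4, 0.3, 0.3]. The sort dominates the cost, so each call is O(n log n), in line with the quasi-linear complexity mentioned above.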
Fast Algorithms for L-infinity constrained S-rectangular Robust MDPs

Bahram Behzadian, Marek Petrik (January 2021, Neural Information Processing Systems)

