Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes

HasanzadeZonuzy, Aria; Kalathil, Dileep; Shakkottai, Srinivas


In many real-world reinforcement learning (RL) problems, in addition to maximizing the objective, the learning agent has to maintain some necessary safety constraints. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy, namely, (i) the GM-CRL algorithm, where the algorithm has access to a generative model, and (ii) the UC-CRL algorithm, where the algorithm learns the model using an upper-confidence-style online exploration method. We characterize the sample complexity of these algorithms, i.e., the number of samples needed to ensure a desired level of accuracy with high probability, both with respect to objective maximization and constraint satisfaction.
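For orientation, the kind of problem the abstract describes can be written in the standard textbook form of a discounted CMDP. The notation below is an assumption for illustration and is not taken from the paper: $\gamma$ is the discount factor, $\rho$ the initial state distribution, $r$ the objective reward, $c_1, \dots, c_N$ the constraint costs with thresholds $\bar{C}_1, \dots, \bar{C}_N$, and $\epsilon$, $\delta$ the accuracy and confidence parameters. The paper's own definitions (for example, whether the objective is posed as a reward to maximize or a cost to minimize) may differ.

```latex
% Standard discounted CMDP (illustrative notation, not the paper's):
\begin{aligned}
\max_{\pi} \quad & V_{r}^{\pi}(\rho)
  = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \,\middle|\, s_0 \sim \rho \right] \\
\text{s.t.} \quad & V_{c_i}^{\pi}(\rho)
  = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t) \,\middle|\, s_0 \sim \rho \right]
  \le \bar{C}_i, \qquad i = 1, \dots, N.
\end{aligned}

% One common way to state "a desired level of accuracy with high probability":
% with probability at least 1 - \delta, the learned policy \hat{\pi} satisfies
\begin{aligned}
V_{r}^{\hat{\pi}}(\rho) &\ge V_{r}^{\pi^{*}}(\rho) - \epsilon, \\
V_{c_i}^{\hat{\pi}}(\rho) &\le \bar{C}_i + \epsilon, \qquad i = 1, \dots, N,
\end{aligned}
% where \pi^{*} is an optimal policy of the constrained problem above.
```

Under this reading, the sample complexity is the number of transition samples needed for both inequalities to hold simultaneously, which matches the abstract's reference to accuracy in both objective maximization and constraint satisfaction.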

Award ID(s): 2045783

PAR ID: 10327537

Author(s) / Creator(s): HasanzadeZonuzy, Aria; Kalathil, Dileep; Shakkottai, Srinivas

Date Published: 2021-08-01

Journal Name: International Joint Conference on Artificial Intelligence (IJCAI)

Page Range / eLocation ID: 2519-2525

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.
