A PAC Learning Algorithm for LTL and Omega-Regular Objectives in MDPs

Perez, M; Somenzi, F; Trivedi, A

Citation Details

Linear temporal logic (LTL) and ω-regular objectives, a superset of LTL, have seen recent use as a way to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for ω-regular objectives in Markov decision processes (MDPs). As part of the development of our algorithm, we introduce the ε-recurrence time: a measure of the speed at which a policy converges to the satisfaction of the ω-regular objective in the limit. We prove that our algorithm only requires a polynomial number of samples in the relevant parameters, and perform experiments which confirm our theory.

Award ID(s): 2146563, 2009022

PAR ID: 10526017

Author(s) / Creator(s): Perez, M; Somenzi, F; Trivedi, A

Publisher / Repository: The Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)

Date Published: 2024-02-20

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
