In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tractability of reinforcement learning for general LTL objectives from a theoretical perspective. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable if and only if the formula is in the most limited class in the LTL hierarchy, consisting of formulas that are decidable within a finite horizon. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for LTL objectives that are not decidable within a finite horizon.
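The finite-horizon-decidable class can be illustrated with a small sketch (our hypothetical example, not a construction from the paper): a bounded-eventually property "p holds within the first k steps" is settled by any length-k prefix of a trace, whereas an unbounded liveness property such as "p holds infinitely often" is never settled by a finite prefix.

```python
# Sketch (illustrative, not from the paper): a bounded-eventually LTL
# property F<=k p is decidable within a finite horizon -- any length-k
# prefix of an infinite trace determines its truth value.
def bounded_eventually(prefix, k, p):
    """True iff atomic proposition p holds within the first k steps."""
    return any(p in step for step in prefix[:k])

# Two infinite traces, observed only through finite prefixes.
trace_a = [set(), {"p"}, set(), set(), set()]
trace_b = [set()] * 5

assert bounded_eventually(trace_a, 3, "p")      # decided: satisfied
assert not bounded_eventually(trace_b, 3, "p")  # decided: violated
```

By contrast, no prefix of `trace_b` can settle a property like GF p, since p might still hold at every step beyond the observed horizon; this is the kind of formula the result places outside the PAC-MDP-learnable class.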
Computably Continuous Reinforcement-Learning Objectives are PAC-learnable
In reinforcement learning, the classic objectives of maximizing discounted
and finite-horizon cumulative rewards are PAC-learnable: there are algorithms
that learn a near-optimal policy with high probability using a finite amount of
samples and computation. In recent years, researchers have introduced
objectives and corresponding reinforcement-learning algorithms beyond the
classic cumulative rewards, such as objectives specified as linear temporal
logic formulas. However, questions about the PAC-learnability of these new
objectives have remained open.
This work demonstrates the PAC-learnability of general reinforcement-learning
objectives through sufficient conditions for PAC-learnability in two analysis
settings. In particular, for the analysis that considers only sample
complexity, we prove that if an objective given as an oracle is uniformly
continuous, then it is PAC-learnable. Further, for the analysis that considers
computational complexity, we prove that if an objective is computable, then it
is PAC-learnable. In other words, if a procedure computes successive
approximations of the objective's value, then the objective is PAC-learnable.
We give three applications of our condition to objectives from the literature
with previously unknown PAC-learnability and prove that these objectives are
PAC-learnable. Overall, our result helps verify existing objectives'
PAC-learnability. Also, as some studied objectives that are not uniformly
continuous have been shown to be not PAC-learnable, our results could guide the
design of new PAC-learnable objectives.
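The uniform-continuity condition can be made concrete with the classic discounted objective (a minimal sketch of our own, assuming rewards in [0, r_max], not the paper's construction): two trajectories that agree on their first n steps have discounted values within gamma**n * r_max / (1 - gamma), so a finite prefix determines the objective's value to any desired accuracy.

```python
import math

# Sketch (illustrative): the discounted cumulative reward is uniformly
# continuous in the trajectory -- agreement on a length-n prefix bounds
# the value difference by gamma**n * r_max / (1 - gamma).
def horizon_for_accuracy(eps, gamma, r_max=1.0):
    """Smallest n with gamma**n * r_max / (1 - gamma) <= eps."""
    return math.ceil(math.log(eps * (1 - gamma) / r_max) / math.log(gamma))

def discounted_value(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

gamma, eps = 0.9, 1e-3
n = horizon_for_accuracy(eps, gamma)
# Any two reward sequences sharing a length-n prefix differ by <= eps,
# no matter how their tails diverge.
prefix = [1.0] * n
hi = discounted_value(prefix + [1.0] * 200, gamma)
lo = discounted_value(prefix + [0.0] * 200, gamma)
assert hi - lo <= eps
```

An objective without such a modulus of continuity, e.g., one whose value depends on the infinite tail of the trajectory, is exactly the kind that the paper's negative direction flags as potentially not PAC-learnable.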
Award ID(s): 1918839
NSF-PAR ID: 10404354
Journal Name: arXiv.org
ISSN: 2331-8422
Sponsoring Org: National Science Foundation
More Like this


Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Eds.) In a recent article, Alon, Hanneke, Holzman, and Moran (FOCS '21) introduced a unifying framework to study the learnability of classes of partial concepts. One of the central questions studied in their work is whether the learnability of a partial concept class is always inherited from the learnability of some "extension" of it to a total concept class. They showed this is not the case for PAC learning but left the problem open for the stronger notion of online learnability. We resolve this problem by constructing a class of partial concepts that is online learnable, but no extension of it to a class of total concepts is online learnable (or even PAC learnable).

Abstract: About 25 years ago, it came to light that a single combinatorial property determines both an important dividing line in model theory (NIP) and machine learning (PAC-learnability). The following years saw a fruitful exchange of ideas between PAC-learning and the model theory of NIP structures. In this article, we point out a new and similar connection between model theory and machine learning, this time developing a correspondence between stability and learnability in various settings of online learning. In particular, this gives many new examples of mathematically interesting classes which are learnable in the online setting.

A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: a concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).