In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved. Previous studies have alluded to this fact but have not examined it in depth. In this paper, we address the tractability of reinforcement learning for general LTL objectives from a theoretical perspective. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable if and only if the formula is in the most limited class in the LTL hierarchy, consisting of formulas that are decidable within a finite horizon. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for LTL objectives that are not decidable within a finite horizon.
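The finite-horizon-decidable class can be illustrated with a small sketch (our hypothetical example, not a construction from the paper): a bounded-eventually property "p holds within the first k steps" is settled by any length-k prefix of a trace, whereas an unbounded liveness property such as "p holds infinitely often" is never settled by a finite prefix.

```python
# Sketch (illustrative, not from the paper): a bounded-eventually LTL
# property F<=k p is decidable within a finite horizon -- any length-k
# prefix of an infinite trace determines its truth value.
def bounded_eventually(prefix, k, p):
    """True iff atomic proposition p holds within the first k steps."""
    return any(p in step for step in prefix[:k])

# Two infinite traces, observed only through finite prefixes.
trace_a = [set(), {"p"}, set(), set(), set()]
trace_b = [set()] * 5

assert bounded_eventually(trace_a, 3, "p")      # decided: satisfied
assert not bounded_eventually(trace_b, 3, "p")  # decided: violated
```

By contrast, no prefix of `trace_b` can settle a property like GF p, since p might still hold at every step beyond the observed horizon; this is the kind of formula the result places outside the PAC-MDP-learnable class.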
Computably Continuous Reinforcement-Learning Objectives are PAC-learnable
In reinforcement learning, the classic objectives of maximizing discounted
and finite-horizon cumulative rewards are PAC-learnable: there are algorithms
that learn a near-optimal policy with high probability using a finite amount of
samples and computation. In recent years, researchers have introduced
objectives and corresponding reinforcement-learning algorithms beyond the
classic cumulative rewards, such as objectives specified as linear temporal
logic formulas. However, questions about the PAC-learnability of these new
objectives have remained open.
This work demonstrates the PAC-learnability of general reinforcement-learning
objectives through sufficient conditions for PAC-learnability in two analysis
settings. In particular, for the analysis that considers only sample
complexity, we prove that if an objective given as an oracle is uniformly
continuous, then it is PAC-learnable. Further, for the analysis that considers
computational complexity, we prove that if an objective is computable, then it
is PAC-learnable. In other words, if a procedure computes successive
approximations of the objective's value, then the objective is PAC-learnable.
We give three applications of our condition to objectives from the literature
with previously unknown PAC-learnability and prove that these objectives are
PAC-learnable. Overall, our result helps verify existing objectives'
PAC-learnability. Also, as some studied objectives that are not uniformly
continuous have been shown to be not PAC-learnable, our results could guide the
design of new PAC-learnable objectives.
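The uniform-continuity condition can be made concrete with the classic discounted objective (a minimal sketch of our own, assuming rewards in [0, r_max], not the paper's construction): two trajectories that agree on their first n steps have discounted values within gamma**n * r_max / (1 - gamma), so a finite prefix determines the objective's value to any desired accuracy.

```python
import math

# Sketch (illustrative): the discounted cumulative reward is uniformly
# continuous in the trajectory -- agreement on a length-n prefix bounds
# the value difference by gamma**n * r_max / (1 - gamma).
def horizon_for_accuracy(eps, gamma, r_max=1.0):
    """Smallest n with gamma**n * r_max / (1 - gamma) <= eps."""
    return math.ceil(math.log(eps * (1 - gamma) / r_max) / math.log(gamma))

def discounted_value(rewards, gamma):
    return sum(r * gamma**t for t, r in enumerate(rewards))

gamma, eps = 0.9, 1e-3
n = horizon_for_accuracy(eps, gamma)
# Any two reward sequences sharing a length-n prefix differ by <= eps,
# no matter how their tails diverge.
prefix = [1.0] * n
hi = discounted_value(prefix + [1.0] * 200, gamma)
lo = discounted_value(prefix + [0.0] * 200, gamma)
assert hi - lo <= eps
```

An objective without such a modulus of continuity, e.g., one whose value depends on the infinite tail of the trajectory, is exactly the kind that the paper's negative direction flags as potentially not PAC-learnable.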
Award ID(s): 1918839
NSF-PAR ID: 10404354
Journal Name: arXiv.org
ISSN: 2331-8422
Sponsoring Org: National Science Foundation
More Like this


Etessami, Kousha; Feige, Uriel; Puppis, Gabriele (Eds.) In a recent article, Alon, Hanneke, Holzman, and Moran (FOCS '21) introduced a unifying framework to study the learnability of classes of partial concepts. One of the central questions studied in their work is whether the learnability of a partial concept class is always inherited from the learnability of some "extension" of it to a total concept class. They showed this is not the case for PAC learning but left the problem open for the stronger notion of online learnability. We resolve this problem by constructing a class of partial concepts that is online learnable, but no extension of it to a class of total concepts is online learnable (or even PAC learnable).

Abstract: About 25 years ago, it came to light that a single combinatorial property determines both an important dividing line in model theory (NIP) and machine learning (PAC-learnability). The following years saw a fruitful exchange of ideas between PAC-learning and the model theory of NIP structures. In this article, we point out a new and similar connection between model theory and machine learning, this time developing a correspondence between stability and learnability in various settings of online learning. In particular, this gives many new examples of mathematically interesting classes which are learnable in the online setting.

A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: a concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).