In reinforcement learning, the classic objectives of maximizing discounted
and finite-horizon cumulative rewards are PAC-learnable: There are algorithms
that learn a near-optimal policy with high probability using a finite number of
samples and a finite amount of computation. In recent years, researchers have introduced
objectives and corresponding reinforcement-learning algorithms beyond the
classic cumulative rewards, such as objectives specified as linear temporal
logic formulas. However, questions about the PAC-learnability of these new
objectives have remained open.
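For concreteness, the two classic objectives can be written as follows; this is the standard textbook formulation, with notation ours rather than the abstract's:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Discounted objective: expected discounted return of a policy \pi,
% with discount factor \gamma \in (0,1) and reward r_t at step t.
\[
  J_\gamma(\pi) = \mathbb{E}_\pi\Biggl[\sum_{t=0}^{\infty} \gamma^t\, r_t\Biggr]
\]
% Finite-horizon objective: expected cumulative reward over H steps.
\[
  J_H(\pi) = \mathbb{E}_\pi\Biggl[\sum_{t=0}^{H-1} r_t\Biggr]
\]
\end{document}
```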
This work establishes the PAC-learnability of general reinforcement-learning
objectives by giving sufficient conditions in two analysis
settings. In particular, for the analysis that considers only sample
complexity, we prove that if an objective given as an oracle is uniformly
continuous, then it is PAC-learnable. Further, for the analysis that considers
computational complexity, we prove that if an objective is computable, then it
is PAC-learnable. In other words, if a procedure computes successive
approximations of the objective's value, then the objective is PAC-learnable.
We apply our conditions to three objectives from the literature whose
PAC-learnability was previously unknown and prove that they are PAC-learnable.
Overall, our results help verify the PAC-learnability of existing objectives.
Moreover, since some previously studied objectives that are not uniformly
continuous have been shown not to be PAC-learnable, our results can also guide
the design of new PAC-learnable objectives.
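For reference, here is a minimal sketch of the PAC criterion behind these claims, in our own notation; the paper's exact formalization of objectives, oracles, and sampling may differ:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% An algorithm PAC-learns an objective J if, for every accuracy
% \varepsilon and confidence \delta in (0,1), after finitely many
% samples it outputs, with probability at least 1 - \delta, a policy
% \hat{\pi} that is \varepsilon-optimal:
\[
  \Pr\Bigl[\, J(\hat{\pi}) \;\ge\; \sup_{\pi} J(\pi) - \varepsilon \,\Bigr]
  \;\ge\; 1 - \delta .
\]
\end{document}
```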
Online Learning and Disambiguations of Partial Concept Classes
In a recent article, Alon, Hanneke, Holzman, and Moran (FOCS '21) introduced a unifying framework to study the learnability of classes of partial concepts. One of the central questions studied in their work is whether the learnability of a partial concept class is always inherited from the learnability of some "extension" of it to a total concept class. They showed this is not the case for PAC learning but left the problem open for the stronger notion of online learnability. We resolve this problem by constructing a class of partial concepts that is online learnable, but no extension of it to a class of total concepts is online learnable (or even PAC learnable).
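For readers outside this subarea, a minimal sketch of the two notions in play, under our own notation; the FOCS '21 framework gives the careful definitions:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% A partial concept on a domain X takes values in {0,1,*}, with *
% marking points where the concept is undefined; an extension to a
% total class replaces each * by 0 or 1:
\[
  h \colon X \to \{0, 1, *\}.
\]
% Realizable online learning: an adversary reveals points labeled
% consistently with some concept in the class, and the class is
% online learnable when some learner makes at most M mistakes on
% every such sequence, for a finite bound M:
\[
  \sup_{\text{consistent label sequences}}
  \#\{\text{mistakes of the learner}\} \;\le\; M \;<\; \infty .
\]
\end{document}
```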
- Award ID(s): 1947546
- NSF-PAR ID: 10488557
- Editor(s): Etessami, Kousha; Feige, Uriel; Puppis, Gabriele
- Publisher / Repository: Schloss Dagstuhl – Leibniz-Zentrum für Informatik
- Journal Name: 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
- ISSN: 1868-8969
- ISBN: 978-3-95977-278-5
- Subject(s) / Keyword(s): Online learning; Littlestone dimension; VC dimension; partial concept class; clique vs independent set; Alon-Saks-Seymour conjecture; Standard Optimal Algorithm; PAC learning; Theory of computation → Online learning theory
- Format(s): Medium: X
- Location: Dagstuhl, Germany
- Sponsoring Org: National Science Foundation
More Like this
- About 25 years ago, it came to light that a single combinatorial property determines both an important dividing line in model theory (NIP) and machine learning (PAC-learnability). The following years saw a fruitful exchange of ideas between PAC-learning and the model theory of NIP structures. In this article, we point out a new and similar connection between model theory and machine learning, this time developing a correspondence between stability and learnability in various settings of online learning. In particular, this gives many new examples of mathematically interesting classes which are learnable in the online setting.
- How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy. In this paper, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this paper is a remarkable trichotomy: there are only three possible rates of universal learning. More precisely, we show that the learning curves of any given concept class decay at an exponential, linear, or arbitrarily slow rate (written in symbols in the note after this list). Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. For concreteness, we consider in this paper only the realizable case, though analogous results are expected to extend to more general learning scenarios.
- A recent line of work has shown a qualitative equivalence between differentially private PAC learning and online learning: a concept class is privately learnable if and only if it is online learnable with a finite mistake bound. However, both directions of this equivalence incur significant losses in both sample and computational efficiency. Studying a special case of this connection, Gonen, Hazan, and Moran (NeurIPS 2019) showed that uniform or highly sample-efficient pure-private learners can be time-efficiently compiled into online learners. We show that, assuming the existence of one-way functions, such an efficient conversion is impossible even for general pure-private learners with polynomial sample complexity. This resolves a question of Neel, Roth, and Wu (FOCS 2019).
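The universal-learning abstract above states a trichotomy of rates; here it is in symbols. This is a paraphrase under our own notation, not text from the cited paper:

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Universal learning fixes the data distribution and asks how the
% expected error of an optimal learner decays with the number of
% training examples n. The trichotomy: for any concept class, the
% decay is exactly one of
\[
  \mathbb{E}\bigl[\operatorname{err}(\hat{h}_n)\bigr]
  \;=\; e^{-\Theta(n)}
  \qquad\text{or}\qquad
  \Theta\!\bigl(\tfrac{1}{n}\bigr)
  \qquad\text{or arbitrarily slow,}
\]
% where ``arbitrarily slow'' means that for every rate R(n) tending
% to zero there is a data distribution whose learning curve decays
% more slowly than R(n).
\end{document}
```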