We propose a modelbased lifelong reinforcementlearning approach that estimates a hierarchical Bayesian posterior distilling the common structure shared across different tasks. The learned posterior combined with a samplebased Bayesian exploration procedure increases the sample efficiency of learning across a family of related tasks. We first derive an analysis of the relationship between the sample complexity and the initialization quality of the posterior in the finite MDP setting. We next scale the approach to continuousstate domains by introducing a Variational Bayesian Lifelong Reinforcement Learning algorithm that can be combined with recent modelbased deep RL methods, and that exhibits backward transfer. Experimental results on several challenging domains show that our algorithms achieve both better forward and backward transfer performance than stateoftheart lifelong RL methods
more »
« less
Provable Lifelong Learning of Representations
In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension d, m tasks with k features up to error ϵ is O~(dk1.5/ϵ+km/ϵ). We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to stateoftheart continual learning methods.
more »
« less
 NSFPAR ID:
 10335021
 Date Published:
 Journal Name:
 AISTATS
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Finding the minimal structural assumptions that empower sampleefficient learning is one of the most important research directions in Reinforcement Learning (RL). This paper advances our understanding of this fundamental question by introducing a new complexity measure—Bellman Eluder (BE) dimension. We show that the family of RL problems of low BE dimension is remarkably rich, which subsumes a vast majority of existing tractable RL problems including but not limited to tabular MDPs, linear MDPs, reactive POMDPs, low Bellman rank problems as well as low Eluder dimension problems. This paper further designs a new optimizationbased algorithm— GOLF, and reanalyzes a hypothesis eliminationbased algorithm—OLIVE (proposed in Jiang et al. (2017)). We prove that both algorithms learn the nearoptimal policies of low BE dimension problems in a number of samples that is polynomial in all relevant parameters, but independent of the size of stateaction space. Our regret and sample complexity results match or improve the best existing results for several wellknown subclasses of low BE dimension problems.more » « less

We consider learning problems where the training set consists of two types of examples: private and public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples. This setting interpolates between private learning (where private) and classical learning (where all examples are public). We study the limits of learning in this setting in terms of private and public sample complexities. We show that any hypothesis class of VCdimension d can be agnostically learned up to an excess error of α using only (roughly) d/α public examples and d/α2 private labeled examples. This result holds even when the public examples are unlabeled. This gives a quadratic improvement over the standard d/α2 upper bound on the public sample complexity (where private examples can be ignored altogether if the public examples are labeled). Furthermore, we give a nearly matching lower bound, which we prove via a generic reduction from this setting to the one of private learning without public data.more » « less

null (Ed.)We give the first polynomialtime algorithm for robust regression in the listdecodable setting where an adversary can corrupt a greater than 1/2 fraction of examples. For any our algorithm takes as input a sample of n linear equations where (1 \alpha) n of the equations are {\em arbitrarily} chosen. It outputs a list that contains a linear function that is close to the truth. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anticoncentrated distribution D. In particular, this gives an efficient algorithm to find an optimal size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anticoncentrated only in \emph{regular} directions, we give an algorithm that achieves similar guarantee under the promise that the true linear function has all coordinates of the same magnitude. To complement our result, we prove that the anticoncentration assumption on the inliers is informationtheoretically necessary. Our algorithm is based on a new framework for listdecodable learning that strengthens the `identifiability to algorithms' paradigm based on the sumofsquares method.more » « less

An overarching goal in machine learning is to build a generalizable model with few samples. To this end, overparameterization has been the subject of immense interest to explain the generalization ability of deep nets even when the size of the dataset is smaller than that of the model. While the prior literature focuses on the classical supervised setting, this paper aims to demystify overparameterization for metalearning. Here we have a sequence of linearregression tasks and we ask: (1) Given earlier tasks, what is the optimal linear representation of features for a new downstream task? and (2) How many samples do we need to build this representation? This work shows that surprisingly, overparameterization arises as a natural answer to these fundamental metalearning questions. Specifically, for (1), we first show that learning the optimal representation coincides with the problem of designing a taskaware regularization to promote inductive bias. We leverage this inductive bias to explain how the downstream task actually benefits from overparameterization, in contrast to prior works on fewshot learning. For (2), we develop a theory to explain how feature covariance can implicitly help reduce the sample complexity well below the degrees of freedom and lead to small estimation error. We then integrate these findings to obtain an overall performance guarantee for our metalearning algorithm. Numerical experiments on real and synthetic data verify our insights on overparameterized metalearning.more » « less