Title: Provable Lifelong Learning of Representations
In lifelong learning, tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension $d$ and $m$ tasks with $k$ features, up to error $\epsilon$, is $\tilde{O}(dk^{1.5}/\epsilon + km/\epsilon)$. We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. We complement our analysis with an empirical study, including a heuristic lifelong learning algorithm for deep neural networks. Our method performs favorably on challenging realistic image datasets compared to state-of-the-art continual learning methods.
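To get a feel for the stated bound, the following minimal sketch evaluates its leading terms, $dk^{1.5}/\epsilon + km/\epsilon$, with constants and logarithmic factors dropped. The comparison baseline of roughly $d/\epsilon$ samples per task learned independently is an assumption made here for illustration and is not taken from the abstract; the function names and the example parameter values are likewise hypothetical.

```python
def lifelong_bound(d: int, k: int, m: int, eps: float) -> float:
    """Leading terms of the stated bound, d*k^1.5/eps + k*m/eps.

    Constants and log factors hidden by the O-tilde are ignored.
    """
    return d * k**1.5 / eps + k * m / eps


def independent_baseline(d: int, m: int, eps: float) -> float:
    """Assumed baseline (not from the abstract): each of the m tasks
    is learned separately using about d/eps samples."""
    return m * d / eps


if __name__ == "__main__":
    # Hypothetical values: input dimension 100, 4 shared features,
    # 50 tasks, target error 0.5.
    d, k, m, eps = 100, 4, 50, 0.5
    shared = lifelong_bound(d, k, m, eps)
    separate = independent_baseline(d, m, eps)
    print(f"shared representation: {shared:.0f}")
    print(f"independent tasks:     {separate:.0f}")
```

Under these assumptions the first term is paid once to learn the representation, while the per-task cost scales with $k$ instead of $d$, so the gap widens as the number of tasks $m$ grows.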
Award ID(s):
2106444 2134105 2007443
PAR ID:
10335021
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
AISTATS
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study how representation learning can improve the learning efficiency of contextual bandit problems. We study the setting where we play T contextual linear bandits with dimension d simultaneously, and these T bandit tasks collectively share a common linear representation with a dimensionality of r ≪ d. We present a new algorithm based on alternating projected gradient descent (GD) and minimization estimator to recover a low-rank feature matrix. Using the proposed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We present experiments and compare the performance of our algorithm against benchmark algorithms.
  2. We propose a model-based lifelong reinforcement-learning approach that estimates a hierarchical Bayesian posterior distilling the common structure shared across different tasks. The learned posterior, combined with a sample-based Bayesian exploration procedure, increases the sample efficiency of learning across a family of related tasks. We first derive an analysis of the relationship between the sample complexity and the initialization quality of the posterior in the finite MDP setting. We next scale the approach to continuous-state domains by introducing a Variational Bayesian Lifelong Reinforcement Learning algorithm that can be combined with recent model-based deep RL methods, and that exhibits backward transfer. Experimental results on several challenging domains show that our algorithms achieve both better forward and backward transfer performance than state-of-the-art lifelong RL methods.
  3. We study the problem of PAC learning $\gamma$-margin halfspaces with Massart noise. We propose a simple proper learning algorithm, the Perspectron, that has sample complexity $\tilde{O}((\epsilon\gamma)^{-2})$ and achieves classification error at most $\eta + \epsilon$, where $\eta$ is the Massart noise rate. Prior works [DGT19,CKMY20] came with worse sample complexity guarantees (in both $\epsilon$ and $\gamma$) or could only handle random classification noise [DDK+23,KIT+23], a much milder noise assumption. We also show that our results extend to the more challenging setting of learning generalized linear models with a known link function under Massart noise, achieving a similar sample complexity to the halfspace case. This significantly improves upon the prior state-of-the-art in this setting due to [CKMY20], who introduced this model.
  4. Finding the minimal structural assumptions that empower sample-efficient learning is one of the most important research directions in Reinforcement Learning (RL). This paper advances our understanding of this fundamental question by introducing a new complexity measure, the Bellman Eluder (BE) dimension. We show that the family of RL problems of low BE dimension is remarkably rich: it subsumes a vast majority of existing tractable RL problems, including but not limited to tabular MDPs, linear MDPs, reactive POMDPs, low Bellman rank problems, and low Eluder dimension problems. This paper further designs a new optimization-based algorithm, GOLF, and reanalyzes a hypothesis elimination-based algorithm, OLIVE (proposed in Jiang et al. (2017)). We prove that both algorithms learn the near-optimal policies of low BE dimension problems in a number of samples that is polynomial in all relevant parameters, but independent of the size of the state-action space. Our regret and sample complexity results match or improve the best existing results for several well-known subclasses of low BE dimension problems.
  5. We consider learning problems where the training set consists of two types of examples: private and public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples. This setting interpolates between private learning (where all examples are private) and classical learning (where all examples are public). We study the limits of learning in this setting in terms of private and public sample complexities. We show that any hypothesis class of VC-dimension $d$ can be agnostically learned up to an excess error of $\alpha$ using only (roughly) $d/\alpha$ public examples and $d/\alpha^2$ private labeled examples. This result holds even when the public examples are unlabeled. This gives a quadratic improvement over the standard $d/\alpha^2$ upper bound on the public sample complexity (where private examples can be ignored altogether if the public examples are labeled). Furthermore, we give a nearly matching lower bound, which we prove via a generic reduction from this setting to the one of private learning without public data.