Modern machine learning models require large amounts of labeled training data to perform well. An emerging paradigm for reducing this reliance on massive labeled data is to exploit abundant labeled data from a related source task to boost performance on a desired target task where labeled data are scarce. This approach, called transfer learning, has been applied successfully in many application domains. However, although many transfer learning algorithms have been developed, the fundamental understanding of "when" and "to what extent" transfer learning can reduce sample complexity is still limited. In this work, we take a step towards a foundational understanding of transfer learning by focusing on binary classification with linear models and Gaussian features, and we develop statistical minimax lower bounds in terms of the number of source and target samples and an appropriate notion of similarity between source and target tasks. To derive these bounds, we reduce the transfer learning problem to hypothesis testing by constructing a packing set of source and target parameters via the Gilbert-Varshamov bound, which in turn yields a lower bound on sample complexity. We also evaluate our theoretical results with experiments on real data sets.
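The reduction described above follows a standard packing-plus-Fano template; the display below is a generic sketch of that template (our illustration, not the paper's exact bound), with V uniform over the packing index and X the observed source and target samples:

```latex
% Generic packing + Fano reduction (a sketch, not the paper's exact statement).
% Pick \theta_1,\dots,\theta_M with pairwise separation \ell(\theta_i,\theta_j) \ge 2\delta;
% the Gilbert--Varshamov bound supplies M \ge 2^{d/8} binary codewords of
% length d with pairwise Hamming distance at least d/8 to build such a packing.
\[
  \inf_{\hat{\theta}} \; \sup_{\theta \in \Theta}
    \mathbb{E}\bigl[\ell(\hat{\theta}, \theta)\bigr]
  \;\ge\; \delta \left( 1 - \frac{I(V; X) + \log 2}{\log M} \right).
\]
% Bounding I(V;X) by n_S and n_T times the pairwise KL divergences between
% source and target distributions turns this into a sample-complexity bound
% that depends on the source-target similarity.
```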
A Class of Geometric Structures in Transfer Learning: Minimax Bounds and Optimality
We study the problem of transfer learning, observing that previous efforts to understand its information-theoretic limits do not fully exploit the geometric structure of the source and target domains. In contrast, our study first illustrates the benefits of incorporating a natural geometric structure within a linear regression model, which corresponds to the generalized eigenvalue problem formed by the Gram matrices of the two domains. We next establish a finite-sample minimax lower bound, propose a refined model-interpolation estimator that enjoys a matching upper bound, and then extend our framework to multiple source domains and generalized linear models. Surprisingly, as long as information is available on the distance between the source and target parameters, negative transfer does not occur. Simulation studies show that our proposed interpolation estimator outperforms state-of-the-art transfer learning methods in both moderate- and high-dimensional settings.
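In symbols, the geometric structure and the interpolation idea can be sketched as follows; the weighted-combination form of the estimator is an illustrative reading of the abstract, not the paper's exact construction:

```latex
% Geometry (sketch): generalized eigenvalue problem of the two Gram matrices,
% \Sigma_S = X_S^\top X_S / n_S and \Sigma_T = X_T^\top X_T / n_T:
\[
  \Sigma_S\, u \;=\; \lambda\, \Sigma_T\, u .
\]
% Model interpolation (illustrative form): blend the source and target
% least-squares fits with a weight W chosen from the data,
\[
  \hat{\beta}(W) \;=\; W \hat{\beta}_S + (I - W)\, \hat{\beta}_T ,
\]
% where W may act differently along different generalized eigendirections.
```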
- Award ID(s): 2118199
- PAR ID: 10413184
- Date Published:
- Journal Name: Proceedings of the International Workshop on Artificial Intelligence and Statistics
- Volume: 151
- Issue: 2022
- ISSN: 1525-531X
- Page Range / eLocation ID: 3794-3820
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Transfer learning has emerged as a powerful technique for improving the performance of machine learning models in new domains where labeled training data may be scarce. In this approach, a model trained on a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only a few labeled training examples. Despite the recent empirical success of transfer learning approaches, their benefits and fundamental limits are poorly understood. In this paper we develop a statistical minimax framework to characterize the fundamental limits of transfer learning in the context of regression with linear and one-hidden-layer neural network models. Specifically, we derive a lower bound on the target generalization error achievable by any algorithm as a function of the number of labeled source and target samples, as well as appropriate notions of similarity between the source and target tasks. Our lower bound provides new insights into the benefits and limitations of transfer learning. We further corroborate our theoretical findings with various experiments. (A minimal simulation sketch of this setting appears after this list.)
- We present new models and methods for the posterior drift problem, where the regression function in the target domain is modelled as a linear adjustment, on an appropriate scale, of that in the source domain, and we study the theoretical properties of our proposed estimators in the binary classification problem. The core idea of our model inherits the simplicity and usefulness of generalized linear models and accelerated failure time models from the classical statistics literature. Our approach is shown to be flexible and applicable in a variety of statistical settings, and can be adopted for transfer learning problems in various domains including epidemiology, genetics and biomedicine. As concrete applications, we illustrate the power of our approach (i) through mortality prediction for British Asians by borrowing strength from similar data from the larger pool of British Caucasians, using the UK Biobank data, and (ii) in overcoming a spurious correlation present in the source domain of the Waterbirds dataset. (An illustrative form of the link-scale adjustment appears after this list.)
- This study focuses on developing and examining the effectiveness of Transfer Learning (TL) for structural health monitoring (SHM) systems that transfer knowledge about damage states from one structure (the source domain) to another (the target domain). TL is an efficient method for transferring and mapping knowledge from the source to the target domain. In addition, Proper Orthogonal Modes (POMs), which help classify behavior and health, provide a promising tool for damage identification in structural systems. Previous investigations show that damage intensity and location are highly correlated with POM variations for structures under unknown loads. To train damage identification algorithms based on POMs and machine learning, one generally needs multiple simulations to generate damage scenarios. The developed process is applied to a simply supported truss span in a multi-span railway bridge. TL is first used to obtain relationships between the POMs of two modeled bridges: a labeled source model and an unlabeled target model. This technique is then used to develop POMs for a damaged, unknown target by linking source and target POMs. It is shown that the knowledge trained on one bridge generalizes effectively to other, somewhat similar, bridges in the population. (A generic POM-extraction sketch appears after this list.)
- Due to the ability of deep neural nets to learn rich representations, recent advances in unsupervised domain adaptation have focused on learning domain-invariant features that achieve a small error on the source domain. The hope is that the learnt representation, together with the hypothesis learnt from the source domain, will generalize to the target domain. In this paper, we first construct a simple counterexample showing that, contrary to common belief, these conditions are not sufficient to guarantee successful domain adaptation. In particular, the counterexample exhibits conditional shift: the class-conditional distributions of input features change between the source and target domains. To give a sufficient condition for domain adaptation, we propose a natural and interpretable generalization upper bound that explicitly takes this shift into account. Moreover, we shed new light on the problem by proving an information-theoretic lower bound on the joint error of any domain adaptation method that attempts to learn invariant representations. Our result characterizes a fundamental tradeoff between learning invariant representations and achieving small joint error on both domains when the marginal label distributions differ from source to target. Finally, we conduct experiments on real-world datasets that corroborate our theoretical findings. We believe these insights will help guide the future design of domain adaptation and representation learning algorithms. (A schematic form of the lower bound appears after this list.)
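For the first related abstract above (minimax limits of transfer learning in linear regression), here is a minimal simulation sketch of the basic phenomenon: pooling source data helps when the source and target parameters are close and hurts when they are far. Everything here (the data model, the naive pooling rule) is an illustrative assumption, not the paper's experiment:

```python
# Minimal transfer-learning simulation sketch (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
d, n_s, n_t, sigma = 20, 500, 25, 0.5  # dimension, sample sizes, noise level

def make_data(beta, n):
    X = rng.standard_normal((n, d))
    return X, X @ beta + sigma * rng.standard_normal(n)

def least_squares(X, y):
    return np.linalg.pinv(X) @ y  # pseudo-inverse also covers n < d

beta_t = rng.standard_normal(d)
for delta in (0.0, 0.5, 2.0):  # distance between source and target tasks
    beta_s = beta_t + delta * rng.standard_normal(d) / np.sqrt(d)
    Xs, ys = make_data(beta_s, n_s)
    Xt, yt = make_data(beta_t, n_t)
    b_target = least_squares(Xt, yt)                       # target data only
    b_pooled = least_squares(np.vstack([Xs, Xt]),          # naive pooling
                             np.concatenate([ys, yt]))
    sq_err = lambda b: float(np.sum((b - beta_t) ** 2))
    print(f"delta={delta:3.1f}  target-only={sq_err(b_target):.3f}  "
          f"pooled={sq_err(b_pooled):.3f}")
```

As the task distance `delta` grows, the pooled estimator's error overtakes the target-only error, which is exactly the regime the lower bounds speak to.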
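For the second related abstract (posterior drift), one natural instance of a "linear adjustment on an appropriate scale" is a link-scale linear map; this is our illustrative reading, not necessarily the paper's exact model:

```latex
% Posterior drift via a link-scale linear adjustment (illustrative form).
% \eta_D(x) = P(Y = 1 \mid X = x) in domain D; g is a known link, e.g. the logit.
\[
  g\bigl(\eta_T(x)\bigr) \;=\; \alpha + \beta\, g\bigl(\eta_S(x)\bigr),
\]
% so only the scalars (\alpha, \beta) must be estimated from target data,
% echoing generalized linear and accelerated failure time modelling.
```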
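For the third related abstract, Proper Orthogonal Modes are conventionally computed by proper orthogonal decomposition, i.e., an SVD of a response snapshot matrix; the sketch below is that generic computation under standard assumptions, not the study's actual pipeline:

```python
# Generic proper-orthogonal-mode (POM) extraction via SVD
# (standard POD; not the study's actual pipeline).
import numpy as np

def proper_orthogonal_modes(snapshots, k):
    """snapshots: (n_sensors, n_timesteps) response matrix; return top-k POMs."""
    centered = snapshots - snapshots.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    energy = s**2 / np.sum(s**2)          # fraction of signal energy per mode
    return U[:, :k], energy[:k]

def largest_principal_angle(U_a, U_b):
    """Compare two POM subspaces; a larger angle suggests changed (damaged) behavior."""
    _, s, _ = np.linalg.svd(U_a.T @ U_b)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))
```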
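Finally, the information-theoretic tradeoff in the last related abstract can be written schematically as follows; constants and exact conditions are indicative only (our paraphrase of that line of work, not the paper's precise statement):

```latex
% Schematic invariant-representation tradeoff (indicative constants only).
% D^Y_S, D^Y_T: marginal label distributions; D^Z_S, D^Z_T: feature
% distributions induced by the representation g; d_JS: Jensen-Shannon distance.
\[
  \varepsilon_S(h \circ g) + \varepsilon_T(h \circ g)
  \;\gtrsim\;
  \Bigl( d_{\mathrm{JS}}\bigl(D^Y_S, D^Y_T\bigr)
       - d_{\mathrm{JS}}\bigl(D^Z_S, D^Z_T\bigr) \Bigr)^{2}
\]
% when the label gap exceeds the representation gap: making the features
% more invariant shrinks the second term and so raises this lower bound
% on the joint source-plus-target error.
```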