Title: Meta-learning for Mixed Linear Regression
In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labelled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: when can abundant tasks with small data compensate for the lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of k linear regressions, and identify sufficient conditions for such a graceful exchange to hold; there is little loss in sample complexity even when we only have access to small data tasks. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small data tasks with the help of Ω̃(k^{3/2}) medium data tasks, each with Ω̃(k^{1/2}) examples.
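The abstract does not spell out the spectral approach, but the kind of second-moment estimator that such approaches build on is easy to illustrate. Below is a minimal NumPy sketch of the mixed linear regression task model and a subspace estimate from tasks with only two examples each; the data model, dimensions, and constants are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 20, 3                      # ambient dimension, number of components (illustrative)
betas = rng.normal(size=(k, d))   # unknown regression vectors, one per mixture component
weights = np.ones(k) / k          # mixing weights (uniform here for simplicity)

def sample_task(n_examples):
    """Draw one task: pick a component, then n_examples of (x, y = <x, beta> + noise)."""
    j = rng.choice(k, p=weights)
    X = rng.normal(size=(n_examples, d))
    y = X @ betas[j] + 0.1 * rng.normal(size=n_examples)
    return X, y

# Second-moment identity: for x ~ N(0, I) and two independent examples per task,
# E[y1 * y2 * x1 x2^T] = sum_j weights[j] * beta_j beta_j^T, so the top-k
# eigenvectors of that matrix span the subspace containing all the beta_j.
n_tasks = 20000
M = np.zeros((d, d))
for _ in range(n_tasks):
    X, y = sample_task(2)
    M += y[0] * y[1] * np.outer(X[0], X[1])
M = (M + M.T) / (2 * n_tasks)     # symmetrize and average

eigvals, eigvecs = np.linalg.eigh(M)
subspace = eigvecs[:, -k:]        # estimated span of {beta_1, ..., beta_k}

# Sanity check: each true beta should lie (almost) inside the estimated subspace.
residual = betas.T - subspace @ (subspace.T @ betas.T)
print("relative residual:", np.linalg.norm(residual) / np.linalg.norm(betas))
```

The point of the sketch is that even two-example tasks contribute an unbiased term to the second-moment estimate, which is what lets abundant small tasks substitute for scarce big ones.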
Award ID(s):
1637360 1740551 1703574
PAR ID:
10203962
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second-moment based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with k^{1/2} examples each. However, this algorithm is brittle in two important scenarios: the predictions can be arbitrarily bad even with only a few outliers in the dataset, or even if the medium-sized tasks are slightly smaller, with fewer than k^{1/2} examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher-order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off.
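    The optimal outlier-robust PCA algorithm referenced above is beyond the scope of a short sketch, but the basic flavor of outlier filtering it improves on can be illustrated. The crude trim-and-refit loop below is a hedged stand-in, not the paper's method, and all constants are assumptions.

```python
import numpy as np

def trimmed_pca(samples, n_components, trim_frac=0.1, n_rounds=5):
    """Crude outlier-trimmed PCA: alternately fit a subspace and discard the
    points that project worst onto it. Illustrative only; NOT the optimal
    outlier-robust PCA the abstract refers to."""
    kept = samples.copy()
    for _ in range(n_rounds):
        centered = kept - kept.mean(axis=0)
        # Top principal directions of the currently kept points.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        basis = vt[:n_components].T
        # Residual of each point outside the fitted subspace.
        residual = centered - centered @ basis @ basis.T
        scores = np.linalg.norm(residual, axis=1)
        # Keep the (1 - trim_frac) fraction with the smallest residuals.
        cutoff = np.quantile(scores, 1.0 - trim_frac)
        kept = kept[scores <= cutoff]
    return basis

# Toy usage: a rank-2 signal in 10 dimensions plus a few gross outliers.
rng = np.random.default_rng(1)
clean = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
outliers = 50.0 * rng.normal(size=(10, 10))
basis = trimmed_pca(np.vstack([clean, outliers]), n_components=2)
print(basis.shape)  # (10, 2)
```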
  2. This paper considers online optimization of a renewal-reward system. A controller performs a sequence of tasks back-to-back. Each task has a random vector of parameters, called the task type vector, that affects the task processing options and also affects the resulting reward and time duration of the task. The probability distribution for the task type vector is unknown and the controller must learn to make efficient decisions so that the time average reward converges to optimality. Prior work on such renewal optimization problems leaves open the question of optimal convergence time. This paper develops an algorithm with an optimality gap that decays like O(1/√k), where k is the number of tasks processed. The same algorithm is shown to have faster O(log(k)/k) performance when the system satisfies a strong concavity property. The proposed algorithm uses an auxiliary variable that is updated according to a classic Robbins-Monro iteration. It makes online scheduling decisions at the start of each renewal frame based on this variable and on the observed task type. A matching converse is obtained for the strongly concave case by constructing an example system for which all algorithms have performance at best Ω(log(k)/k). A matching Ω(1/√k) converse is also shown for the general case without strong concavity.
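    The auxiliary-variable iteration described above is a classic Robbins-Monro scheme. Below is a minimal sketch of how such an update can drive the time-average reward toward the optimal ratio: the task model, step size, and greedy decision rule are assumptions for illustration, not necessarily the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def observe_task():
    """Hypothetical task type: each of 3 processing modes has a (reward, duration) pair."""
    rewards = rng.uniform(1.0, 5.0, size=3)
    durations = rng.uniform(0.5, 2.0, size=3)
    return rewards, durations

theta = 0.0                     # auxiliary variable: running estimate of the optimal reward rate
total_reward = total_time = 0.0

for k in range(1, 100001):
    rewards, durations = observe_task()
    # Greedy decision against the current estimate: maximize reward - theta * duration.
    mode = int(np.argmax(rewards - theta * durations))
    R, T = rewards[mode], durations[mode]
    total_reward += R
    total_time += T
    # Robbins-Monro step toward the root of E[R - theta * T] = 0,
    # i.e. theta -> E[R] / E[T] under the induced policy.
    theta += (R - theta * T) / k

print("estimated optimal rate:", theta)
print("empirical time-average reward:", total_reward / total_time)
```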
  3. We show that linear superpositions of plane waves involving a single-valued, covariantly stable dispersion relation ω(k) always propagate outside the lightcone, unless ω(k) = a + bk. This implies that there is no notion of causality for individual dispersion relations, since no mathematical condition on the function ω(k) (such as the front velocity or the asymptotic group velocity conditions) can serve as a sufficient condition for subluminal propagation in dispersive media. Instead, causality can only emerge from a careful cancellation that occurs when one superimposes all the excitation branches of a physical model. This is shown to happen automatically in local theories of matter that are covariantly stable.
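    Why the linear dispersion relation ω(k) = a + bk is the lone exception can be seen in one line; as a hedged illustration with assumed notation (A(k) is the spectral amplitude of the initial data):

$$ \psi(x,t) = \int A(k)\, e^{i(kx - \omega(k)t)}\, dk = e^{-iat} \int A(k)\, e^{ik(x - bt)}\, dk = e^{-iat}\, \psi(x - bt,\, 0), $$

    so the initial profile is rigidly translated at speed b up to a global phase, and its support never leaves the lightcone when |b| ≤ 1. For any other single-valued ω(k), the abstract's result says this confinement fails.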
  4. arXiv:2401.07170v1
    This paper considers online optimization for a system that performs a sequence of back-to-back tasks. Each task can be processed in one of multiple processing modes that affect the duration of the task, the reward earned, and an additional vector of penalties (such as energy or cost). Let A[k] be a random matrix of parameters that specifies the duration, reward, and penalty vector under each processing option for task k. The goal is to observe A[k] at the start of each new task k and then choose a processing mode for the task so that, over time, the time average reward is maximized subject to time average penalty constraints. This is a renewal optimization problem and is challenging because the probability distribution for the A[k] sequence is unknown. Prior work shows that any algorithm that comes within ϵ of optimality must have Ω(1/ϵ²) convergence time. The only known algorithm that can meet this bound operates without time average penalty constraints and uses a diminishing stepsize that cannot adapt when probabilities change. This paper develops a new algorithm that is adaptive and comes within O(ϵ) of optimality for any interval of Ω(1/ϵ²) tasks over which probabilities are held fixed, regardless of probabilities before the start of the interval.
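    The adaptive algorithm itself is not described on this page. As a loosely related, heavily hedged sketch of the same problem shape, here is a generic drift-plus-penalty rule with a virtual queue for the time-average penalty constraint; the weight V, the budget, the task distribution, and the per-unit-time normalization are all assumptions, and this is not the paper's adaptive method.

```python
import numpy as np

rng = np.random.default_rng(3)

V = 50.0        # reward-vs-constraint tradeoff weight (assumed)
budget = 1.0    # allowed time-average penalty per unit time (assumed)
Q = 0.0         # virtual queue tracking penalty-constraint violation
total = np.zeros(3)  # accumulated (duration, reward, penalty)

for k in range(100000):
    # A[k]: one row per processing mode, columns = (duration, reward, penalty).
    A = np.column_stack([
        rng.uniform(0.5, 2.0, size=4),   # durations
        rng.uniform(0.0, 5.0, size=4),   # rewards
        rng.uniform(0.0, 3.0, size=4),   # penalties
    ])
    dur, rew, pen = A[:, 0], A[:, 1], A[:, 2]
    # Drift-plus-penalty style rule: trade reward against queue-weighted penalty,
    # normalized per unit time of the chosen mode.
    mode = int(np.argmax((V * rew - Q * pen) / dur))
    # Virtual queue grows when the chosen mode exceeds the per-time penalty budget,
    # pushing later decisions toward cheaper modes.
    Q = max(Q + pen[mode] - budget * dur[mode], 0.0)
    total += np.array([dur[mode], rew[mode], pen[mode]])

print("time-average reward :", total[1] / total[0])
print("time-average penalty:", total[2] / total[0], "(budget:", budget, ")")
```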