
Title: Robust Meta-learning for Mixed Linear Regression with Small Batches
A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second-moment based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with Ω(k^{1/2}) examples each. However, this algorithm is brittle in two important scenarios: the predictions can be arbitrarily bad (i) even with only a few outliers in the dataset, or (ii) even if the medium-sized tasks are slightly smaller, with o(k^{1/2}) examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher-order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off.
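The "second-moment based approach" the abstract refers to can be made concrete with a short sketch. The NumPy code below (all names are ours, not the paper's) draws tasks from a mixture of k linear regressions and recovers the subspace spanned by the regression vectors from batches of just two examples, by averaging (y₁x₁)(y₂x₂)ᵀ across tasks. This is only the non-robust baseline estimator; the outlier-robust PCA and sum-of-squares machinery that the paper contributes is not shown.

```python
import numpy as np

def sample_tasks(num_tasks, d, k, batch_size, noise=0.1, rng=None):
    """Draw tasks from a mixture of k linear regressions in R^d."""
    rng = rng or np.random.default_rng(0)
    betas = rng.standard_normal((k, d))           # mixture components
    tasks = []
    for _ in range(num_tasks):
        beta = betas[rng.integers(k)]             # latent component of this task
        X = rng.standard_normal((batch_size, d))  # isotropic Gaussian covariates
        y = X @ beta + noise * rng.standard_normal(batch_size)
        tasks.append((X, y))
    return tasks, betas

def subspace_from_small_batches(tasks, k):
    """Estimate span{beta_1..beta_k} from batches with as few as 2 examples.

    Within a task, y_i * x_i is an unbiased estimate of that task's beta, so
    (y1*x1)(y2*x2)^T averaged over tasks estimates E[beta beta^T]; its top-k
    eigenvectors span the subspace of the mixture components.
    """
    d = tasks[0][0].shape[1]
    M = np.zeros((d, d))
    for X, y in tasks:
        u, v = y[0] * X[0], y[1] * X[1]           # two independent estimates of beta
        M += 0.5 * (np.outer(u, v) + np.outer(v, u))
    M /= len(tasks)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, np.argsort(eigvals)[-k:]]   # top-k eigenvectors

tasks, betas = sample_tasks(num_tasks=20000, d=30, k=4, batch_size=2)
U = subspace_from_small_batches(tasks, k=4)
# Each true beta should lie (almost) inside the recovered subspace:
residual = betas - betas @ U @ U.T
print(np.linalg.norm(residual) / np.linalg.norm(betas))
```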
Award ID(s): 1703574
PAR ID: 10276104
Author(s) / Creator(s):
Date Published:
Journal Name: Advances in Neural Information Processing Systems
Volume: 33
ISSN: 1049-5258
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. In modern supervised learning, there are a large number of tasks, but many of them are associated with only a small amount of labelled data. These include data from medical image processing and robotic interaction. Even though each individual task cannot be meaningfully trained in isolation, one seeks to meta-learn across the tasks from past experiences by exploiting some similarities. We study a fundamental question of interest: when can abundant tasks with small data compensate for the lack of tasks with big data? We focus on a canonical scenario where each task is drawn from a mixture of k linear regressions, and identify sufficient conditions for such a graceful exchange to hold; there is little loss in sample complexity even when we only have access to small-data tasks. To this end, we introduce a novel spectral approach and show that we can efficiently utilize small-data tasks with the help of Ω̃(k^{3/2}) medium-data tasks, each with Ω̃(k^{1/2}) examples. A toy version of the medium-task step is sketched below.
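As a rough illustration of how the medium-data tasks enter (a sketch under simplifying assumptions, not the paper's procedure): once a basis U for the span of the regression vectors is known, each medium task can be solved inside that k-dimensional subspace and the tasks clustered by component. For the least-squares step to be well posed, this toy version assumes each medium batch has at least k examples; the Ω̃(k^{1/2}) guarantee in the paper rests on a more careful distance-based clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_medium_tasks(tasks, U, k):
    """Cluster medium-batch tasks via regression estimates projected onto the
    recovered subspace U (d x k, orthonormal columns). Working in k dimensions
    instead of d is what makes modest batch sizes informative."""
    Z = []
    for X, y in tasks:
        w, *_ = np.linalg.lstsq(X @ U, y, rcond=None)  # per-task estimate in R^k
        Z.append(w)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.asarray(Z))
    return U @ km.cluster_centers_.T, km.labels_       # centers lifted back to R^d

# Tiny demo with synthetic tasks; U is given by an oracle here, in place of
# the subspace-estimation step.
rng = np.random.default_rng(1)
d, k, batch = 20, 3, 25
betas = rng.standard_normal((k, d))
U, _ = np.linalg.qr(betas.T)
tasks = []
for _ in range(300):
    X = rng.standard_normal((batch, d))
    b = betas[rng.integers(k)]
    tasks.append((X, X @ b + 0.1 * rng.standard_normal(batch)))
centers, labels = cluster_medium_tasks(tasks, U[:, :k], k)
```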
  2. Complex service robotics scenarios entail unpredictable task appearance in both space and time. This requires robots to continuously relocate and imposes a trade-off between motion costs and efficiency in task execution. In such scenarios, multi-robot systems and even swarms of robots can be exploited to service different areas in parallel. An efficient deployment needs to continuously determine the best allocation according to the actual service needs, while also taking relocation costs into account when such an allocation must be modified. For large-scale problems, centrally predicting optimal allocations and movement paths for each robot quickly becomes infeasible. Instead, decentralized solutions are needed that allow the robotic system to self-organize and adaptively respond to the task demands. In this paper, we propose a distributed and asynchronous approach to simultaneous task assignment and path planning for robot swarms, which combines a bio-inspired collective decision-making process for the allocation of robots to areas to be serviced with a search-based path planning approach for the actual routing of robots towards tasks to be executed (a deliberately simple allocation rule is sketched after this entry). Task allocation exploits a hierarchical representation of the workspace, supporting robot deployment to the areas that most require service. We investigate four realistic environments of increasing complexity, where each task requires a robot to reach a location and work for a specific amount of time. The proposed approach improves over two different baseline algorithms in specific settings with statistical significance, while showing consistently good results overall. Moreover, the proposed solution is robust to limited communication and robot failures.
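The collective decision-making component can be evoked with a deliberately simple rule. This is our illustration only; the paper's actual mechanism uses a hierarchical workspace representation and is not reproduced here. Each robot independently commits to an area with probability that grows with observed task demand and shrinks with the cost of relocating there.

```python
import numpy as np

def commit_to_area(demand, travel_cost, beta=1.0, rng=None):
    """One illustrative bio-inspired allocation step (not the paper's rule):
    a robot commits to an area with probability increasing in observed task
    demand and decreasing in the cost of relocating there."""
    rng = rng or np.random.default_rng()
    score = demand / (1.0 + beta * travel_cost)  # trade off service need vs. motion cost
    p = score / score.sum()
    return rng.choice(len(demand), p=p)

# Example: 4 areas, one robot deciding where to go next.
demand = np.array([5.0, 1.0, 3.0, 0.5])        # outstanding tasks per area
travel_cost = np.array([2.0, 0.5, 4.0, 1.0])   # path length from the robot's position
print(commit_to_area(demand, travel_cost, rng=np.random.default_rng(0)))
```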
  3. Meta-learning methods typically learn tasks under the assumption that all tasks are equally important. However, this assumption is often not valid. In real-world applications, tasks can vary both in their importance during different training stages and in whether they contain noisy labeled data or not, making a uniform approach suboptimal. To address these issues, we propose the Data-Efficient and Robust Task Selection (DERTS) algorithm, which can be incorporated into both gradient- and metric-based meta-learning algorithms. DERTS selects weighted subsets of tasks from task pools by minimizing the approximation error of the full gradient of the task pool in the meta-training stage (a generic sketch of this idea follows this entry). The selected tasks are efficient for rapid training and robust to noisy-label scenarios. Unlike existing algorithms, DERTS does not require any architecture modification for training and can handle noisy label data in both the support and query sets. Analysis of DERTS shows that the algorithm follows similar training dynamics to learning on the full task pools. Experiments show that DERTS outperforms existing sampling strategies for meta-learning on both gradient-based and metric-based meta-learning algorithms in limited data budget and noisy task settings.
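DERTS itself is not spelled out in the abstract, so the following is only a generic sketch of the idea it names, weighted subset selection by gradient matching: greedily pick tasks whose gradients best explain the remaining part of the full-pool gradient, then re-fit nonnegative weights. All names are hypothetical.

```python
import numpy as np

def select_tasks(task_grads, budget):
    """Greedy weighted subset selection by gradient matching (illustrative).

    task_grads: (n_tasks, dim) per-task meta-gradients.
    Picks `budget` tasks and nonnegative weights so that the weighted sum
    approximates the full-pool gradient, in the spirit of minimizing
    || g_full - sum_i w_i g_i ||.
    """
    g_full = task_grads.mean(axis=0)
    chosen, residual = [], g_full.copy()
    for _ in range(budget):
        scores = task_grads @ residual            # correlation with the unexplained part
        scores[chosen] = -np.inf                  # never pick a task twice
        chosen.append(int(np.argmax(scores)))
        G = task_grads[chosen].T                  # (dim, |chosen|)
        w, *_ = np.linalg.lstsq(G, g_full, rcond=None)
        w = np.clip(w, 0.0, None)                 # keep weights nonnegative
        residual = g_full - G @ w
    return chosen, w

# Example: 100 tasks with 50-dim gradients, keep 10.
rng = np.random.default_rng(0)
grads = rng.standard_normal((100, 50))
idx, weights = select_tasks(grads, budget=10)
print(idx, np.round(weights, 2))
```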
  4. Why are classifiers in high dimension vulnerable to "adversarial" perturbations? We show that it is likely not due to information-theoretic limitations, but rather to computational constraints. First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give two particular classification tasks where learning a robust classifier is computationally intractable. More precisely, we construct two binary classification tasks in high-dimensional space which are (i) information-theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet not efficiently robustly learnable, even for small perturbations. Specifically, for the first task hardness holds for any efficient algorithm in the statistical query (SQ) model, while for the second task we rule out any efficient algorithm under a cryptographic assumption. These examples give an exponential separation between classical learning and robust learning in the statistical query model or under a cryptographic assumption. This suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.