Meta-learning has enabled learning statistical models that can be quickly adapted to new prediction tasks. Motivated by use cases in personalized federated learning, we study an often overlooked aspect of modern meta-learning algorithms: their data efficiency. To shed light on which methods are more data-efficient, we use techniques from algorithmic stability to derive bounds on the transfer risk that have important practical implications, indicating how much supervision is needed and how it must be allocated for each method to attain the desired level of generalization. Further, we introduce a new, simple framework for evaluating meta-learning methods under a limit on the available supervision, conduct an empirical study of MAML, Reptile, and ProtoNets, and demonstrate the differences in the behavior of these methods on few-shot and federated learning benchmarks. Finally, we propose active meta-learning, which incorporates active data selection into learning-to-learn, leading to better performance of all methods in the limited-supervision regime.
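For concreteness, below is a minimal sketch of a first-order meta-learning loop (Reptile-style) on a toy regression family, showing where a per-task supervision budget enters training; the toy task and all names are illustrative assumptions, not the paper's code.

```python
# Minimal first-order meta-learning sketch (Reptile-style), showing where a
# per-task supervision budget (shots per task) enters the training loop.
# Toy task family: 1-D linear regression y = a*x with task-specific slope a.
# The toy task and all names are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """A task is a slope a; its data are (x, a*x) pairs."""
    return rng.uniform(-2.0, 2.0)

def task_batch(a, shots):
    """Draw `shots` labeled examples from the task: the supervision budget."""
    x = rng.uniform(-1.0, 1.0, size=shots)
    return x, a * x

def inner_adapt(w, x, y, steps=5, lr=0.1):
    """A few SGD steps on the task's squared loss, starting from meta-params."""
    for _ in range(steps):
        w = w - lr * 2.0 * np.mean((w * x - y) * x)
    return w

w_meta, meta_lr, shots = 0.0, 0.5, 5
for _ in range(2000):
    a = sample_task()
    x, y = task_batch(a, shots)
    w_task = inner_adapt(w_meta, x, y)
    w_meta = w_meta + meta_lr * (w_task - w_meta)  # Reptile outer update

print(f"meta-initialization after training: {w_meta:.3f}")
```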
Opening the Black Box: Automated Software Analysis for Algorithm Selection
Impressive performance improvements have been achieved in many areas of AI by meta-algorithmic techniques, such as automated algorithm selection and configuration. However, existing techniques treat the target algorithms they are applied to as black boxes: nothing is known about their inner workings. This makes meta-algorithmic techniques broadly applicable, but leaves untapped the potential performance improvements that a deeper analysis of the target algorithms could enable. In this paper, we open the black box without sacrificing the universal applicability of meta-algorithmic techniques by analyzing algorithms automatically. We show how to use the resulting information to perform algorithm selection, and demonstrate improved performance compared to previous approaches that treat algorithms as black boxes.
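To illustrate the general idea, here is a hedged sketch of per-instance algorithm selection with a learned runtime model; the random features standing in for both instance descriptions and automated code analysis are placeholders, not the paper's pipeline.

```python
# Schematic per-instance algorithm selection: fit a runtime model on
# (instance features, algorithm features) pairs and pick the algorithm with
# the lowest predicted runtime. The algorithm features standing in for
# automated code analysis are placeholders; this is not the paper's pipeline.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_instances, n_algorithms = 200, 3
inst_feats = rng.normal(size=(n_instances, 4))   # e.g., problem-instance features
algo_feats = rng.normal(size=(n_algorithms, 2))  # e.g., features from code analysis
runtimes = rng.lognormal(size=(n_instances, n_algorithms))  # observed runtimes

# Training rows: one per (instance, algorithm) pair.
X = np.array([np.concatenate([inst_feats[i], algo_feats[j]])
              for i in range(n_instances) for j in range(n_algorithms)])
y = runtimes.ravel()
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def select(instance_features):
    """Return the index of the algorithm with the lowest predicted runtime."""
    rows = [np.concatenate([instance_features, algo_feats[j]])
            for j in range(n_algorithms)]
    return int(np.argmin(model.predict(np.array(rows))))

print("chosen algorithm:", select(inst_feats[0]))
```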
- Award ID(s): 1813537
- PAR ID: 10470674
- Publisher / Repository: Proceedings of the First International Conference on Automated Machine Learning, PMLR 188:6
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Ultrathin meta-optics offer unmatched, multifunctional control of light. Next-generation optical technologies, however, demand unprecedented performance. This will likely require design algorithms surpassing the capability of human intuition. For the adjoint method, this requires explicitly deriving gradients, which is sometimes challenging for certain photonics problems. Existing techniques also comprise a patchwork of application-specific algorithms, each narrow in scope and restricted to a particular scatterer type. Here, we leverage algorithmic differentiation as used in artificial neural networks, treating photonic design parameters as trainable weights, optical sources as inputs, and encapsulating device performance in the loss function. By solving a complex, degenerate eigenproblem and formulating rigorous coupled-wave analysis as a computational graph, we support both arbitrary, parameterized scatterers and topology optimization. With iteration times below the cost of two forward simulations, typical of adjoint methods, we generate multilayer, multifunctional, and aperiodic meta-optics. As an open-source platform adaptable to other algorithms and problems, we enable fast and flexible meta-optical design.
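As a rough illustration of the workflow this abstract describes, here is a sketch of design-by-autodiff with a toy differentiable stand-in for the simulator; it assumes PyTorch is available and is not the rigorous coupled-wave analysis used in the paper.

```python
# Sketch of gradient-based photonic design via automatic differentiation:
# design parameters are trainable weights and the figure of merit is the loss.
# The parameter-to-phase map below is a toy stand-in for the real simulator.
import math
import torch

n = 64                                        # number of design parameters
params = torch.zeros(n, requires_grad=True)   # e.g., pillar widths, trainable
target = torch.linspace(0, 2 * math.pi, n)    # desired phase profile

opt = torch.optim.Adam([params], lr=0.05)
for step in range(500):
    opt.zero_grad()
    # Toy differentiable parameter-to-phase map standing in for the simulator.
    phase = math.pi * torch.tanh(params) + math.pi
    loss = torch.mean((phase - target) ** 2)  # device figure of merit
    loss.backward()                           # gradients via autodiff
    opt.step()

print(f"final figure-of-merit loss: {loss.item():.4f}")
```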
-
Algorithms to solve hard combinatorial problems often exhibit complementary performance, i.e., where one algorithm fails, another shines. Algorithm portfolios and algorithm selection take advantage of this by running all algorithms in parallel or choosing the best one to run on a problem instance. In this paper, we show that neither of these approaches gives the best possible performance and propose the happy medium of running a subset of all algorithms in parallel. We propose a method to choose this subset automatically for each problem instance, and demonstrate empirical improvements of up to 19% in terms of runtime, 81% in terms of misclassification penalty, and 26% in terms of penalized average runtime on scenarios from the ASlib benchmark library. Unlike all other algorithm selection and scheduling approaches in the literature, our performance measures are based on the actual performance of algorithms running in parallel rather than assuming overhead-free parallelization based on sequential performance. Our approach is easy to apply in practice and does not require solving hard problems to obtain a schedule, unlike other techniques in the literature, while still delivering superior performance.
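A hedged sketch of the core selection step: with a budget of p parallel cores, the portfolio finishes when its fastest member does, so candidate subsets can be ranked by the expected minimum of predicted runtimes. The runtime distributions below are placeholders; the paper's selector is learned, not given.

```python
# Choosing a per-instance parallel subset: the subset's wall-clock time is the
# minimum over its members, so pick the subset minimizing E[min runtime].
# The runtime predictions are Monte Carlo placeholders, not a learned model.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_algorithms, n_samples, cores = 6, 1000, 2
# Samples from (placeholder) per-algorithm runtime distributions.
runtime_samples = rng.lognormal(mean=rng.normal(size=n_algorithms),
                                sigma=0.5, size=(n_samples, n_algorithms))

def expected_parallel_runtime(subset):
    """Expected wall-clock time when the subset runs in parallel: E[min]."""
    return runtime_samples[:, list(subset)].min(axis=1).mean()

best = min(itertools.combinations(range(n_algorithms), cores),
           key=expected_parallel_runtime)
print("subset to run in parallel:", best)
```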
-
We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from n auxiliary tasks. This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning. The key challenge of this problem is that not all auxiliary tasks are useful for improving the performance of the target task, so choosing the right subset of auxiliary tasks is crucial. Conventional subset selection methods, such as forward and backward selection, are unsuitable for LM fine-tuning because they require repeated training on subsets of auxiliary tasks. This paper introduces a new algorithm to estimate model fine-tuning performance without repeated training. Our algorithm first performs multitask training using the data of all the tasks to obtain a meta initialization. Then, we approximate the model fine-tuning loss of a subset using functional values and gradients from the meta initialization. Empirically, we find that this gradient-based approximation holds with remarkable accuracy for twelve transformer-based LMs, so we can now estimate fine-tuning performance on CPUs within a few seconds. We conduct extensive experiments to validate our approach, delivering a speedup of 30× over conventional subset selection while incurring only a 1% error relative to the true fine-tuning performance. In downstream evaluations of instruction tuning and chain-of-thought fine-tuning, our approach improves over prior methods that use gradient or representation similarity for subset selection by up to 3.8%.
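A minimal sketch of the estimation idea: linearize the target-task loss around the meta initialization so each subset's fine-tuning outcome is scored from stored gradients, with no retraining. The quadratic toy setup and the single averaged gradient step are illustrative assumptions, not the paper's exact estimator.

```python
# First-order estimate: after multitask training yields a meta initialization
# theta0, approximate the target-task loss of fine-tuning on each subset S
# via a Taylor expansion around theta0, so no subset is ever retrained.
import itertools
import numpy as np

rng = np.random.default_rng(0)
dim, n_aux = 10, 5
theta0 = rng.normal(size=dim)                 # meta initialization (placeholder)
task_grads = rng.normal(size=(n_aux, dim))    # per-auxiliary-task gradients at theta0
target_grad = rng.normal(size=dim)            # target-task gradient at theta0
target_loss0 = 1.0                            # target-task loss at theta0
lr = 0.01

def estimated_loss(subset):
    """First-order estimate of the target loss after fine-tuning on subset S."""
    # Model fine-tuning on S as one averaged gradient step from theta0 ...
    delta = -lr * task_grads[list(subset)].mean(axis=0)
    # ... and linearize the target loss around theta0.
    return target_loss0 + target_grad @ delta

best = min((s for k in range(1, n_aux + 1)
            for s in itertools.combinations(range(n_aux), k)),
           key=estimated_loss)
print("estimated-best auxiliary subset:", best)
```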
-
Can we turn AI black boxes into code? Although this mission sounds extremely challenging, we show that it is not entirely impossible by presenting a proof-of-concept method, MIPS, that can synthesize programs based on the automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.
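A toy illustration of the RNN-to-finite-state-machine step: run a recurrence over inputs, discretize its hidden states, and record a transition table. A hand-built parity recurrence stands in for a trained RNN and for the integer autoencoder, and the symbolic-regression stage is omitted.

```python
# Toy RNN -> FSM extraction: discretize hidden states along sample runs and
# tabulate transitions. The "RNN" below is a hand-built parity recurrence,
# standing in for a trained network; not the MIPS implementation.
import numpy as np

def rnn_step(h, x):
    """Stand-in recurrent update whose sign encodes the running parity of x."""
    return np.tanh(2.0 * h * (1 - 2 * x))   # flips the sign of h when x == 1

transitions = {}
for bits in [[0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]]:  # sample input sequences
    h = 0.5                                   # initial hidden state
    for x in bits:
        state = int(h > 0)                    # discretize the hidden state
        h = rnn_step(h, x)
        transitions[(state, x)] = int(h > 0)  # record the FSM transition

for (state, x), nxt in sorted(transitions.items()):
    print(f"state {state} --{x}--> state {nxt}")
```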