NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

On Noisy Evaluation in Federated Hyperparameter Tuning

Kuo, K.; Thaker, P.; Khodak, M.; Ngyuen, J.; Jiang, D.; Talwalkar, A.; Smith, V. (June 2023, Conference on Machine Learning and Systems)

Full Text Available
Diverse client selection for federated learning via submodular maximization

Balakrishnan, Ravikumar; Li, Tian; Zhou, Tianyi; Himayat, Nageen; Smith, Virginia; Bilmes, Jeff (April 2022, International Conference on Learning Representations)

Full Text Available
Ditto: Fair and Robust Federated Learning Through Personalization

Li, T.; Hu, S.; Beirami, A.; Smith, V. (January 2021, International Conference on Machine Learning)

Fairness and robustness are two important concerns for federated learning systems. In this work, we identify that robustness to data and model poisoning attacks and fairness, measured as the uniformity of performance across devices, are competing constraints in statistically heterogeneous networks. To address these constraints, we propose employing a simple, general framework for personalized federated learning, Ditto, and develop a scalable solver for it. Theoretically, we analyze the ability of Ditto to achieve fairness and robustness simultaneously on a class of linear problems. Empirically, across a suite of federated datasets, we show that Ditto not only achieves competitive performance relative to recent personalization methods, but also enables more accurate, robust, and fair models relative to state-of-the-art fair or robust baselines.
more » « less
Full Text Available
Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing

Khodak, M.; Tu, R.; Li, T.; Li, L.; Balcan, M.-F.; Smith, V.; Talwalkar, A. (January 2021, Advances in neural information processing systems)

Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning that is applicable to widely-used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks, obtaining higher accuracy using the same training budget.
more » « less
Full Text Available
Heterogeneity for the Win: One-Shot Federated Clustering

Dennis, D.; Li, T.; Smith, V. (January 2021, International Conference on Machine Learning)

In this work, we explore the unique challenges---and opportunities---of unsupervised federated learning (FL). We develop and analyze a one-shot federated clustering scheme, k-FED, based on the widely-used Lloyd's method for k-means clustering. In contrast to many supervised problems, we show that the issue of statistical heterogeneity in federated networks can in fact benefit our analysis. We analyse k-FED under a center separation assumption and compare it to the best known requirements of its centralized counterpart. Our analysis shows that in heterogeneous regimes where the number of clusters per device (k') is smaller than the total number of clusters over the network k, ($$k' \le \sqrt{k}$$), we can use heterogeneity to our advantage---significantly weakening the cluster separation requirements for k-FED. From a practical viewpoint, k-FED also has many desirable properties: it requires only round of communication, can run asynchronously, and can handle partial participation or node/network failures. We motivate our analysis with experiments on common FL benchmarks, and highlight the practical utility of one-shot clustering through use-cases in personalized FL and device sampling.
more » « less
Full Text Available
On Data Efficiency of Meta-learning

Al-Shedivat, M.; Li, L.; Xing, E.; Talwalkar, A. (January 2021, International Conference on Artificial Intelligence and Statistics)

Meta-learning has enabled learning statistical models that can be quickly adapted to new prediction tasks. Motivated by use-cases in personalized federated learning, we study the often overlooked aspect of the modern meta-learning algorithms -- their data efficiency. To shed more light on which methods are more efficient, we use techniques from algorithmic stability to derive bounds on the transfer risk that have important practical implications, indicating how much supervision is needed and how it must be allocated for each method to attain the desired level of generalization. Further, we introduce a new simple framework for evaluating meta-learning methods under a limit on the available supervision, conduct an empirical study of MAML, Reptile, and Protonets, and demonstrate the differences in the behavior of these methods on few-shot and federated learning benchmarks. Finally, we propose active meta-learning, which incorporates active data selection into learning-to-learn, leading to better performance of all methods in the limited supervision regime.
more » « less
Full Text Available
Two Sides of Meta-Learning Evaluation: In vs. Out of Distribution

Setlur, A.; Li, O.; Smith, V. (January 2021, Advances in neural information processing systems)

We categorize meta-learning evaluation into two settings: in-distribution [ID], in which the train and test tasks are sampled iid from the same underlying task distribution, and out-of-distribution [OOD], in which they are not. While most meta-learning theory and some FSL applications follow the ID setting, we identify that most existing few-shot classification benchmarks instead reflect OOD evaluation, as they use disjoint sets of train (base) and test (novel) classes for task generation. This discrepancy is problematic because -- as we show on numerous benchmarks -- meta-learning methods that perform better on existing OOD datasets may perform significantly worse in the ID setting. In addition, in the OOD setting, even though current FSL benchmarks seem befitting, our study highlights concerns in 1) reliably performing model selection for a given meta-learning method, and 2) consistently comparing the performance of different methods. To address these concerns, we provide suggestions on how to construct FSL benchmarks to allow for ID evaluation as well as more reliable OOD evaluation. Our work aims to inform the meta-learning community about the importance and distinction of ID vs. OOD evaluation, as well as the subtleties of OOD evaluation with current benchmarks.
more » « less
Full Text Available
On Large-Cohort Training for Federated Learning

Charles, Z.; Garrett, Z.; Huo, Z.; Shmulyian, S.; Smith, V. (January 2021, Advances in neural information processing systems)

Federated learning methods typically learn a model by iteratively sampling updates from a population of clients. In this work, we explore how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms. Our work poses three fundamental questions. First, what challenges arise when trying to scale federated learning to larger cohorts? Second, what parallels exist between cohort sizes in federated learning and batch sizes in centralized learning? Last, how can we design federated learning methods that effectively utilize larger cohort sizes? We give partial answers to these questions based on extensive empirical evaluation. Our work highlights a number of challenges stemming from the use of larger cohorts. While some of these (such as generalization issues and diminishing returns) are analogs of large-batch training challenges, others (including training failures and fairness concerns) are unique to federated learning.
more » « less
Full Text Available
Adaptive Gradient-Based Meta-Learning Methods

Khodak, M.; Balcan, M.; Talwalkar, A. (January 2019, Neural Information Processing Systems)

Full Text Available
FedDane: A Federated Newton-Type Method

Li, T.; Sahu, A.; Sanjabi, M.; Zaheer, M.; Talwalkar, A.; Smith, V. (January 2019, Asilomar Conference on Signals, Systems and Computers)

Full Text Available

« Prev Next »

Search for: All records