NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning

Raje, Arian; Askin, Baris; Jhunjhunwala, Divyansh; Joshi, Gauri (December 2025, The Thirty-ninth Annual Conference on Neural Information Processing Systems)

Free, publicly-accessible full text available December 5, 2026
Erasure Coded Neural Network Inference via Fisher Averaging

https://doi.org/10.1109/ISIT57864.2024.10619514

Jhunjhunwala, Divyansh; Jali, Neharika; Joshi, Gauri; Wang, Shiqiang (July 2024, IEEE)

Full Text Available
FedFisher: Leveraging Fisher Information for One-Shot Federated Learning

Jhunjhunwala, Divyansh; Wang, Shiqiang; Joshi, Gauri (May 2024, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics)

Standard federated learning (FL) algorithms typically require multiple rounds of communication between the server and the clients, which has several drawbacks, including requiring constant network connectivity, repeated investment of computational resources, and susceptibility to privacy attacks. One-Shot FL is a new paradigm that aims to address this challenge by enabling the server to train a global model in a single round of communication. In this work, we present FedFisher, a novel algorithm for one-shot FL that makes use of Fisher information matrices computed on local client models, motivated by a Bayesian perspective of FL. First, we theoretically analyze FedFisher for two-layer over-parameterized ReLU neural networks and show that the error of our one-shot FedFisher global model becomes vanishingly small as the width of the neural networks and amount of local training at clients increases. Next, we propose practical variants of FedFisher using the diagonal Fisher and K-FAC approximation for the full Fisher and highlight their communication and compute efficiency for FL. Finally, we conduct extensive experiments on various datasets, which show that these variants of FedFisher consistently improve over competing baselines.
more » « less
Full Text Available
Maximizing Global Model Appeal in Federated Learning

Cho, Yae Jee; Jhunjhunwala, Divyansh; Li, Tian; Smith, Virginia; Joshi, Gauri (March 2024, Transactions on machine learning research)
Bellet, Aurelien (Ed.)
Federated learning (FL) aims to collaboratively train a global model using local data from a network of clients. To warrant collaborative training, each federated client may expect the resulting global model to satisfy some individual requirement, such as achieving a certain loss threshold on their local data. However, in real FL scenarios, the global model may not satisfy the requirements of all clients in the network due to the data heterogeneity across clients. In this work, we explore the problem of global model appeal in FL, which we define as the total number of clients that find that the global model satisfies their individual requirements. We discover that global models trained using traditional FL approaches can result in a significant number of clients unsatisfied with the model based on their local requirements. As a consequence, we show that global model appeal can directly impact how clients participate in training and how the model performs on new clients at inference time. Our work proposes MaxFL, which maximizes the number of clients that find the global model appealing. MaxFL achieves a 22-40% and 18-50% improvement in the test accuracy of training clients and (unseen) test clients respectively, compared to a wide range of FL approaches that tackle data heterogeneity, aim to incentivize clients, and learn personalized/fair models.
more » « less
Full Text Available
FedExp: Speeding Up Federated Averaging via Extrapolation

Jhunjhunwala, Divyansh; Wang, Shiqiang; Joshi, Gauri (January 2023, International Conference on Learning Representations)

Federated Averaging (FedAvg) remains the most popular algorithm for Federated Learning (FL) optimization due to its simple implementation, stateless nature, and privacy guarantees combined with secure aggregation. Recent work has sought to generalize the vanilla averaging in FedAvg to a generalized gradient descent step by treating client updates as pseudo-gradients and using a server step size. While the use of a server step size has been shown to provide performance improvement theoretically, the practical benefit of the server step size has not been seen in most existing works. In this work, we present FedExP, a method to adaptively determine the server step size in FL based on dynamically varying pseudo-gradients throughout the FL process. We begin by considering the overparameterized convex regime, where we reveal an interesting similarity between FedAvg and the Projection Onto Convex Sets (POCS) algorithm. We then show how FedExP can be motivated as a novel extension to the extrapolation mechanism that is used to speed up POCS. Our theoretical analysis later also discusses the implications of FedExP in underparameterized and non-convex settings. Experimental results show that FedExP consistently converges faster than FedAvg and competing baselines on a range of realistic FL datasets.
more » « less
Full Text Available
FedVARP: Tackling the variance due to partial client participation in federated learning

Jhunjhunwala, Divyansh; Sharma, Pranay; Nagarkatti, Aushim; Joshi, Gauri (July 2022, Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence)

Data-heterogeneous federated learning (FL) systems suffer from two significant sources of convergence error: 1) client drift error caused by performing multiple local optimization steps at clients, and 2) partial client participation error caused by the fact that only a small subset of the edge clients participate in every training round. We find that among these, only the former has received significant attention in the literature. To remedy this, we propose FedVARP, a novel variance reduction algorithm applied at the server that eliminates error due to partial client participation. To do so, the server simply maintains in memory the most recent update for each client and uses these as surrogate updates for the non-participating clients in every round. Further, to alleviate the memory requirement at the server, we propose a novel clustering-based variance reduction algorithm ClusterFedVARP. Unlike previously proposed methods, both FedVARP and ClusterFedVARP do not require additional computation at clients or communication of additional optimization parameters. Through extensive experiments, we show that FedVARP outperforms state-of-the-art methods, and ClusterFedVARP achieves performance comparable to FedVARP with much less memory requirements.
more » « less
Full Text Available
Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation

Jhunjhunwala, Divyansh; Mallick, Ankur; Gadhikar, Advait; Kadhe, Swanand; Joshi, Gauri (December 2021, Advances in neural information processing systems)

We study the problem of estimating at a central server the mean of a set of vectors distributed across several nodes (one vector per node). When the vectors are high-dimensional, the communication cost of sending entire vectors may be prohibitive, and it may be imperative for them to use sparsification techniques. While most existing work on sparsified mean estimation is agnostic to the characteristics of the data vectors, in many practical applications such as federated learning, there may be spatial correlations (similarities in the vectors sent by different nodes) or temporal correlations (similarities in the data sent by a single node over different iterations of the algorithm) in the data vectors. We leverage these correlations by simply modifying the decoding method used by the server to estimate the mean. We provide an analysis of the resulting estimation error as well as experiments for PCA, K-Means and Logistic Regression, which show that our estimators consistently outperform more sophisticated and expensive sparsification methods.
more » « less
Full Text Available
Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning

https://doi.org/10.1109/ICASSP39728.2021.9413697

Jhunjhunwala, Divyansh; Gadhikar, Advait; Joshi, Gauri; Eldar, Yonina C. (June 2021, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and high-dimensional models. Gradient quantization is an effective way of reducing the number of bits required to communicate each model update, albeit at the cost of having a higher error floor due to the higher variance of the stochastic gradients. In this work, we propose an adaptive quantization strategy called AdaQuantFL that aims to achieve communication efficiency as well as a low error floor by changing the number of quantization levels during the course of training. Experiments on training deep neural networks show that our method can converge in much fewer communicated bits as compared to fixed quantization level setups, with little or no impact on training and test accuracy.
more » « less
Full Text Available

Search for: All records