Title: FedVARP: Tackling the variance due to partial client participation in federated learning
Data-heterogeneous federated learning (FL) systems suffer from two significant sources of convergence error: 1) client drift error caused by performing multiple local optimization steps at clients, and 2) partial client participation error caused by the fact that only a small subset of the edge clients participate in every training round. We find that among these, only the former has received significant attention in the literature. To remedy this, we propose FedVARP, a novel variance reduction algorithm applied at the server that eliminates error due to partial client participation. To do so, the server simply maintains in memory the most recent update for each client and uses these as surrogate updates for the non-participating clients in every round. Further, to alleviate the memory requirement at the server, we propose a novel clustering-based variance reduction algorithm, ClusterFedVARP. Unlike previously proposed methods, neither FedVARP nor ClusterFedVARP requires additional computation at clients or communication of additional optimization parameters. Through extensive experiments, we show that FedVARP outperforms state-of-the-art methods, and ClusterFedVARP achieves performance comparable to FedVARP with much lower memory requirements.
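The server-side idea described in the abstract (store each client's most recent update and reuse it as a surrogate when that client does not participate) can be illustrated with a minimal sketch. The abstract does not spell out the exact aggregation rule, so the SAGA-style combination below is an assumption, and the names `server_round`, `quadratic_update`, `memory`, and `server_lr` are hypothetical and chosen only for illustration.

```python
def server_round(w, memory, sampled, client_update, server_lr=1.0):
    """One server round that reuses stored updates for absent clients.

    ASSUMPTION: SAGA-style aggregation (average of stored updates plus a
    correction from the sampled clients); not taken from the abstract.

    w             -- global model as a list of floats
    memory        -- dict: client id -> most recent stored update (all clients)
    sampled       -- ids of the clients participating in this round
    client_update -- callable (client_id, w) -> update (a descent direction)
    """
    dim = len(w)
    n_clients = len(memory)

    # Average of the stored (possibly stale) updates over ALL clients.
    mem_avg = [sum(memory[c][k] for c in memory) / n_clients for k in range(dim)]

    # Fresh updates come only from the participating clients.
    fresh = {c: client_update(c, w) for c in sampled}

    # Correction: how far the fresh updates moved from what was stored.
    correction = [
        sum(fresh[c][k] - memory[c][k] for c in sampled) / len(sampled)
        for k in range(dim)
    ]

    # Aggregate direction and server step.
    v = [mem_avg[k] + correction[k] for k in range(dim)]
    new_w = [w[k] - server_lr * v[k] for k in range(dim)]

    # Refresh the stored updates of the participating clients only.
    for c in sampled:
        memory[c] = fresh[c]
    return new_w


def quadratic_update(targets):
    """Toy client (hypothetical): gradient of 0.5 * (w - t_i)^2 per coordinate."""
    def update(client_id, w):
        return [wk - tk for wk, tk in zip(w, targets[client_id])]
    return update


targets = {0: [1.0], 1: [2.0], 2: [3.0]}
memory = {c: [0.0] for c in targets}
w = server_round([0.0], memory, sampled=[0, 1, 2],
                 client_update=quadratic_update(targets))
```

With every client sampled, the stored terms cancel and the step reduces to plain averaging of the fresh updates; with a subset sampled, the stored updates stand in for the clients that are absent. Clients do no extra work in this sketch, which matches the abstract's claim that the method runs entirely at the server.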
Award ID(s):
2045694
PAR ID:
10389233
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Federated bilevel optimization (FBO) has garnered significant attention lately, driven by its promising applications in meta-learning and hyperparameter optimization. Existing algorithms generally aim to approximate the gradient of the upper-level objective function (hypergradient) in the federated setting. However, because of the nonlinearity of the hypergradient and client drift, they often involve complicated computations. These computations, like multiple optimization sub-loops and second-order derivative evaluations, end up with significant memory consumption and high computational costs. In this paper, we propose a computationally and memory-efficient FBO algorithm named MemFBO. MemFBO features a fully single-loop structure with all involved variables updated simultaneously, and uses only first-order gradient information for all local updates. We show that MemFBO exhibits a linear convergence speedup with milder assumptions in both partial and full client participation scenarios. We further implement MemFBO in a novel FBO application for federated data cleaning. Our experiments, conducted on this application and federated hyper-representation, demonstrate the effectiveness of the proposed algorithm. 
  3. Minimax optimization has seen a surge in interest with the advent of modern applications such as GANs, and it is inherently more challenging than simple minimization. The difficulty is exacerbated by the training data residing at multiple edge devices or clients, especially when these clients can have heterogeneous datasets and heterogeneous local computation capabilities. We propose a general federated minimax optimization framework that subsumes such settings and several existing methods like Local SGDA. We show that naive aggregation of model updates made by clients running unequal numbers of local steps can result in optimizing a mismatched objective function – a phenomenon previously observed in standard federated minimization. To fix this problem, we propose normalizing the client updates by the number of local steps. We analyze the convergence of the proposed algorithm for classes of nonconvex-concave and nonconvex-nonconcave functions and characterize the impact of heterogeneous client data, partial client participation, and heterogeneous local computations. For all the function classes considered, we significantly improve the existing computation and communication complexity results. Experimental results support our theoretical claims.
  4. Federated bilevel learning has received increasing attention in various emerging machine learning and communication applications. Recently, several Hessian-vector-based algorithms have been proposed to solve the federated bilevel optimization problem. However, several important properties in federated learning, such as partial client participation and linear speedup for convergence (i.e., the convergence rate and complexity are improved linearly with respect to the number of sampled clients) in the presence of non-i.i.d. datasets, remain open. In this paper, we fill these gaps by proposing a new federated bilevel algorithm named FedMBO with a novel client sampling scheme in the federated hypergradient estimation. We show that FedMBO achieves a convergence rate of $$\mathcal{O}\big(\frac{1}{\sqrt{nK}}+\frac{1}{K}+\frac{\sqrt{n}}{K^{3/2}}\big)$$ on non-i.i.d. datasets, where $$n$$ is the number of participating clients in each round, and $$K$$ is the total number of iterations. This is the first theoretical linear speedup result for non-i.i.d. federated bilevel optimization. Extensive experiments validate our theoretical results and demonstrate the effectiveness of our proposed method.
  5. Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Previous works analyzed the convergence of federated learning by accounting for data heterogeneity, communication/computation limitations, and partial client participation. However, most assume unbiased client participation, where clients are selected such that the aggregated model update is unbiased. In our work, we present the convergence analysis of federated learning with biased client selection and quantify how the bias affects convergence speed. We show that biasing client selection towards clients with higher local loss yields faster error convergence. From this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that flexibly spans the trade-off between convergence speed and solution bias. Extensive experiments demonstrate that Power-of-Choice can converge up to 3 times faster and give 10% higher test accuracy than the baseline random selection. 