Title: Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning
Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and for high-dimensional models. Gradient quantization is an effective way of reducing the number of bits required to communicate each model update, albeit at the cost of a higher error floor due to the increased variance of the stochastic gradients. In this work, we propose an adaptive quantization strategy called AdaQuantFL that aims to achieve communication efficiency as well as a low error floor by changing the number of quantization levels during the course of training. Experiments on training deep neural networks show that our method converges using far fewer communicated bits than fixed-quantization-level setups, with little or no impact on training and test accuracy.
Award ID(s): 1850029
PAR ID: 10329038
Author(s) / Creator(s):
Date Published:
Journal Name: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID: 3110 to 3114
Format(s): Medium: X
Sponsoring Org: National Science Foundation
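
The adaptive idea described in the abstract can be sketched in a few lines: clients compress their updates with a stochastic uniform quantizer, and the number of quantization levels is increased as training progresses, so coarse, cheap updates are used early and finer ones near convergence. The quantizer below is QSGD-style and the loss-based doubling schedule is purely illustrative; neither is claimed to be AdaQuantFL's exact rule, and the function names are ours.

```python
import numpy as np

def stochastic_quantize(v, s):
    """Unbiased uniform stochastic quantizer with s levels (QSGD-style).

    Each coordinate of v is mapped to one of the normalized levels
    {0, 1/s, ..., 1} of |v|/||v||, rounding up or down at random so
    that E[Q(v)] = v.
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    scaled = np.abs(v) / norm * s            # lies in [0, s]
    lower = np.floor(scaled)
    round_up = np.random.rand(*v.shape) < (scaled - lower)
    levels = lower + round_up
    return np.sign(v) * norm * levels / s    # ~log2(s)+1 bits per coordinate

def next_num_levels(s, prev_loss, curr_loss, s_max=1024):
    """Illustrative schedule: keep s small while the loss is dropping fast,
    and double it once the per-round improvement becomes marginal."""
    if prev_loss is not None and (prev_loss - curr_loss) < 1e-3 * abs(prev_loss):
        s = min(2 * s, s_max)
    return s
```

In a federated round, each client would quantize its local update with the current s before uploading, and the server would pick the next round's s from the observed training loss; the trade-off is fewer bits per round early on versus lower quantization variance (and hence a lower error floor) later.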
More Like this
  1. Large-scale machine learning training, in particular distributed stochastic gradient descent, needs to be robust to inherent system variability such as node straggling and random communication delays. This work considers a distributed training framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. We analyze the true speed of error convergence with respect to wall-clock time (instead of the number of iterations) and study how it is affected by the frequency of averaging. The main contribution is the design of ADACOMM, an adaptive communication strategy that starts with infrequent averaging to save communication delay and improve convergence speed, and then increases the communication frequency to achieve a low error floor. Rigorous experiments on training deep neural networks show that ADACOMM can take 3× less time than fully synchronous SGD and still reach the same final training loss. (An illustrative sketch of this schedule follows the list.)
  2. In this paper, we propose and analyze SPARQ-SGD, an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models over a graph. Each node locally computes a condition (event) that triggers a communication, at which quantized and sparsified local model parameters are sent. In SPARQ-SGD, each node first takes a fixed number of local gradient steps and then checks whether the model parameters have changed significantly compared to its last update; it communicates further-compressed model parameters only when there is a significant change, as specified by a (design) criterion. We prove that SPARQ-SGD converges as O(1/nT) and O(1/√nT) in the strongly convex and non-convex settings, respectively, demonstrating that aggressive compression, including event-triggered communication, model sparsification, and quantization, does not affect the overall convergence rate compared to uncompressed decentralized training, thereby theoretically yielding communication efficiency for 'free'. We evaluate SPARQ-SGD on real datasets to demonstrate significant savings in communication bits over the state of the art. (A sketch of the trigger rule appears after the list.)
  3. We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure and adaptively decide the number of low-precision points for each quantized weight vector based on the error of its output. This allows us to achieve higher precision for important weights that greatly influence the outputs, yielding an 'effect of mixed precision' without a physical mixed-precision implementation (which requires specialized hardware accelerators). Empirically, our method can be implemented with common operations, adding almost no memory or computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and generalizes to more challenging tasks such as PASCAL VOC object detection. (A greedy-selection sketch follows the list.)
  4. List Viterbi decoders are a very effective way to improve the performance of block codes in combination with an error-detecting outer code. In this work, we combine an efficient serial list Viterbi decoder design with an existing serially concatenated, convolutionally encoded, pulse-position-modulated (SCPPM) code used in space communication, which exhibits poor performance because of an error floor. The SCPPM code features a 32-bit CRC that provides powerful error detection capability and an outer four-state convolutional code that makes it suitable for a list Viterbi decoder. The code is very long, at 15,120 bits, which renders a high-complexity decoder impractical, while the strong error detection allows a list decoder with very low undetected-error probability. We use a very efficient list Viterbi algorithm that avoids most redundant operations, producing a low-complexity serial list Viterbi decoder. The combined system reduces the error floor moderately for the original version of the system and suppresses it completely when the code length is increased by a factor of four. (The CRC-guided list-decoding loop is sketched after the list.)
  5. We present a topology for suppressing quantization lobes in 1-bit reconfigurable reflective surfaces (RRSs). RRSs are planar surfaces that redirect impinging waves to a desired direction through phase modulation. For single-bit modulation, plane-wave-illuminated RRSs exhibit quantization lobes due to the limited number of available phase bits. To eliminate such lobes, we randomize the quantization error by employing a fixed but random phase delay in every unit cell of the RRS. Specifically, we focus on the fabrication and characterization of a mmWave, single-layer, 1-bit, 30×30 randomized RRS designed at 222.5 GHz. The quasi-optical RCS characterization of the fabricated RRS demonstrates the successful suppression of the quantization lobe using the proposed technique. (A toy array-factor sketch of the randomization idea appears after the list.)
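
For item 1 (ADACOMM), a rough sketch of the adaptive-communication schedule, assuming plain local SGD with periodic model averaging and a simple halving rule for the averaging period; the worker update, the schedule, and the function names are illustrative assumptions, not ADACOMM's exact criterion.

```python
import numpy as np

def local_sgd_with_adaptive_period(workers, grad_fn, lr=0.1, rounds=50,
                                   tau0=16, tau_min=1):
    """Periodic-averaging SGD whose communication period tau shrinks over time.

    workers : list of parameter vectors (one local model per worker)
    grad_fn : grad_fn(x, worker_id) -> stochastic gradient at x
    tau0    : initial number of local steps between averaging rounds
    """
    tau = tau0
    for r in range(rounds):
        # Each worker runs tau local SGD steps without communicating.
        for i, x in enumerate(workers):
            for _ in range(tau):
                x = x - lr * grad_fn(x, i)
            workers[i] = x
        # One communication (model averaging) per tau local steps.
        avg = np.mean(workers, axis=0)
        workers = [avg.copy() for _ in workers]
        # Illustrative rule: average more frequently later in training
        # to drive the error floor down.
        if (r + 1) % 10 == 0:
            tau = max(tau // 2, tau_min)
    return np.mean(workers, axis=0)
```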
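
For item 2 (SPARQ-SGD), a minimal sketch of the event-triggered transmission rule: after a block of local steps, a node sends a compressed model difference only if the model has moved enough since its last transmission. The top-k compressor and the squared-norm threshold below are stand-ins for the paper's actual compression operator and trigger criterion.

```python
import numpy as np

def topk_sparsify(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def maybe_transmit(x, x_last_sent, threshold, k):
    """Event-triggered send: communicate a compressed model difference only
    if the model moved enough since the last transmission (illustrative)."""
    delta = x - x_last_sent
    if np.dot(delta, delta) > threshold:
        payload = topk_sparsify(delta, k)   # quantization of the payload could follow
        return payload, x.copy()            # triggered: send, refresh the reference copy
    return None, x_last_sent                # not triggered: stay silent this round
```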
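
For item 3 (multipoint quantization), a sketch of the greedy residual-fitting idea: repeatedly quantize the current residual to a vector of low-bit integer codes, fit a least-squares scale for it, subtract, and stop once the residual is small, so weight vectors that need more precision automatically receive more points. The uniform code mapping and the stopping tolerance are assumptions, not the paper's exact procedure.

```python
import numpy as np

def lowbit_codes(v, bits=4):
    """Map v onto symmetric integer codes in {-(2^(bits-1)-1), ..., 2^(bits-1)-1}."""
    levels = 2 ** (bits - 1) - 1
    m = np.max(np.abs(v))
    if m == 0:
        return np.zeros_like(v)
    return np.clip(np.round(v / m * levels), -levels, levels)

def multipoint_quantize(w, bits=4, tol=1e-2, max_points=8):
    """Greedy sketch: approximate w by sum_i a_i * q_i with low-bit q_i,
    adding points until the relative residual falls below tol."""
    residual = w.astype(np.float64).copy()
    points = []
    for _ in range(max_points):
        q = lowbit_codes(residual, bits)
        denom = np.dot(q, q)
        if denom == 0:
            break
        a = np.dot(residual, q) / denom      # least-squares scale for this point
        points.append((a, q))
        residual = residual - a * q
        if np.linalg.norm(residual) <= tol * np.linalg.norm(w):
            break                            # this weight vector needs no more points
    return points                            # w ~= sum(a * q for a, q in points)
```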
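
For item 4 (serial list Viterbi decoding with CRC), the outer decoding loop can be sketched as: request candidate trellis paths from a serial list Viterbi decoder in order of decreasing likelihood and return the first one whose 32-bit CRC checks. Here `next_best_path` and `crc32_ok` are hypothetical helpers standing in for the decoder internals and the SCPPM CRC; only the wrapping logic is shown.

```python
def decode_with_crc(next_best_path, crc32_ok, max_list_size=64):
    """Serial list Viterbi decoding wrapped with CRC-based error detection.

    next_best_path(i) : hypothetical generator of the i-th most likely trellis
                        path (i = 0 is the ordinary Viterbi output)
    crc32_ok(bits)    : hypothetical check of the 32-bit CRC on the decoded bits
    """
    for i in range(max_list_size):
        candidate = next_best_path(i)
        if candidate is None:
            break                    # list exhausted before a CRC match
        if crc32_ok(candidate):
            return candidate         # first CRC-passing path is the decision
    return None                      # declare a detected (rather than undetected) error
```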
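
For item 5 (randomized 1-bit RRS), a toy sketch of how a fixed random per-cell phase delay randomizes the 1-bit quantization error, using a simple 1D array-factor model. The array size, element spacing, steering angle, and the neglect of element patterns and the actual 30×30 2D geometry are all illustrative simplifications.

```python
import numpy as np

def one_bit_phase(desired, delta):
    """Pick, per element, the realizable state (delta or delta + pi)
    closest to the desired reflection phase (angles in radians)."""
    err0 = np.angle(np.exp(1j * (desired - delta)))           # error if state 0 chosen
    err1 = np.angle(np.exp(1j * (desired - delta - np.pi)))   # error if state 1 chosen
    return np.where(np.abs(err0) <= np.abs(err1), delta, delta + np.pi)

def array_factor(phases, d_over_lambda, thetas):
    """1D array-factor magnitude for given element phases and spacing d/lambda."""
    n = np.arange(len(phases))
    kd = 2 * np.pi * d_over_lambda
    return np.abs(np.exp(1j * (np.outer(np.sin(thetas), kd * n) + phases)).sum(axis=1))

# Steer a 30-element, half-wavelength-spaced line of cells toward 30 degrees.
N, d_over_lambda = 30, 0.5
theta0 = np.deg2rad(30)
n = np.arange(N)
desired = -2 * np.pi * d_over_lambda * n * np.sin(theta0)
thetas = np.deg2rad(np.linspace(-90, 90, 721))

af_plain = array_factor(one_bit_phase(desired, np.zeros(N)), d_over_lambda, thetas)
af_rand = array_factor(one_bit_phase(desired, np.random.uniform(0, np.pi, N)),
                       d_over_lambda, thetas)
# af_plain shows a strong quantization lobe near the mirror angle (-30 degrees);
# af_rand spreads that coherent error out, suppressing the lobe at the cost of a
# slightly higher diffuse background.
```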