skip to main content


Title: PolarAir: A Compressed Sensing Scheme for Over-the-Air Federated Learning
We explore a scheme that enables the training of a deep neural network in a Federated Learning configuration over an additive white Gaussian noise channel. The goal is to create a low complexity, linear compression strategy, called PolarAir, that reduces the size of the gradient at the user side to lower the number of channel uses needed to transmit it. The suggested approach belongs to the family of compressed sensing techniques, yet it constructs the sensing matrix and the recovery procedure using multiple access techniques. Simulations show that it can reduce the number of channel uses by ∼30% when compared to conveying the gradient without compression. The main advantage of the proposed scheme over other schemes in the literature is its low time complexity. We also investigate the behavior of gradient updates and the performance of PolarAir throughout the training process to obtain insight on how best to construct this compression scheme based on compressed sensing.  more » « less
Award ID(s):
2131106 2148354
NSF-PAR ID:
10443177
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Information Theory Workshop (ITW)
Page Range / eLocation ID:
136 to 141
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm --- an iterative application of compressed sensing techniques for orthogonal polynomials --- requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep neural networks on Cifar-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search 8x. Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity. 
    more » « less
  2. Wireless links using massive MIMO transceivers are vital for next generation wireless communications networks. Precoding in Massive MIMO transmission requires accurate downlink channel state information (CSI). Many recent works have effectively applied deep learning (DL) to jointly train UE-side compression networks for delay domain CSI and a BS-side decoding scheme. Vitally, these works assume that the full delay domain CSI is available at the UE, but in reality, the UE must estimate the delay domain based on a limited number of frequency domain pilots. In this work, we propose a linear pilot-to-delay estimator (P2DE) that acquires the truncated delay CSI via sparse frequency pilots. We show the accuracy of the P2DE under frequency downsampling, and we demonstrate the P2DE’s efficacy when utilized with existing CSI estimation networks. Additionally, we propose to use trainable compressed sensing (CS) networks in a differential encoding network for time-varying CSI estimation, and we propose a new network, MarkovNet-ISTA-ENet (MN-IE), which combines a CS network for initial CSI estimation and multiple autoencoders to estimate the error terms. We demonstrate that MN-IE has better asymptotic performance than networks comprised of only one type of network. 
    more » « less
  3. null (Ed.)
    Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification, quantization, or low-rank updates. The techniques usually require choosing a static compression ratio, often requiring users to balance the trade-off between model accuracy and per-iteration speedup. In this work, we show that such performance degradation due to choosing a high compression ratio is not fundamental. An adaptive compression strategy can reduce communication while maintaining final test accuracy. Inspired by recent findings on critical learning regimes, in which small gradient errors can have irrecoverable impact on model performance, we propose Accordion a simple yet effective adaptive compression algorithm. While Accordion maintains a high enough compression rate on average, it avoids over-compressing gradients whenever in critical learning regimes, detected by a simple gradient-norm based criterion. Our extensive experimental study over a number of machine learning tasks in distributed environments indicates that Accordion, maintains similar model accuracy to uncompressed training, yet achieves up to 5.5x better compression and up to 4.1x end-to-end speedup over static approaches. We show that Accordion also works for adjusting the batch size, another popular strategy for alleviating communication bottlenecks. 
    more » « less
  4. Smola, A. ; Dimakis, A. ; Stoica, I. (Ed.)
    Distributed model training suffers from communication bottlenecks due to frequent model updates transmitted across compute nodes. To alleviate these bottlenecks, practitioners use gradient compression techniques like sparsification, quantization, low rank updates etc. The techniques usually require choosing a static compression ratio, often requiring users to balance the trade-off between model accuracy and per-iteration speedup. In this work, we show that such performance degradation due to choosing a high compression ratio is not fundamental and that an adaptive compression strategy can reduce communication while maintaining final test accuracy.Inspired by recent findings on critical learning regimes, in which small gradient errors can have irrecoverable impact on model performance, we propose ACCORDION a simple yet effective adaptive compression algorithm. While ACCORDION maintains a high enough compression rate on average, it avoids detrimental impact by not compressing gradients too much whenever in critical learning regimes, detected by a simple gradient-norm based criterion. Our extensive experimental study over a number of machine learning tasks in distributed environments indicates that ACCORDION, maintains similar model accuracy to uncompressed training, yet achieves up to 5.5×better compression and up to 4.1×end-to-end speedup over static approaches. We show that ACCORDION also works for adjusting the batch size, another popular strategy for alleviating communication bottlenecks. Our code is available at https://github.com/uw-mad-dash/Accordion 
    more » « less
  5. In this paper, we propose and analyze SPARQSGD, an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models over a graph. Each node can locally compute a condition (event) which triggers a communication where quantized and sparsified local model parameters are sent. In SPARQ-SGD, each node first takes a fixed number of local gradient steps and then checks if the model parameters have significantly changed compared to its last update; it communicates further compressed model parameters only when there is a significant change, as specified by a (design) criterion. We prove that SPARQ-SGD converges as O(1/nT ) and O(1/√nT ) in the strongly-convex and non-convex settings, respectively, demonstrating that aggressive compression, including event-triggered communication, model sparsification and quantization does not affect the overall convergence rate compared to uncompressed decentralized training; thereby theoretically yielding communication efficiency for `free'. We evaluate SPARQ-SGD over real datasets to demonstrate significant savings in communication bits over the state-of-the-art. 
    more » « less