Title: Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters
Award ID(s):
1818253 1854828 1931537 2007991 2018627 2112606
NSF-PAR ID:
10355069
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
ISC HIGH PERFORMANCE
Page Range / eLocation ID:
3-25
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In parallel applications, communication is a critical part of the infrastructure and a potential bottleneck. The traditional way to tackle communication challenges is to redesign algorithms so that their complexity or communication volume is reduced. However, for algorithms such as the Fast Fourier Transform (FFT), reducing the communication volume is very challenging, yet it can yield large benefits in time-to-completion. In this paper, we revisit the implementation of the MPI all-to-all routine at the core of 3D FFTs by using advanced MPI features, such as One-Sided Communication, and integrate data compression during communication to reduce the volume of data exchanged. Since some compression techniques are 'lossy', in the sense that they involve a loss of accuracy, we study the impact of lossy compression in heFFTe, the state-of-the-art FFT library for large-scale 3D FFTs on hybrid architectures with GPUs. Consequently, we design an approximate FFT algorithm that trades off user-controlled accuracy for speed. We show that the 3D FFTs speed up proportionally to the compression rate. In terms of accuracy, we compare our approach with a reduced-precision execution, in which both the data and the computation are in reduced precision: when the communication volume is compressed to the size of the reduced-precision data, the approximate FFT algorithm is as fast as the reduced-precision one while its accuracy is one order of magnitude better.
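The idea in the abstract above, computing in full precision but shrinking the data only while it is exchanged, can be illustrated with a single-process sketch. This is not heFFTe code and not the compressor used in the paper: the function name `fft2_with_compressed_exchange` is hypothetical, a cast to single precision stands in for a real lossy compressor, and a transpose stands in for the MPI all-to-all.

```python
import numpy as np


def fft2_with_compressed_exchange(x, compress=True):
    """2D FFT as row FFTs, a data exchange, then column FFTs.

    Hypothetical illustration: the transpose models the all-to-all
    reshuffle, and a cast to complex64 models lossy compression.
    Returns the transform and the number of bytes "sent".
    """
    # Stage 1: FFT along rows, computed in double precision.
    stage1 = np.fft.fft(x.astype(np.complex128), axis=1)

    if compress:
        # Assumed stand-in for a lossy compressor: halve the payload
        # by truncating to single precision before the exchange.
        wire = stage1.astype(np.complex64)
        bytes_sent = wire.nbytes
        # "Decompress" on the receiving side, back to double precision.
        stage1 = wire.astype(np.complex128)
    else:
        bytes_sent = stage1.nbytes

    # Stand-in for the all-to-all: every row becomes a column.
    exchanged = stage1.T.copy()

    # Stage 2: FFT along the other dimension, again in double precision.
    stage2 = np.fft.fft(exchanged, axis=1)
    return stage2.T, bytes_sent


def relative_error(approx, exact):
    """Relative error in the Frobenius norm."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
exact = np.fft.fft2(x)

full, bytes_full = fft2_with_compressed_exchange(x, compress=False)
approx, bytes_compressed = fft2_with_compressed_exchange(x, compress=True)

print("bytes exchanged (uncompressed):", bytes_full)
print("bytes exchanged (compressed):  ", bytes_compressed)
print("relative error (compressed):   ", relative_error(approx, exact))
```

In this toy setting only the exchanged intermediate data loses precision, while both FFT stages still run in double precision. That is the distinction the abstract draws against an execution in which the data and the computation are both in reduced precision.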