skip to main content


Title: Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs
In the context of parallel applications, communication is a critical part of the infrastructure and a potential bottleneck. The traditional approach to tackle communication challenges consists of redesigning algorithms so that the complexity or the communication volume is reduced. However, there are algorithms like the Fast Fourier Transform (FFT) where reducing the volume of communication is very challenging yet can reap large benefit in terms of time-to-completion. In this paper, we revisit the implementation of the MPI all-to-all routine at the core of 3D FFTs by using advanced MPI features, such as One-Sided Communication, and integrate data compression during communication to reduce the volume of data exchanged. Since some compression techniques are ‘lossy’ in the sense that they involve a loss of accuracy, we study the impact of lossy compression in heFFTe, the state-of-the-art FFT library for large scale 3D FFTs on hybrid architectures with GPUs. Consequently, we design an approximate FFT algorithm that trades off user-controlled accuracy for speed. We show that we speedup the 3D FFTs proportionally to the compression rate. In terms of accuracy, comparing our approach with a reduced precision execution, where both the data and the computation are in reduced precision, we show that when the volume of communication is compressed to the size of the reduced precision data, the approximate FFT algorithm is as fast as the one in reduced precision while the accuracy is one order of magnitude better.  more » « less
Award ID(s):
1664142
NSF-PAR ID:
10393193
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2022 IEEE International Conference on Cluster Computing (CLUSTER)
Page Range / eLocation ID:
152 to 160
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent years have witnessed an upsurge of interest in lossy compression due to its potential to significantly reduce data volume with adequate exploitation of the spatiotemporal properties of IoT datasets. However, striking a balance between compression ratios and data fidelity is challenging, particularly when losing data fidelity impacts downstream data analytics noticeably. In this paper, we propose a lossy prediction model dealing with binary classification analytics tasks to minimize the impact of the error introduced due to lossy compression. We specifically focus on five classification algorithms for frost prediction in agricultural fields allowing preparation by the predictive advisories to provide helpful information for timely services. While our experimental evaluations reaffirm the nature of lossy compressions where allowing higher errors offers higher compression ratios, we also observe that the classification performance in terms of accuracy and F-1 score differs among all the algorithms we evaluated. Specifically, random forest is the best lossy prediction model for classifying frost. Lastly, we show the robustness of the lossy prediction model based on the data fidelity in prediction performance. 
    more » « less
  2. As the scale and complexity of high-performance computing (HPC) systems keep growing, data compression techniques are often adopted to reduce the data volume and processing time. While lossy compression becomes preferable to a lossless one because of the potential benefit of generating a high compression ratio, it would lose its worth the effort without finding an optimal balance between volume reduction and information loss. Among many lossy compression techniques, transform-based lossy algorithms utilize spatial redundancy better. However, the transform-based lossy compressor has received relatively less attention because there is a lack of understanding of its compression performance on scientific data sets. The insight of this paper is that, in transform-based lossy compressors, quantifying dominant coefficients at the block level reveals the right balance, potentially impacting overall compression ratios. Motivated by this, we characterize three transformation-based lossy compression mechanisms with different information compaction methods using the statistical features that capture data characteristics. And then, we build several prediction models using the statistical features and the characteristics of dominant coefficients and evaluate the effectiveness of each model using six HPC datasets from three production-level simulations at scale. Our results demonstrate that the random forest classifier captures the behavior of dominant coefficients precisely, achieving nearly 99% of prediction accuracy. 
    more » « less
  3. Error-bounded lossy compression is one of the most effective techniques for reducing scientific data sizes. However, the traditional trial-and-error approach used to configure lossy compressors for finding the optimal trade-off between reconstructed data quality and compression ratio is prohibitively expensive. To resolve this issue, we develop a general-purpose analytical ratio-quality model based on the prediction-based lossy compression framework, which can effectively foresee the reduced data quality and compression ratio, as well as the impact of lossy compressed data on post-hoc analysis quality. Our analytical model significantly improves the prediction-based lossy compression in three use-cases: (1) optimization of predictor by selecting the best-fit predictor; (2) memory compression with a target ratio; and (3) in-situ compression optimization by fine-grained tuning error-bounds for various data partitions. We evaluate our analytical model on 10 scientific datasets, demonstrating its high accuracy (93.47% accuracy on average) and low computational cost (up to 18.7× lower than the trial-and-error approach) for estimating the compression ratio and the impact of lossy compression on post-hoc analysis quality. We also verify the high efficiency of our ratio-quality model using different applications across the three use-cases. In addition, our experiment demonstrates that our modeling-based approach reduces the time to store the 3D RTM data with HDF5 by up to 3.4× with 128 CPU cores over the traditional solution. 
    more » « less
  4. Fully Homomorphic Encryption over the Torus (TFHE) allows arbitrary computations to happen directly on ciphertexts using homomorphic logic gates. However, each TFHE gate on state-of-the-art hardware platforms such as GPUs and FPGAs is extremely slow (> 0.2ms). Moreover, even the latest FPGA-based TFHE accelerator cannot achieve high energy efficiency, since it frequently invokes expensive double-precision floating point FFT and IFFT kernels. In this paper, we propose a fast and energy-efficient accelerator, MATCHA, to process TFHE gates. MATCHA supports aggressive bootstrapping key unrolling to accelerate TFHE gates without decryption errors by approximate multiplication-less integer FFTs and IFFTs, and a pipelined datapath. Compared to prior accelerators, MATCHA improves the TFHE gate processing throughput by 2.3x, and the throughput per Watt by 6.3x. 
    more » « less
  5. The objective of this work is to develop error-bounded lossy compression methods to preserve topological features in 2D and 3D vector fields. Specifically, we explore the preservation of critical points in piecewise linear and bilinear vector fields. We define the preservation of critical points as, without any false positive, false negative, or false type in the decompressed data, (1) keeping each critical point in its original cell and (2) retaining the type of each critical point (e.g., saddle and attracting node). The key to our method is to adapt a vertex-wise error bound for each grid point and to compress input data together with the error bound field using a modified lossy compressor. Our compression algorithm can be also embarrassingly parallelized for large data handling and in situ processing. We benchmark our method by comparing it with existing lossy compressors in terms of false positive/negative/type rates, compression ratio, and various vector field visualizations with several scientific applications. 
    more » « less