skip to main content


This content will become publicly available on May 1, 2024

Title: Runway: In-transit Data Compression on Heterogeneous HPC Systems
To alleviate bottlenecks in storing and accessing data on high-performance computing (HPC) systems, I/O libraries are enabling computation while data is in-transit, such as HDFS filters. For scientific applications that commonly use floating-point data, error-bounded lossy compression methods are a critical technique to significantly reduce the storage and bandwidth requirements. Thus far, deciding when and where to schedule in-transit data transformations, such as compression, has been outside the scope of I/O libraries. In this paper, we introduce Runway, a runtime framework that enables computation on in-transit data with an object storage abstraction. Runway is designed to be extensible to execute user-defined functions at runtime. In this effort, we focus on studying methods to offload data compression operations to available processing units based on latency and throughput. We compare the performance of running compression on multi-core CPUs, as well as offloading it to a GPU and a Data Processing Unit (DPU). We implement a state-of-the-art error-bounded lossy compression algorithm, SZ3, as a Runway function with a variant optimized for DPUs. We propose dynamic modeling to guide scheduling decisions for in-transit data compression. We evaluate Runway using four scientific datasets from the SDRBench benchmark suite on a the Perlmutter supercomputer at NERSC.  more » « less
Award ID(s):
1812727
NSF-PAR ID:
10437071
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid)
Page Range / eLocation ID:
229 to 239
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Vast volumes of data are produced by today’s scientific simulations and advanced instruments. These data cannot be stored and transferred efficiently because of limited I/O bandwidth, network speed, and storage capacity. Error-bounded lossy compression can be an effective method for addressing these issues: not only can it significantly reduce data size, but it can also control the data distortion based on user-defined error bounds. In practice, many scientific applications have specific requirements or constraints for lossy compression, in order to guarantee that the reconstructed data are valid for post hoc analysis. For example, some datasets contain irrelevant data that should be isolated in particular and users often have intuition regarding value ranges, geospatial regions, and other data subsets that are crucial for subsequent analysis. Existing state-of-the-art error-bounded lossy compressors, however, do not consider these constraints during compression, resulting in inferior compression ratios with respect to user’s post hoc analysis, due to the fact that the data itself provides little or no value for post hoc analysis. In this work we address this issue by proposing an optimized framework that can preserve diverse constraints during the error-bounded lossy compression, e.g., cleaning the irrelevant data, efficiently preserving different precision for multiple value intervals, and allowing users to set diverse precision over both regular and irregular regions. We perform our evaluation on a supercomputer with up to 2,100 cores. Experiments with six real-world applications show that our proposed diverse constraints based error-bounded lossy compressor can obtain a higher visual quality or data fidelity on reconstructed data with the same or even higher compression ratios compared with the traditional state-of-the-art compressor SZ. Our experiments also demonstrate very good scalability in compression performance compared with the I/O throughput of the parallel file system. 
    more » « less
  2. High-performance computing (HPC) systems that run scientific simulations of significance produce a large amount of data during runtime. Transferring or storing such big datasets causes a severe I/O bottleneck and a considerable storage burden. Applying compression techniques, particularly lossy compressors, can reduce the size of the data and mitigate such overheads. Unlike lossless compression algorithms, error-controlled lossy compressors could significantly reduce the data size while respecting the user-defined error bound. DCTZ is one of the transform-based lossy compressors with a highly efficient encoding and purpose-built error control mechanism that accomplishes high compression ratios with high data fidelity. However, since DCTZ quantizes the DCT coefficients in the frequency domain, it may only partially control the relative error bound defined by the user. In this paper, we aim to improve the compression quality of DCTZ. Specifically, we propose a preconditioning method based on level offsetting and scaling to control the magnitude of input of the DCTZ framework, thereby enforcing stricter error bounds. We evaluate the performance of our method in terms of compression ratio and rate distortion with real-world HPC datasets. Our experimental result shows that our method can achieve a higher compression ratio than other state-of-the-art lossy compressors with a tighter error bound while precisely guaranteeing the user-defined error bound. 
    more » « less
  3. As parallel computers continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance. However, little work has been done for effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design for an efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ), which is the first lossy compressor that can achieve high compression ratios and throughputs simultaneously. Specifically, we propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline, from a variety of representative scientific datasets. Moreover, we derive a theoretical analysis to support a precise control of compression ratio under an error-bounded compression mode, enabling accurate offline Huffman codewords generation. This also helps us create a fixed-ratio compression mode for consistent throughput. In addition, we develop an efficient compression pipeline by adopting cuSZ's dual-quantization algorithm to our hardware use cases. Finally, we evaluate CEAZ on five real-world datasets with both a single FPGA board and 128 nodes (to accelerate parallel I/O). Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2X of throughput and 9.6X of ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 28.1X and 36.9X, respectively. 
    more » « less
  4. Today's scientific simulations require a significant reduction of data volume because of extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike the previous work that only leverages 1D compression, in this work, we propose to leverage high-dimensional (e.g., 3D) compression for each refinement level of AMR data. To remove the data redundancy across different levels, we propose three pre-process strategies and adaptively use them based on the data characteristics. Experiments on seven AMR datasets from a real-world large-scale AMR simulation demonstrate that our proposed approach can improve the compression ratio by up to 3.3X under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics. 
    more » « less
  5. Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also can retain high fidelity for postanalysis. Because supercomputers and HPC applications are becoming heterogeneous using accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present an optimized GPU version, cuSZ, for one of the best error-bounded lossy compressors-SZ. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme to entirely remove the data dependency in the prediction step of SZ such that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate our cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that our cuSZ improves SZ's compression throughput by up to 370.1x and 13.1x, respectively, over the production version running on single and multiple CPU cores, respectively, while getting the same quality of reconstructed data. It also improves the compression ratio by up to 3.48x on the tested data compared with another state-of-the-art GPU supported lossy compressor. 
    more » « less