NSF PAR Search | NSF Public Access Repository

Concealing Compression-accelerated I/O for HPC Applications through In Situ Task Scheduling

https://doi.org/10.1145/3627703.3629573

Jin, Sian; Di, Sheng; Vivien, Frédéric; Wang, Daoce; Robert, Yves; Tao, Dingwen; Cappello, Franck (April 2024, ACM)

Full Text Available
FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

https://doi.org/10.14778/3648160.3648180

Chen, Xinyu; Tian, Jiannan; Beaver, Ian; Freeman, Cynthia; Yan, Yan; Wang, Jianguo; Tao, Dingwen (February 2024, Proceedings of the VLDB Endowment)

While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases like Key-Value Stores and queried using in-memory retrieval paradigms. This trend underscores the urgent need for a collective study of these compression methods' strengths and limitations, not only based on their performance in compressing data from various domains but also on their runtime characteristics. Our study extensively evaluates the performance of eight CPU-based and five GPU-based compression methods developed by both communities, using 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). Additionally, we utilize the roofline model to profile their runtime bottlenecks. Our goal is to offer insights into these compression methods that could assist researchers in selecting existing methods or developing new ones for integrated database and HPC applications.
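The roofline model mentioned above bounds a kernel's attainable throughput by the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity. A minimal sketch follows; the function names and the hardware numbers are hypothetical placeholders, not figures from the paper.

```python
def roofline_gflops(peak_gflops: float, bandwidth_gbs: float,
                    intensity_flop_per_byte: float) -> float:
    """Attainable performance (GFLOP/s) under the classic roofline model."""
    return min(peak_gflops, bandwidth_gbs * intensity_flop_per_byte)


def is_memory_bound(peak_gflops: float, bandwidth_gbs: float,
                    intensity_flop_per_byte: float) -> bool:
    """A kernel is memory-bound when its intensity lies below the ridge point."""
    ridge_point = peak_gflops / bandwidth_gbs
    return intensity_flop_per_byte < ridge_point


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
print(roofline_gflops(1000.0, 100.0, 0.5))   # 50.0
print(is_memory_bound(1000.0, 100.0, 0.5))   # True
```

Compression kernels typically perform few arithmetic operations per byte moved, which is why a profile of this kind tends to place them on the bandwidth-limited slope of the roofline.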
Full Text Available
BIRD+: Design of a Lightweight Communication Compressor for Resource-Constrained Distribution Learning Platforms

https://doi.org/10.1109/TPDS.2024.3447221

Wu, Donglei; Yang, Weihao; Zou, Xiangyu; Feng, Hao; Tao, Dingwen; Li, Shiyi; Xia, Wen; Fang, Binxing (January 2024, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs

Zhang, Boyuan; Tian, Jiannan; Di, Sheng; Yu, Xiaodong; Swany, Martin; Tao, Dingwen; Cappello, Franck (June 2023, The 37th ACM International Conference on Supercomputing (ICS 2023))

Full Text Available
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs

https://doi.org/10.1145/3588195.3592994

Zhang, Boyuan; Tian, Jiannan; Di, Sheng; Yu, Xiaodong; Feng, Yunhe; Liang, Xin; Tao, Dingwen; Cappello, Franck (June 2023, The 32nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2023))

Today’s large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.
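The first two pipeline stages named above, quantization and bitshuffle, can be sketched on the CPU as follows. This is a simplified illustration in pure Python, not FZ-GPU's CUDA kernels: the uniform quantizer, the zigzag mapping, and all function names are assumptions made for the example, and the fast encoding stage is omitted.

```python
def quantize(values, error_bound):
    """Map floats to integer codes; reconstruction stays within error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes, error_bound):
    step = 2.0 * error_bound
    return [c * step for c in codes]


def zigzag(code):
    """Fold signed codes into non-negative ones so small magnitudes stay small."""
    return 2 * code if code >= 0 else -2 * code - 1


def unzigzag(z):
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def bitshuffle(codes, bits=16):
    """Bit transpose: plane k collects bit k of every (non-negative) code."""
    planes = []
    for k in range(bits):
        plane = 0
        for i, c in enumerate(codes):
            plane |= ((c >> k) & 1) << i
        planes.append(plane)
    return planes


def bitunshuffle(planes, count):
    codes = [0] * count
    for k, plane in enumerate(planes):
        for i in range(count):
            codes[i] |= ((plane >> i) & 1) << k
    return codes


data = [0.12, 0.131, 0.50, -0.262]
eb = 0.01
folded = [zigzag(c) for c in quantize(data, eb)]
planes = bitshuffle(folded)
restored = dequantize([unzigzag(z) for z in bitunshuffle(planes, len(data))], eb)
```

Because the quantization codes are small, the high-order bit planes come out as all zeros after the shuffle, which is what makes a lightweight encoder effective on the shuffled stream.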
Full Text Available
Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5

Jin, Sian; Tao, Dingwen; Tang, Houjun; Di, Sheng; Byna, Suren; Lukic, Zarija; Cappello, Franck (November 2022, SC22: International Conference for High Performance Computing, Networking, Storage and Analysis)

Lossy compression is one of the most efficient solutions to reduce storage overhead and improve I/O performance for HPC applications. However, existing parallel I/O libraries cannot fully utilize lossy compression to accelerate parallel write due to the lack of a deep understanding of compression-write performance. To this end, we propose to deeply integrate predictive lossy compression with HDF5 to significantly improve the parallel-write performance. Specifically, we propose analytical models to predict the time of compression and parallel write before the actual compression to enable compression-write overlapping. We also introduce an extra space in the process to handle possible data overflows resulting from prediction uncertainty in compression ratios. Moreover, we propose an optimization to reorder the compression tasks to increase the overlapping efficiency. Experiments with up to 4,096 cores from Summit show that our solution improves the write performance by up to 4.5× and 2.9× over the non-compression and lossy compression solutions, respectively, with only 1.5% storage overhead (compared to original data) on two real-world HPC applications.
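To see why reordering compression tasks can increase overlap, consider a two-stage pipeline in which each task is compressed and then written, and the write of one task runs while the next is being compressed. The sketch below takes predicted (compress, write) times as given and orders tasks with Johnson's rule, a classical two-machine flow-shop heuristic swapped in here for illustration; it is not the reordering algorithm from the paper, and the task times are made up.

```python
def pipeline_makespan(tasks):
    """tasks: list of (compress_time, write_time) pairs, processed in order.

    Compression runs back to back; a write starts once its data is
    compressed and the previous write has finished.
    """
    compress_done = 0.0
    write_done = 0.0
    for compress_time, write_time in tasks:
        compress_done += compress_time
        write_done = max(write_done, compress_done) + write_time
    return write_done


def johnson_order(tasks):
    """Johnson's rule: short-compress tasks first, short-write tasks last."""
    front = sorted((t for t in tasks if t[0] < t[1]), key=lambda t: t[0])
    back = sorted((t for t in tasks if t[0] >= t[1]),
                  key=lambda t: t[1], reverse=True)
    return front + back


predicted = [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0)]  # hypothetical predictions
print(pipeline_makespan(predicted))                  # 9.0
print(pipeline_makespan(johnson_order(predicted)))   # 7.0
```

Starting with a task that compresses quickly lets the first write begin early, so later compressions are hidden behind writes instead of leaving the I/O path idle.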
Full Text Available
Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints

https://doi.org/10.1109/TPDS.2022.3194695

Liu, Yuanjian; Di, Sheng; Zhao, Kai; Jin, Sian; Wang, Cheng; Chard, Kyle; Tao, Dingwen; Foster, Ian; Cappello, Franck (July 2022, IEEE Transactions on Parallel and Distributed Systems)

Vast volumes of data are produced by today’s scientific simulations and advanced instruments. These data cannot be stored and transferred efficiently because of limited I/O bandwidth, network speed, and storage capacity. Error-bounded lossy compression can be an effective method for addressing these issues: not only can it significantly reduce data size, but it can also control the data distortion based on user-defined error bounds. In practice, many scientific applications have specific requirements or constraints for lossy compression, in order to guarantee that the reconstructed data are valid for post hoc analysis. For example, some datasets contain irrelevant data that should be isolated, and users often have intuition regarding value ranges, geospatial regions, and other data subsets that are crucial for subsequent analysis. Existing state-of-the-art error-bounded lossy compressors, however, do not consider these constraints during compression, resulting in inferior compression ratios with respect to users’ post hoc analysis, because some of the data provides little or no value for that analysis. In this work we address this issue by proposing an optimized framework that can preserve diverse constraints during the error-bounded lossy compression, e.g., cleaning the irrelevant data, efficiently preserving different precision for multiple value intervals, and allowing users to set diverse precision over both regular and irregular regions. We perform our evaluation on a supercomputer with up to 2,100 cores. Experiments with six real-world applications show that our proposed diverse-constraints-based error-bounded lossy compressor can obtain a higher visual quality or data fidelity on reconstructed data with the same or even higher compression ratios compared with the traditional state-of-the-art compressor SZ. Our experiments also demonstrate very good scalability in compression performance compared with the I/O throughput of the parallel file system.
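The idea of preserving different precision for multiple value intervals can be illustrated with a quantizer whose error bound depends on the interval a value falls in. This is a minimal sketch under stated assumptions: the interval edges, the bounds, and the choice to store an interval index next to each code are all hypothetical simplifications, not the framework's actual encoding.

```python
from bisect import bisect_right


def compress(values, boundaries, bounds):
    """Quantize each value with the error bound of its value interval.

    boundaries: sorted interval edges; bounds[i] applies to interval i,
    so len(bounds) must equal len(boundaries) + 1.
    """
    pairs = []
    for v in values:
        idx = bisect_right(boundaries, v)
        step = 2.0 * bounds[idx]
        pairs.append((idx, round(v / step)))
    return pairs


def decompress(pairs, bounds):
    return [code * 2.0 * bounds[idx] for idx, code in pairs]


# Hypothetical setup: values in [0, 1) matter most, negatives matter least.
edges = [0.0, 1.0]
per_interval_bounds = [1.0, 0.001, 0.1]
data = [-5.3, 0.4217, 7.77]
pairs = compress(data, edges, per_interval_bounds)
restored = decompress(pairs, per_interval_bounds)
```

Loosening the bound where the data carries little analytical value shrinks the range of codes there, while the tight bound is spent only on the interval users care about.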
Full Text Available
TAC: Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations

https://doi.org/10.1145/3502181.3531458

Wang, Daoce; Pulido, Jesus; Grosset, Pascal; Jin, Sian; Tian, Jiannan; Ahrens, James; Tao, Dingwen (June 2022, The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2022))

Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. However, little work has been done to improve error-bounded lossy compression for Adaptive Mesh Refinement (AMR) simulation data. Unlike the previous work that only leverages 1D compression, in this work, we propose to leverage high-dimensional (e.g., 3D) compression for each refinement level of AMR data. To remove the data redundancy across different levels, we propose three pre-process strategies and adaptively use them based on the data characteristics. Experiments on seven AMR datasets from a real-world large-scale AMR simulation demonstrate that our proposed approach can improve the compression ratio by up to 3.3× under the same data distortion, compared to the state-of-the-art method. In addition, we leverage the flexibility of our approach to tune the error bound for each level, which achieves much lower data distortion on two application-specific metrics.
Full Text Available
Ultrafast Error-Bounded Lossy Compression for Scientific Datasets

https://doi.org/10.1145/3502181.3531473

Yu, Xiaodong; Di, Sheng; Zhao, Kai; Tian, Jiannan; Tao, Dingwen; Liang, Xin; Cappello, Franck (June 2022, The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2022))

Today’s scientific high-performance computing applications and advanced instruments are producing vast volumes of data across a wide range of domains, which impose a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community because it not only can significantly reduce the data volumes but also can strictly control the data distortion based on the user-specified error bound. Existing lossy compressors, however, cannot offer ultrafast compression speed, which is highly demanded by numerous applications or use cases (such as in-memory compression and online instrument data compression). In this paper we propose a novel ultrafast error-bounded lossy compressor that can obtain fairly high compression performance on both CPUs and GPUs with reasonably high compression ratios. The key contributions are threefold. (1) We propose a generic error-bounded lossy compression framework (called SZx) that achieves ultrafast performance through its novel design comprising only lightweight operations such as bitwise and addition/subtraction operations, while still keeping a high compression ratio. (2) We implement SZx on both CPUs and GPUs and optimize the performance according to their architectures. (3) We perform a comprehensive evaluation with six real-world production-level scientific datasets on both CPUs and GPUs. Experiments show that SZx is 2–16× faster than the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPUs and GPUs, with respect to both compression and decompression.
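One way a compressor can stay within an error bound using only comparisons and additions is to detect blocks whose value range is no wider than twice the bound and replace each such block with a single midpoint value. The sketch below illustrates that constant-block shortcut; it is an assumption made for illustration rather than a description taken from the abstract, the names are hypothetical, and non-constant blocks are simply stored raw here.

```python
def compress_block(block, error_bound):
    """Collapse a block to one value when its range fits within 2 * error_bound."""
    lo, hi = min(block), max(block)
    if hi - lo <= 2.0 * error_bound:
        return ("const", (lo + hi) / 2.0, len(block))
    return ("raw", list(block))  # placeholder for a real non-constant encoding


def decompress_block(record):
    if record[0] == "const":
        _, value, count = record
        return [value] * count
    return list(record[1])


def compress(values, error_bound, block_size=4):
    return [compress_block(values[i:i + block_size], error_bound)
            for i in range(0, len(values), block_size)]


def decompress(records):
    out = []
    for record in records:
        out.extend(decompress_block(record))
    return out


data = [1.00, 1.01, 1.02, 1.01, 5.0, 9.0, 1.0, 2.0]
eb = 0.02
records = compress(data, eb)
restored = decompress(records)
```

In smooth regions of scientific data many blocks pass the range test, so a large share of the input is reduced to one value per block at almost no computational cost.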
Full Text Available
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression

https://doi.org/10.1145/3524059.3532362

Zhang, Chengming; Jin, Sian; Geng, Tong; Tian, Jiannan; Li, Ang; Tao, Dingwen (June 2022, The 36th ACM International Conference on Supercomputing (ICS 2022))

As parallel computers continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding. To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance. However, little work has been done for effectively offloading lossy compression onto FPGA-based SmartNICs to reduce the compression overhead. In this paper, we propose a hardware-algorithm co-design for an efficient and adaptive lossy compressor for scientific data on FPGAs (called CEAZ), which is the first lossy compressor that can achieve high compression ratios and throughputs simultaneously. Specifically, we propose an efficient Huffman coding approach that can adaptively update Huffman codewords online based on codewords generated offline, from a variety of representative scientific datasets. Moreover, we derive a theoretical analysis to support a precise control of compression ratio under an error-bounded compression mode, enabling accurate offline Huffman codewords generation. This also helps us create a fixed-ratio compression mode for consistent throughput. In addition, we develop an efficient compression pipeline by adapting cuSZ's dual-quantization algorithm to our hardware use cases. Finally, we evaluate CEAZ on five real-world datasets with both a single FPGA board and 128 nodes (to accelerate parallel I/O). Experiments show that CEAZ outperforms the second-best FPGA-based lossy compressor by 2× in throughput and 9.6× in compression ratio. It also improves MPI_File_write and MPI_Gather throughputs by up to 28.1× and 36.9×, respectively.
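The interplay between offline-generated codewords and online updates can be sketched in software. The example below builds Huffman code lengths from a frequency table and rebuilds them only when the observed symbol distribution would save more than a chosen number of bits per symbol. The update criterion, the threshold, and all names are hypothetical; CEAZ's hardware design is not represented here.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code over freqs."""
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def average_bits(lengths, freqs):
    total = sum(freqs.values())
    return sum(lengths[s] * f for s, f in freqs.items()) / total


def maybe_update(offline_lengths, observed_freqs, threshold_bits=0.1):
    """Rebuild codewords only if that saves more than threshold_bits per symbol.

    Every observed symbol must already have an offline codeword.
    """
    fresh = huffman_code_lengths(observed_freqs)
    gain = (average_bits(offline_lengths, observed_freqs)
            - average_bits(fresh, observed_freqs))
    return fresh if gain > threshold_bits else offline_lengths


offline_freqs = {0: 5, 1: 2, 2: 1, 3: 1}   # hypothetical offline statistics
offline_lengths = huffman_code_lengths(offline_freqs)
observed_freqs = {0: 1, 1: 1, 2: 2, 3: 5}  # hypothetical online statistics
updated_lengths = maybe_update(offline_lengths, observed_freqs)
```

Starting from offline codewords avoids a full pass over the data before encoding can begin, and the threshold keeps rebuilds rare when the incoming data resembles the offline datasets.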
Full Text Available
