Title: Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data
Scientific simulations generate large amounts of floating-point data, which often compress poorly under traditional reduction schemes such as deduplication or lossless compression. The emergence of lossy floating-point compression holds promise for meeting the data reduction demands of HPC applications; however, lossy compression has not been widely adopted in production science. We believe a fundamental reason is a lack of understanding of the benefits, pitfalls, and performance of lossy compression on scientific data. In this paper, we conduct a comprehensive study of state-of-the-art lossy compressors, including ZFP, SZ, and ISABELA, using real and representative HPC datasets. Our evaluation reveals the complex interplay between compressor design, data features, and compression performance. The impact of reduced accuracy on data analytics is also examined through a case study of fusion blob detection, offering domain scientists insight into what to expect from fidelity loss. Furthermore, the trial-and-error approach to understanding compression performance involves substantial compute and storage overhead. To this end, we propose a sampling-based estimation method that extrapolates the reduction ratio from data samples, guiding domain scientists toward more informed data reduction decisions.
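To make the idea of sampling-based ratio estimation concrete, the following is a minimal sketch and not the method from the paper. It assumes a toy error-bounded compressor (uniform quantization followed by zlib) as a stand-in for ZFP, SZ, or ISABELA, and it samples contiguous blocks at random; the function names, block size, and sampling fraction are all hypothetical choices made for illustration.

```python
import math
import random
import struct
import zlib


def compress_ratio(values, error_bound):
    """Toy stand-in compressor (an assumption, not ZFP/SZ/ISABELA).

    Quantizes each value to a multiple of 2 * error_bound, so the
    reconstruction error is at most error_bound, then applies zlib.
    Returns original size divided by compressed size.
    """
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]
    packed = struct.pack(f"<{len(codes)}q", *codes)
    original_bytes = 8 * len(values)  # size of the data as float64
    return original_bytes / len(zlib.compress(packed))


def estimate_ratio(values, error_bound, block=1024, fraction=0.1, seed=0):
    """Estimate the ratio by compressing a random subset of contiguous blocks."""
    rng = random.Random(seed)
    starts = list(range(0, len(values), block))
    k = max(1, int(len(starts) * fraction))
    chosen = sorted(rng.sample(starts, k))
    sample = [v for s in chosen for v in values[s:s + block]]
    return compress_ratio(sample, error_bound)


if __name__ == "__main__":
    # Synthetic smooth signal used only to exercise the sketch.
    data = [math.sin(i * 0.01) for i in range(100_000)]
    print("full ratio:     ", compress_ratio(data, 1e-3))
    print("estimated ratio:", estimate_ratio(data, 1e-3))
```

Sampling whole blocks rather than individual points keeps the local smoothness that a compressor exploits. How closely the estimate tracks the full-data ratio depends on the compressor and on how uniform the data are, so the sketch makes no accuracy claim.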
Award ID(s):
1717660 1702474
PAR ID:
10065104
Journal Name:
the 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With ever-increasing volumes of scientific floating-point data being produced by high-performance computing applications, significantly reducing scientific floating-point data size is critical, and error-controlled lossy compressors have been developed for years. None of the existing scientific floating-point lossy data compressors, however, support effective fixed-ratio lossy compression. Yet fixed-ratio lossy compression for scientific floating-point data not only compresses to the requested ratio but also respects a user-specified error bound with higher fidelity. In this paper, we present FRaZ: a generic fixed-ratio lossy compression framework respecting user-specified error constraints. The contribution is twofold. (1) We develop an efficient iterative approach to accurately determine the appropriate error settings for different lossy compressors based on target compression ratios. (2) We perform a thorough performance and accuracy evaluation for our proposed fixed-ratio compression framework with multiple state-of-the-art error-controlled lossy compressors, using several real-world scientific floating-point datasets from different domains. Experiments show that FRaZ effectively identifies the optimum error setting in the entire error setting space of any given lossy compressor. While fixed-ratio lossy compression is slower than fixed-error compression, it provides an important new lossy compression technique for users of very large scientific floating-point datasets. 
  3. While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases like Key-Value Stores and queried using in-memory retrieval paradigms. This trend underscores the urgent need for a collective study of these compression methods' strengths and limitations, not only based on their performance in compressing data from various domains but also on their runtime characteristics. Our study extensively evaluates the performance of eight CPU-based and five GPU-based compression methods developed by both communities, using 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). Additionally, we utilize the roofline model to profile their runtime bottlenecks. Our goal is to offer insights into these compression methods that could assist researchers in selecting existing methods or developing new ones for integrated database and HPC applications. 
  4. As the amount of data produced by HPC applications reaches the exabyte range, compression techniques are often adopted to reduce checkpoint time and volume. Since lossless techniques are limited in their ability to achieve appreciable data reduction, lossy compression becomes a preferable option. In this work, a lossy compression technique with highly efficient encoding, purpose-built error control, and high compression ratios is proposed. Specifically, we apply a discrete cosine transform with a novel block decomposition strategy directly to double-precision floating-point datasets, instead of using the prevailing prediction-based techniques. Further, we design an adaptive quantization scheme with two task-oriented quantizers: one for guaranteed error bounds and one for higher compression ratios. Using real-world HPC datasets, our approach achieves 3x-38x compression ratios while guaranteeing specified error bounds, showing performance comparable to the state-of-the-art lossy compression methods SZ and ZFP. Moreover, our method provides viable reconstructed data for various checkpoint/restart scenarios in the FLASH application, and is therefore a promising approach for lossy data compression in HPC I/O software stacks.
  5. As the scale and complexity of high-performance computing (HPC) systems keep growing, data compression techniques are often adopted to reduce data volume and processing time. While lossy compression is becoming preferable to lossless compression because of its potential for high compression ratios, it is not worth the effort unless an optimal balance between volume reduction and information loss is found. Among the many lossy compression techniques, transform-based lossy algorithms make better use of spatial redundancy. However, transform-based lossy compressors have received relatively little attention because their compression performance on scientific data sets is not well understood. The insight of this paper is that, in transform-based lossy compressors, quantifying dominant coefficients at the block level reveals the right balance, potentially impacting overall compression ratios. Motivated by this, we characterize three transform-based lossy compression mechanisms with different information compaction methods, using statistical features that capture data characteristics. We then build several prediction models using the statistical features and the characteristics of dominant coefficients, and evaluate the effectiveness of each model using six HPC datasets from three production-level simulations at scale. Our results demonstrate that the random forest classifier captures the behavior of dominant coefficients precisely, achieving nearly 99% prediction accuracy.
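The fixed-ratio idea described in the first related record (searching for an error setting that yields a requested compression ratio) can be illustrated with a short sketch. This is not FRaZ's algorithm: it swaps in plain bisection on a logarithmic scale, assumes the ratio grows roughly monotonically with the error bound, and uses a toy quantize-then-zlib compressor as a stand-in for a real error-controlled compressor. All names and default parameters are hypothetical.

```python
import math
import struct
import zlib


def toy_ratio(values, error_bound):
    """Toy error-bounded compressor (assumption): quantize, then zlib."""
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]
    packed = struct.pack(f"<{len(codes)}q", *codes)
    return (8 * len(values)) / len(zlib.compress(packed))


def find_error_bound(values, target_ratio, lo=1e-8, hi=1.0,
                     tol=0.05, max_iter=40):
    """Bisect on the error bound until the ratio is within tol of the target.

    Returns the (error_bound, ratio) pair closest to the target that was
    seen during the search; the target may not be reachable exactly.
    """
    best = None
    for _ in range(max_iter):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        ratio = toy_ratio(values, mid)
        if best is None or abs(ratio - target_ratio) < abs(best[1] - target_ratio):
            best = (mid, ratio)
        if abs(ratio - target_ratio) <= tol * target_ratio:
            break
        if ratio < target_ratio:
            lo = mid  # ratio too low: allow a looser error bound
        else:
            hi = mid  # ratio too high: tighten the error bound
    return best


if __name__ == "__main__":
    # Synthetic smooth signal used only to exercise the sketch.
    data = [math.sin(i * 0.01) for i in range(20_000)]
    bound, ratio = find_error_bound(data, target_ratio=20.0)
    print(f"error bound {bound:.3e} gives ratio {ratio:.2f}")
```

Each iteration runs a full compression, which is consistent with the record's remark that fixed-ratio compression is slower than fixed-error compression. Real compressors may have non-monotonic ratio curves, in which case simple bisection can miss the best setting.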