Significantly Improving Fixed-Ratio Compression Framework for Resource-limited Applications

Nguyen, Tri; Rahman, Md Hasanur; Di, Sheng; Becchi, Michela

doi:10.1145/3673038.3673092

Scientific simulations running on HPC facilities generate massive amounts of data, putting significant pressure on supercomputers’ storage capacity and network bandwidth. To alleviate this problem, there has been a rich body of work on reducing data volumes via error-controlled lossy compression. However, fixed-ratio compression is not well supported, which prevents users from appropriately allocating memory/storage space or knowing the data transfer time over the network in advance. To address this problem, recent ratio-controlled frameworks, such as FXRZ, have incorporated methods to predict the error bound settings required to reach a user-specified compression ratio. However, these approaches fail to achieve fixed-ratio compression in an accurate, efficient and scalable fashion on diverse datasets and compression algorithms. This work proposes an efficient, scalable, ratio-controlled lossy compression framework (CAROL). At the core of CAROL are four optimization strategies that improve prediction accuracy and runtime efficiency over state-of-the-art solutions. First, CAROL uses surrogate-based compression ratio estimation to generate training data. Second, it includes a novel calibration method to improve prediction accuracy across a variety of compressors. Third, it leverages Bayesian optimization to allow for efficient training and incremental model refinement. Fourth, it uses GPU acceleration to speed up prediction. We evaluate CAROL on four compression algorithms and six scientific datasets. On average, when compared to the state-of-the-art FXRZ framework, CAROL achieves a 4× speedup in setup time and a 36× speedup in inference time, while maintaining less than 1% difference in estimation accuracy.
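
To make the fixed-ratio problem concrete, the following minimal Python sketch shows the naive trial-and-error baseline: compress repeatedly and bisect on the error bound until a target compression ratio is reached. This is not the method used by CAROL or FXRZ. The toy compressor (uniform quantization followed by zlib) and the names `toy_compress`, `compression_ratio`, and `find_error_bound` are assumptions introduced purely for illustration, and the search assumes the ratio grows roughly monotonically with the error bound.

```python
import math
import struct
import zlib


def toy_compress(values, error_bound):
    """Hypothetical error-bounded compressor: uniform quantization, then zlib."""
    # Rounding to multiples of 2*error_bound keeps each value within error_bound.
    codes = [round(v / (2.0 * error_bound)) for v in values]
    return zlib.compress(struct.pack(f"{len(codes)}q", *codes))


def compression_ratio(values, error_bound):
    raw_size = 8 * len(values)  # input treated as 64-bit floats
    return raw_size / len(toy_compress(values, error_bound))


def find_error_bound(values, target_ratio, lo=1e-8, hi=1.0, iters=40):
    """Bisect on a log scale; each step costs one full compressor run."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if compression_ratio(values, mid) < target_ratio:
            lo = mid  # ratio too low: allow more error
        else:
            hi = mid  # ratio reached: try a tighter bound
    return hi


# Illustrative input: a smooth synthetic signal, not a real scientific dataset.
data = [math.sin(0.01 * i) for i in range(4096)]
bound = find_error_bound(data, target_ratio=10.0)
```

Every bisection step runs the compressor on the full input, which is the cost that prediction-based frameworks aim to avoid by estimating the required error bound directly.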
