Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart

Zhang, Jialing; Zhuo, Xiaoyan; Moon, Aekyeung; Liu, Hang; Son, Seung Woo

doi:10.1109/MSST.2019.00-14

As the amount of data produced by HPC applications reaches the exabyte range, compression techniques are often adopted to reduce checkpoint time and volume. Because lossless techniques are limited in their ability to achieve appreciable data reduction, lossy compression becomes the preferable option. In this work, we propose a lossy compression technique with highly efficient encoding, purpose-built error control, and high compression ratios. Specifically, we apply a discrete cosine transform with a novel block decomposition strategy directly to double-precision floating-point datasets, instead of using the prevailing prediction-based techniques. Further, we design an adaptive quantization scheme with two task-oriented quantizers: one that guarantees error bounds and one that targets higher compression ratios. On real-world HPC datasets, our approach achieves 3x-38x compression ratios while guaranteeing the specified error bounds, showing performance comparable to the state-of-the-art lossy compression methods SZ and ZFP. Moreover, our method provides viable reconstructed data for various checkpoint/restart scenarios in the FLASH application, making it a promising approach for lossy data compression in HPC I/O software stacks.
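
The general idea of block-wise DCT coding with an error-bounded quantizer can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the block decomposition strategy or the quantizer design, so the function names (`dct_basis`, `compress`, `decompress`), the fixed block size of 8, the last-value padding, and the step-size rule are all assumptions chosen for illustration. The sketch uses an orthonormal 1D DCT-II and a uniform quantizer with step `2 * error_bound / sqrt(block_size)`. Because the transform is orthonormal, the Euclidean norm of the reconstruction error equals that of the coefficient error, which is at most `error_bound`, so every pointwise error is bounded as well (up to floating-point rounding). This bound is conservative and no entropy coding is applied.

```python
import math


def dct_basis(n):
    """Return the orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def compress(values, error_bound, block_size=8):
    """Block DCT followed by uniform quantization (illustrative only)."""
    # Assumed step rule: each coefficient error is at most step / 2, so the
    # L2 norm of the block error is at most error_bound.
    step = 2.0 * error_bound / math.sqrt(block_size)
    basis = dct_basis(block_size)
    blocks = []
    for start in range(0, len(values), block_size):
        block = list(values[start:start + block_size])
        # Pad a short final block by repeating its last value.
        block += [block[-1]] * (block_size - len(block))
        coeffs = [sum(b * x for b, x in zip(row, block)) for row in basis]
        blocks.append([round(c / step) for c in coeffs])
    return blocks, step, len(values)


def decompress(blocks, step, length, block_size=8):
    """Dequantize and apply the inverse (transposed) transform."""
    basis = dct_basis(block_size)
    out = []
    for quantized in blocks:
        coeffs = [q * step for q in quantized]
        out.extend(sum(basis[k][i] * coeffs[k] for k in range(block_size))
                   for i in range(block_size))
    return out[:length]
```

Calling `compress(data, 1e-3)` on a list of floats and passing the three returned values to `decompress` yields a reconstruction of the same length. Smooth inputs tend to produce many zero-valued quantized coefficients, which is what a subsequent entropy coder would exploit to obtain an actual size reduction.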

Award ID(s): 1751143

NSF-PAR ID: 10091960

Author(s) / Creator(s): Zhang, Jialing; Zhuo, Xiaoyan; Moon, Aekyeung; Liu, Hang; Son, Seung Woo

Date Published: 2019-05-01

Journal Name: 35th Symposium on Mass Storage Systems and Technologies (MSST)

Page Range / eLocation ID: 79 to 91

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/MSST.2019.00-14
