Analyzing the Performance and Accuracy of Lossy Checkpointing on Sub-Iteration of NWChem

Reza, Tasmia; Calhoun, Jon; Keipert, Kristopher; Di, Sheng; Cappello, Franck

doi:10.1109/DRBSD-549595.2019.00009

Citation Details

Analyzing the Performance and Accuracy of Lossy Checkpointing on Sub-Iteration of NWChem

Future exascale systems are expected to be characterized by more frequent failures than current petascale systems. This places increased importance on the application to minimize the amount of time wasted due to recompution when recovering from a checkpoint. Typically HPC application checkpoint at iteration boundaries. However, for applications that have a high per-iteration cost, checkpointing inside the iteration limits the amount of re-computation. This paper analyzes the performance and accuracy of using lossy compressed check-pointing in the computational chemistry application NWChem. Our results indicate that lossy compression is an effective tool for reducing the sub-iteration checkpoint size. Moreover, compression error tolerances that yield acceptable deviation in accuracy and iteration count are quantified. more »

Award ID(s):: 1910197

PAR ID:: 10193342

Author(s) / Creator(s):: Reza, Tasmia; Calhoun, Jon; Keipert, Kristopher; Di, Sheng; Cappello, Franck

Date Published:: 2019-11-17

Journal Name:: 10.1109/DRBSD-549595.2019.00009

Page Range / eLocation ID:: 23 to 27

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1109/DRBSD-549595.2019.00009

More Like this