On Efficient Constructions of Checkpoints

Yu Chen, Zhenming Liu

Citation Details

Efficient construction of checkpoints/snapshots is a critical tool for training and diagnosing deep learning models. In this paper, we propose a lossy compression scheme for checkpoint construction (called LC-Checkpoint). LC-Checkpoint simultaneously maximizes the compression rate and optimizes the recovery speed, under the assumption that SGD is used to train the model. LC-Checkpoint uses quantization and priority promotion to store the most crucial information for SGD to recover, and then uses Huffman coding to leverage the non-uniform distribution of the gradient scales. Our extensive experiments show that LC-Checkpoint achieves a compression rate of up to 28× and a recovery speedup of up to 5.77× over a state-of-the-art algorithm (SCAR).
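To make the quantize-then-entropy-code idea concrete, here is a minimal sketch in Python. It is not the LC-Checkpoint algorithm itself: the abstract does not specify the quantization rule or how priority promotion works, so this sketch assumes simple uniform bucketing of the parameter changes between two checkpoints and omits priority promotion entirely. The function names (`quantize`, `huffman_code`, `encode`, `decode`) and the `num_buckets` parameter are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter


def quantize(deltas, num_buckets=4):
    """Lossy step (assumed uniform bucketing, not the paper's scheme):
    map each parameter change to one of num_buckets equal-width bins."""
    lo, hi = min(deltas), max(deltas)
    width = (hi - lo) / num_buckets or 1.0  # avoid zero width if all equal
    return [min(int((d - lo) / width), num_buckets - 1) for d in deltas]


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(symbols, code):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Invert encode() by scanning for complete code words."""
    inverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return out
```

Because most parameter changes fall into a few buckets, the Huffman step assigns those buckets short code words, which is where the non-uniform distribution mentioned in the abstract pays off. Recovery would decode the bucket indices and apply a representative value per bucket to the previous checkpoint; that step is left out here because the abstract does not describe it.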

Award ID(s): 1835821

PAR ID: 10212765

Author(s) / Creator(s): Yu Chen, Zhenming Liu

Date Published: 2020-07-01

Journal Name: Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020

Format(s): Medium: X

Sponsoring Org: National Science Foundation

The DOI is not currently available.