Embarrassingly Simple Dataset Distillation

Feng, Yunzhen; Vedantam, Shanmukha; Kempe, Julia

Citation Details

Training of large-scale models in general requires enormous amounts of traning data. Dataset distillation aims to extract a small set of synthetic training samples from a large dataset with the goal of achieving competitive performance on test data when trained on this sample, thus reducing both dataset size and training time. In this work, we tackle dataset distillation at its core by treating it directly as a bilevel optimization problem. Re-examining the foundational back-propagation through time method, we study the pronounced variance in the gradients, computational burden, and long-term dependencies. We introduce an improved method: Random Truncated Backpropagation Through Time (RaT-BPTT) to address them. RaT-BPTT incorporates a truncation coupled with a random window, effectively stabilizing the gradients and speeding up the optimization while covering long dependencies. This allows us to establish new dataset distillation state-of-the-art for a variety of standard dataset benchmarks. more »

Award ID(s):: 1922658

PAR ID:: 10534883

Author(s) / Creator(s):: Feng, Yunzhen; Vedantam, Shanmukha; Kempe, Julia

Publisher / Repository:: Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization

Date Published:: 2023-12-16

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this