SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training

Wu, Kun; Park, Jeongmin Brian; Zhang, Xiaofan; Hidayetoğlu, Mert; Mailthody, Vikram Sharma; Huang, Sitao; Lumetta, Steve; Hwu, Wen-Mei

doi:10.1109/DAC63849.2025.11132754

Citation Details

This content will become publicly available on June 22, 2026

SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training

The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations—the intermediate tensors produced during forward propagation and reused in backward propagation—dominate the GPU memory use. This leads to high training overheads such as expensive weight update costs due to the small micro-batch size. To address this challenge, we propose SSDTrain, an adaptive activation offloading framework to high-capacity NVMe SSDs. SSDTrain reduces GPU memory usage without impacting performance by fully overlapping data transfers with computation. SSDTrain is compatible with popular deep learning frameworks like PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication and forwarding to further enhance efficiency. We extensively experimented with popular LLMs like GPT, BERT, and T5. Results demonstrate that SSDTrain reduces 47% of the activation peak memory usage. At the same time, SSDTrain perfectly overlaps the I/O with the computation and incurs negligible overhead. Compared with keeping activations in GPU memory and layerwise full recomputation, SSDTrain achieves the best memory savings with negligible throughput loss. We further analyze how the reduced activation memory use may be leveraged to increase throughput by increasing micro-batch size and reducing pipeline parallelism bubbles. more »

Award ID(s):: 2443992

PAR ID:: 10656905

Author(s) / Creator(s):: Wu, Kun ; Park, Jeongmin Brian ; Zhang, Xiaofan ; Hidayetoğlu, Mert ; Mailthody, Vikram Sharma ; Huang, Sitao ; Lumetta, Steve ; Hwu, Wen-Mei

Publisher / Repository:: IEEE

Date Published:: 2025-06-22

Page Range / eLocation ID:: 1 to 7

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on June 22, 2026
Conference Paper:
https://doi.org/10.1109/DAC63849.2025.11132754

More Like this