On-FPGA training with ultra memory reduction: A low-precision tensor method

Zhang, Kaiqi; Hawkins, Cole; Zhang, Xiyuan; Hao, Cong; Zhang, Zheng


Various hardware accelerators have been developed for energy-efficient, real-time inference of neural networks on edge devices. However, most training is done on high-performance GPUs or servers, and the huge memory and computing costs prevent neural networks from being trained on edge devices. This paper proposes a novel tensor-based training framework that offers orders-of-magnitude memory reduction in the training process. We propose a novel rank-adaptive tensorized neural network model and design a hardware-friendly low-precision algorithm to train this model. We present an FPGA accelerator to demonstrate the benefits of this training method on edge devices. Our preliminary FPGA implementation achieves a 59× speedup and a 123× energy reduction compared to an embedded CPU, and a 292× memory reduction compared to standard full-size training.
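The abstract does not spell out the tensor format or the layer sizes. As a rough, hypothetical illustration of where orders-of-magnitude parameter savings can come from, the sketch below assumes a tensor-train (TT) factorization of a single fully connected layer, one common tensorized format, and compares the parameter count of a dense 1024 × 1024 weight matrix with that of a TT-matrix. The function names `dense_params` and `tt_matrix_params`, the mode sizes, and the ranks are illustrative assumptions, not the authors' configuration, and the printed ratio is not the paper's reported result.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Entries of a dense weight matrix of size prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_matrix_params(in_modes, out_modes, ranks):
    """Entries of a TT-matrix whose k-th core has shape (r_{k-1}, m_k, n_k, r_k).

    Assumption (illustrative only): ranks has length d + 1 and the
    boundary ranks ranks[0] and ranks[-1] are both 1.
    """
    d = len(in_modes)
    if len(out_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d input modes, d output modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    # Sum the sizes of the d cores; no full matrix is ever materialized.
    return sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1] for k in range(d))


# Hypothetical layer: 1024 = 4**5 inputs and outputs, all internal TT ranks 8.
in_modes = out_modes = (4, 4, 4, 4, 4)
ranks = (1, 8, 8, 8, 8, 1)
full = dense_params(in_modes, out_modes)
tt = tt_matrix_params(in_modes, out_modes, ranks)
print(full, tt, round(full / tt, 1))  # 1048576 3328 315.1
```

Each core holds r_{k-1} · m_k · n_k · r_k entries, so lowering a single TT rank shrinks both cores that share it. A rank-adaptive model, as described in the abstract, can exploit this by reducing ranks during training; how the authors do so, and how their low-precision arithmetic further cuts memory, is not modeled here.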

Award ID(s): 1817037

PAR ID: 10310713

Author(s) / Creator(s): Zhang, Kaiqi; Hawkins, Cole; Zhang, Xiyuan; Hao, Cong; Zhang, Zheng

Date Published: 2021-05-01

Journal Name: ICLR Workshop on Hardware Aware Efficient Training

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
