Monitoring the Health of Emerging Neural Network Accelerators with Cost-effective Concurrent Test

Liu, Qi; Liu, Tao; Liu, Zihao; Wen, Wujie; Yang, Chengmo.

ReRAM-based neural network accelerators are a promising solution for handling memory- and computation-intensive deep learning workloads. However, they suffer from unique device errors, which can accumulate to massive levels at run time and cause a significant accuracy drop. Before any proper repair mechanism can be applied, it is crucial to obtain the accelerator's fault status in real time. Calibrating such statistical information is non-trivial, however, because it requires a large number of test patterns, long test times, and high test coverage, given that complex errors may appear in millions to billions of weight parameters. In this paper, we leverage the concept of corner data, which can significantly confuse the decision making of a neural network model, together with the training algorithm, to generate only a small set of test patterns tuned to be sensitive to different levels of error accumulation and accuracy loss. Experimental results show that our method can quickly and correctly report the fault status of a running accelerator, outperforming existing solutions in both detection efficiency and cost.
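The abstract does not spell out how corner data are generated or scored. As a rough illustration only, the sketch below assumes that "corner data" means inputs with the smallest gap between the top two class scores of the fault-free model, and uses the fraction of those patterns whose predictions deviate from stored golden labels as a fault-status signal. The toy linear classifier, the Gaussian fault model, and every function name (`select_corner_data`, `inject_faults`, `mismatch_rate`) are hypothetical assumptions; this is not the authors' test-generation method, which according to the abstract also draws on the training algorithm.

```python
import random

def logits(weights, x):
    # Hypothetical toy classifier: a single linear layer, one weight row per class.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def predict(weights, x):
    # Predicted class = index of the largest score.
    scores = logits(weights, x)
    return max(range(len(scores)), key=scores.__getitem__)

def margin(scores):
    # Gap between the top-1 and top-2 scores. A small gap means the input sits
    # close to a decision boundary (our assumed stand-in for "corner data").
    first, second = sorted(scores, reverse=True)[:2]
    return first - second

def select_corner_data(weights, candidates, k):
    # Keep the k candidates with the smallest margin on the fault-free model.
    return sorted(candidates, key=lambda x: margin(logits(weights, x)))[:k]

def inject_faults(weights, rate, sigma, rng):
    # Hypothetical fault model: each weight is independently perturbed, with
    # probability `rate`, by zero-mean Gaussian noise of std `sigma`.
    return [[w + rng.gauss(0.0, sigma) if rng.random() < rate else w for w in row]
            for row in weights]

def mismatch_rate(weights, patterns, golden_labels):
    # Fraction of test patterns whose prediction differs from the stored
    # fault-free ("golden") label; used here as a proxy for fault status.
    wrong = sum(predict(weights, x) != y for x, y in zip(patterns, golden_labels))
    return wrong / len(patterns)

if __name__ == "__main__":
    rng = random.Random(0)
    n_classes, n_features = 10, 16
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_classes)]
    candidates = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(2000)]

    corner = select_corner_data(weights, candidates, k=20)
    baseline = candidates[:20]  # arbitrary, non-selected patterns for comparison
    corner_labels = [predict(weights, x) for x in corner]
    baseline_labels = [predict(weights, x) for x in baseline]

    for rate in (0.0, 0.01, 0.05, 0.1):
        faulty = inject_faults(weights, rate, sigma=0.5, rng=rng)
        print(f"fault rate {rate:.2f}: "
              f"corner mismatch {mismatch_rate(faulty, corner, corner_labels):.2f}, "
              f"baseline mismatch {mismatch_rate(faulty, baseline, baseline_labels):.2f}")
```

The intuition behind this toy choice is that inputs near a decision boundary change class under small weight perturbations, so a handful of them may react to faults that leave most arbitrary inputs unaffected. Whether this matches the paper's actual corner-data construction is not stated in the abstract.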

Award ID(s): 2011236, 2006748

PAR ID: 10188563

Author(s) / Creator(s): Liu, Qi; Liu, Tao; Liu, Zihao; Wen, Wujie; Yang, Chengmo.

Date Published: 2020-07-19

Journal Name: IEEE/ACM Design Automation Conference (DAC)

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.