


Title: Error Resilient Transformer Networks: A Novel Sensitivity Guided Approach to Error Checking and Suppression
Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the large volumes of underlying Transformer computations demand high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for the design of error-resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layer in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of the linear Transformer computations. Correction consists of identifying output neurons with out-of-range values and suppressing them to zero. For a Transformer with a nominal BLEU score of 52.7, such vulnerability-guided selective error suppression recovers language translation performance from a BLEU score of 0 to 50.774 at activation error probabilities as high as 0.001, while incurring negligible memory and computation overheads.
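The detection-and-suppression scheme summarized above can be illustrated with a minimal sketch. The example below assumes a dense layer y = Wx + b protected by a precomputed column checksum of W; the function name, tolerance, and profiled output range are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def checked_linear(W, b, x, y_min, y_max, tol=1e-3):
    """Checksum-protected fully connected layer with out-of-range suppression
    (illustrative sketch; names and thresholds are assumptions)."""
    # Column checksum of the weight matrix, precomputed offline.
    w_checksum = W.sum(axis=0)

    y = W @ x + b  # nominal linear computation, potentially hit by soft errors

    # Checksum test: the sum of the (bias-free) outputs should match the
    # checksum row applied to the same input, up to rounding tolerance.
    if not np.isclose((y - b).sum(), w_checksum @ x, atol=tol):
        # Error detected: suppress neurons whose outputs fall outside the
        # range profiled during error-free operation.
        out_of_range = (y < y_min) | (y > y_max)
        y = np.where(out_of_range, 0.0, y)
    return y

# Toy usage with random weights and a profiled output range.
rng = np.random.default_rng(0)
W, b, x = rng.standard_normal((8, 16)), rng.standard_normal(8), rng.standard_normal(16)
y = checked_linear(W, b, x, y_min=-10.0, y_max=10.0)
```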
Award ID(s): 2128419
NSF-PAR ID: 10453094
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: European Test Symposium
Page Range / eLocation ID: 1-6
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The reliability of emerging neuromorphic compute fabrics is of great concern due to their widespread use in critical data-intensive applications. Ensuring such reliability is difficult due to the intensity of the underlying computations (billions of parameters), errors induced by low-power operation, and the complex relationship between errors in computations and their effect on network accuracy. We study the problem of designing error-resilient neuromorphic systems where errors can stem from: (a) soft errors in the computation of matrix-vector multiplications and neuron activations, (b) malicious trojan and adversarial security attacks, and (c) effects of manufacturing process variations on analog crossbar arrays that can affect DNN accuracy. The core principle of error detection relies on embedded predictive neuron checks using invariants derived from the statistics of nominal neuron activation patterns of the hidden layers of a neural network. Algorithmic encodings of hidden neuron function are also used to derive invariants for checking. A key contribution is designing checks that are robust to the inherent nonlinearity of neuron computations with minimal impact on error detection coverage. Once errors are detected, they are corrected using probabilistic methods, due to the difficulties involved in exact error diagnosis in such complex systems. The technique is scalable across soft errors as well as a range of security attacks. The effects of manufacturing process variations are handled through compact tests from which DNN performance can be assessed using learning techniques. Experimental results are presented on a variety of neuromorphic test systems: DNNs, spiking networks, and hyperdimensional computing.
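As a rough illustration of the invariant-based checking idea described in this entry, the sketch below flags hidden-layer activations that fall outside a band derived from nominal activation statistics. The band width k, the profiling procedure, and the function names are assumptions made for illustration, not the paper's actual checks.

```python
import numpy as np

def invariant_violations(activations, nominal_mean, nominal_std, k=4.0):
    """Flag activations that violate a statistical invariant derived from
    nominal (error-free) behavior. Illustrative sketch; k is an assumption."""
    lower = nominal_mean - k * nominal_std
    upper = nominal_mean + k * nominal_std
    return (activations < lower) | (activations > upper)

# Profile nominal hidden-layer statistics offline, then check online outputs.
rng = np.random.default_rng(1)
nominal_acts = rng.normal(0.5, 0.2, size=(1000, 64))   # error-free profiling runs
mu, sigma = nominal_acts.mean(axis=0), nominal_acts.std(axis=0)

online_acts = nominal_acts[0].copy()
online_acts[7] = 25.0                                   # inject a gross error
print(np.nonzero(invariant_violations(online_acts, mu, sigma))[0])  # flags neuron 7
```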
  2. Manually writing parallel programs is difficult and error-prone. Automatic parallelization could address this issue, but its profitability can be limited when the compiler lacks facts known only to the programmer. A parallelizing compiler that collaborates with the programmer can increase the coverage and performance of parallelization while reducing the errors and overhead associated with manual parallelization. Unlike collaboration involving analysis tools that report program properties or make parallelization suggestions to the programmer, decompiler-based collaboration could leverage the strength of existing parallelizing compilers to provide programmers with a natural compiler-parallelized starting point for further parallelization or refinement. Despite this potential, existing decompilers fail to do this because they do not generate portable parallel source code compatible with any compiler of the source language. This paper presents SPLENDID, an LLVM-IR to C/OpenMP decompiler that enables collaborative parallelization by producing standard parallel OpenMP code. Using published manual parallelization of the PolyBench benchmark suite as a reference, SPLENDID's collaborative approach produces programs twice as fast as either Polly-based automatic parallelization or manual parallelization alone. SPLENDID's portable parallel code is also more natural than that from existing decompilers, obtaining a 39x higher average BLEU score.
  3. Dual learning has attracted much attention in the machine learning, computer vision, and natural language processing communities. The core idea of dual learning is to leverage the duality between the primal task (mapping from domain X to domain Y) and the dual task (mapping from domain Y to domain X) to boost the performance of both tasks. The existing dual learning framework forms a system with two agents (one primal model and one dual model) to exploit this duality. In this paper, we extend the framework by introducing multiple primal and dual models, and propose the multi-agent dual learning framework. Experiments on neural machine translation and image translation tasks demonstrate the effectiveness of the new framework. In particular, we set a new record on IWSLT 2014 German-to-English translation with a 35.44 BLEU score, achieve a 31.03 BLEU score on WMT 2014 English-to-German translation with over 2.6 BLEU improvement over the strong Transformer baseline, and set a new record of 49.61 BLEU score on the recent WMT 2018 English-to-German translation.
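A toy sketch of the multi-agent idea described in this entry: the primal (X to Y) direction is trained not only on its own loss but also on reconstruction feedback averaged over several dual (Y to X) models. The ToyTranslator class, its scoring function, and the weighting are purely hypothetical stand-ins for illustration.

```python
class ToyTranslator:
    """Hypothetical stand-in for a translation model (no real seq2seq here)."""
    def nll(self, src, tgt):
        # Toy "negative log-likelihood": penalize length mismatch between
        # source and target; a real system would score with a neural model.
        return 1.0 + abs(len(src) - len(tgt))

    def translate(self, src):
        return list(reversed(src))  # toy "translation"

def multi_agent_dual_loss(x, y, primal_models, dual_models, alpha=0.5):
    # Supervised loss of the primal direction, averaged over primal agents.
    primal = sum(m.nll(x, y) for m in primal_models) / len(primal_models)
    # Duality feedback: how well the dual agents reconstruct x from the
    # primal output, averaged over all dual agents.
    y_hat = primal_models[0].translate(x)
    dual = sum(d.nll(y_hat, x) for d in dual_models) / len(dual_models)
    return (1 - alpha) * primal + alpha * dual

primals = [ToyTranslator() for _ in range(3)]
duals = [ToyTranslator() for _ in range(3)]
print(multi_agent_dual_loss(["ein", "kleiner", "test"], ["a", "small", "test"], primals, duals))
```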
  4. As we reach the limit of Moore’s Law, researchers are exploring different paradigms to achieve unprecedented performance. Approximate Computing (AC), which relies on the ability of applications to tolerate some error in their results in order to trade off accuracy for performance, has shown significant promise. Despite the success of AC in domains such as Machine Learning, its acceptance in High-Performance Computing (HPC) is limited due to the domain's stringent accuracy requirements. We need tools and techniques to identify regions of the code that are amenable to approximation and to quantify their impact on application output quality, so as to guide developers in employing selective approximation. To this end, we propose CHEF-FP, a flexible, scalable, and easy-to-use source-code transformation tool based on Automatic Differentiation (AD) for analyzing approximation errors in HPC applications. CHEF-FP uses Clad, an efficient AD tool built as a plugin to the Clang compiler on top of the LLVM compiler infrastructure, as its backend and utilizes its AD abilities to evaluate approximation errors in C++ code. CHEF-FP works at the source level by injecting error estimation code into the generated adjoints. This enables the error-estimation code to undergo compiler optimizations, resulting in improved analysis time and reduced memory usage. We also provide theoretical and architectural augmentations to source-code transformation-based AD tools to perform FP error analysis. In this paper, we primarily focus on analyzing errors introduced by mixed-precision AC techniques, the most popular approximation technique in HPC. We also show the applicability of our tool in estimating other kinds of errors by evaluating it on codes that use approximate functions. Moreover, we demonstrate the speedups CHEF-FP achieves in analysis time compared to the existing state-of-the-art tool, a result of its ability to generate and insert approximation error estimate code directly into the derivative source. The generated code also becomes a candidate for further compiler optimizations, contributing to lower runtime overhead.
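The derivative-based error analysis sketched in this entry rests on first-order sensitivity: each input contributes roughly |df/dx_i| * |delta x_i| to the output error, where delta x_i is the representation error of the lower precision. The Python sketch below uses a central-difference derivative as a stand-in for the AD-generated adjoints; the function name, eps_lo, and step size h are illustrative assumptions, not CHEF-FP's actual generated code.

```python
import math

def estimate_mixed_precision_error(f, xs, eps_lo=2 ** -10, h=1e-6):
    """First-order floating-point error bound for f(xs) when each input is
    stored with relative error eps_lo (illustrative sketch; a real tool would
    use AD adjoints rather than finite differences)."""
    base = f(xs)
    bound = 0.0
    for i, x in enumerate(xs):
        # Central-difference derivative as a stand-in for the adjoint df/dx_i.
        up = xs[:i] + [x + h] + xs[i + 1:]
        dn = xs[:i] + [x - h] + xs[i + 1:]
        dfdx = (f(up) - f(dn)) / (2 * h)
        # Accumulate |df/dx_i| * |delta x_i| with delta x_i = eps_lo * |x_i|.
        bound += abs(dfdx) * eps_lo * abs(x)
    return base, bound

# Example: estimated output error of a small kernel under ~10-bit mantissas.
kernel = lambda v: v[0] * v[1] + math.sin(v[2])
value, err = estimate_mixed_precision_error(kernel, [1.5, -2.0, 0.7])
print(value, err)
```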
  5. Online reinforcement learning (RL) based systems are being increasingly deployed in a variety of safety-critical applications ranging from drone control to medical robotics. These systems typically run RL onboard rather than relying on remote operation from high-performance datacenters. Due to the dynamic nature of the environments they operate in, onboard RL hardware is vulnerable to soft errors from radiation, thermal effects, and electrical noise that corrupt the results of computations. Existing approaches to online error resilience in machine learning systems have relied on the availability of large training datasets to configure resilience parameters, which is not necessarily feasible for online RL systems. Similarly, other approaches involving specialized hardware or modifications to training algorithms are difficult to implement for onboard RL applications. In contrast, we present a novel error resilience approach for online RL that uses running statistics collected across the (real-time) RL training process to configure error detection thresholds, without the need to access a reference training dataset. In this methodology, statistical concentration bounds computed from the running statistics are used to diagnose neuron outputs as erroneous; these erroneous neuron outputs are then set to zero (suppressed). Our approach is compared against the state of the art and validated on several RL algorithms, using multiple concentration bounds, on both CPU and GPU hardware.
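A minimal sketch of the running-statistics idea in this entry: per-neuron mean and variance are maintained online with Welford's algorithm, and outputs outside a concentration-style band around the running mean are suppressed to zero. The class name, band width k, and warm-up handling are assumptions made for illustration, not details from the paper.

```python
import numpy as np

class RunningNeuronCheck:
    """Online check sketch: per-neuron running mean/variance (Welford) and a
    concentration-style band of k standard deviations (k is an assumption)."""
    def __init__(self, n_neurons, k=6.0):
        self.n = 0
        self.mean = np.zeros(n_neurons)
        self.m2 = np.zeros(n_neurons)
        self.k = k

    def update(self, outputs):
        # Welford's online update of the mean and sum of squared deviations.
        self.n += 1
        delta = outputs - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (outputs - self.mean)

    def suppress(self, outputs):
        if self.n < 2:
            return outputs  # not enough statistics collected yet
        std = np.sqrt(self.m2 / (self.n - 1))
        erroneous = np.abs(outputs - self.mean) > self.k * std
        return np.where(erroneous, 0.0, outputs)  # set flagged neurons to zero

# Collect statistics during training steps, then screen suspect outputs.
rng = np.random.default_rng(2)
check = RunningNeuronCheck(n_neurons=4)
for _ in range(200):
    check.update(rng.normal(size=4))
print(check.suppress(np.array([0.1, 50.0, -0.2, 0.3])))  # the 50.0 is zeroed
```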