

Title: Error Resilient Transformer Networks: A Novel Sensitivity Guided Approach to Error Checking and Suppression
Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the large volumes of underlying Transformer computations demand high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for the design of error-resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layer in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of determining output neurons with out-of-range values and suppressing them to zero. For a Transformer with a nominal BLEU score of 52.7, such vulnerability-guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 at activation error probabilities as high as 0.001, incurring negligible memory and computation overheads.
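The detection-and-suppression idea in the abstract can be sketched in a few lines: a checksum over the rows of the weight matrix predicts the sum of the layer outputs, a mismatch flags an error, and suspect outputs are zeroed. This is a minimal illustration only; the function name, the fixed output range, and the tolerance are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np

def checked_linear(W, b, x, out_range=(-10.0, 10.0), tol=1e-6):
    """Linear layer y = W @ x + b with a checksum-based error check.

    Invariant: sum_i y_i == (column sums of W) @ x + sum_i b_i for an
    error-free computation. On a mismatch, outputs falling outside a
    nominal activation range are suppressed to zero. The range and
    tolerance here are illustrative, not values from the paper.
    """
    y = W @ x + b
    predicted_sum = W.sum(axis=0) @ x + b.sum()   # checksum prediction
    error_detected = abs(y.sum() - predicted_sum) > tol
    if error_detected:
        lo, hi = out_range
        y = np.where((y < lo) | (y > hi), 0.0, y)  # zero out-of-range neurons
    return y, error_detected
```

In practice the checksum row would be precomputed once per layer, so the runtime cost per inference is a single extra dot product and a comparison.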
Award ID(s):
2128419
PAR ID:
10453094
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
European Test Symposium
Page Range / eLocation ID:
1-6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The reliability of emerging neuromorphic compute fabrics is of great concern due to their widespread use in critical data-intensive applications. Ensuring such reliability is difficult due to the intensity of the underlying computations (billions of parameters), errors induced by low-power operation, and the complex relationship between errors in computations and their effect on network performance accuracy. We study the problem of designing error-resilient neuromorphic systems where errors can stem from: (a) soft errors in the computation of matrix-vector multiplications and neuron activations, (b) malicious trojan and adversarial security attacks, and (c) effects of manufacturing process variations on analog crossbar arrays that can affect DNN accuracy. The core principle of error detection relies on embedded predictive neuron checks using invariants derived from the statistics of nominal neuron activation patterns of hidden layers of a neural network. Algorithmic encodings of hidden neuron function are also used to derive invariants for checking. A key contribution is designing checks that are robust to the inherent nonlinearity of neuron computations with minimal impact on error detection coverage. Once errors are detected, they are corrected using probabilistic methods due to the difficulties involved in exact error diagnosis in such complex systems. The technique is scalable across soft errors as well as a range of security attacks. The effects of manufacturing process variations are handled through the use of compact tests from which DNN performance can be assessed using learning techniques. Experimental results are presented on a variety of neuromorphic test systems: DNNs, spiking networks, and hyperdimensional computing.
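The invariant-based checking described above can be sketched by fitting per-neuron activation bounds from error-free runs and flagging violations at inference time. This is a simplified range-invariant sketch under assumed Gaussian-style statistics; the function names and the k-sigma bound are illustrative, not the paper's exact formulation.

```python
import numpy as np

def fit_invariants(clean_activations, k=4.0):
    """Derive per-neuron range invariants from nominal activation statistics.

    clean_activations: (num_samples, num_neurons) array collected from
    error-free runs of a hidden layer. Returns (lo, hi) bounds at k
    standard deviations; k is an illustrative choice.
    """
    mu = clean_activations.mean(axis=0)
    sigma = clean_activations.std(axis=0)
    return mu - k * sigma, mu + k * sigma

def check_invariants(activation, lo, hi):
    """Flag neurons whose activations violate the learned invariants."""
    return (activation < lo) | (activation > hi)
```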
  2. In this paper we propose a framework for concurrent detection of soft computation errors in particle filters, which are finding increasing use in robotics applications. The particle filter works by sampling the multi-variate probability distribution of the states of a system (samples called particles, each particle representing a vector of states) and projecting these into the future using appropriate nonlinear mappings. We propose the addition of a `check' state to the system as a linear combination of the system states for error detection. The check state produces an error signal corresponding to each particle, whose statistics are tracked across a sliding time window. Shifts in the error statistics across all particles are used to detect soft computation errors as well as anomalous sensor measurements. Simulation studies indicate that errors in particle filter computations can be detected with high coverage and low latency.
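For a linear state update, the check-state idea above can be sketched as follows: a check vector `a` defines z = a·x, so after an error-free update x' = F·x the residual a·x' − (a·F)·x should be zero, and its statistics over a sliding window reveal shifts caused by soft errors. This is a minimal sketch for the linear case only; the class name, window size, and threshold are illustrative placeholders, and a real particle filter would add nonlinear dynamics and process noise.

```python
import numpy as np
from collections import deque

class ParticleCheck:
    """Concurrent error check for a linear state update x_new = F @ x_old.

    The check vector `a` defines the check state z = a @ x. For an
    error-free update, a @ x_new equals (a @ F) @ x_old, so the residual
    serves as an error signal tracked over a sliding window.
    """
    def __init__(self, F, a, window=50, threshold=1e-6):
        self.aF = a @ F              # precomputed check row
        self.a = a
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def step(self, x_old, x_new):
        residual = self.a @ x_new - self.aF @ x_old
        self.window.append(abs(residual))
        # A shift in the windowed residual statistics flags an error
        return np.mean(self.window) > self.threshold
```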
  3. Artificial Intelligence (AI) has permeated various domains but is limited by the bottlenecks imposed by data transfer latency inherent in contemporary memory technologies. Matrix multiplication, crucial for neural network training and inference, can be significantly expedited with a complexity of O(1) using Resistive RAM (RRAM) technology, instead of the conventional complexity of O(n²). This positions RRAM as a promising candidate for the efficient hardware implementation of machine learning and neural networks through in-memory computation. However, RRAM manufacturing technology remains in its infancy, rendering it susceptible to soft errors and potentially compromising neural network accuracy and reliability. In this paper, we propose a syndrome-based error correction scheme that employs selective weighted checksums to correct double adjacent column errors in RRAM. The error correction is done on the output of the matrix multiplication, thus ensuring correct operation for any number of errors in two adjacent columns. The proposed codes have low redundancy and low decoding latency, making them suitable for high-throughput applications. This scheme uses a repeating weight-based structure that makes it scalable to large RRAM matrix sizes.
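The syndrome idea behind weighted checksums can be illustrated for the simpler single-column-error case: two checksum columns (a plain sum and an index-weighted sum) are stored alongside the data columns, and the two syndromes on the multiplication output locate and cancel the error. This is a simplified sketch, not the paper's selective checksum construction, which extends this principle to double adjacent column errors; all names below are illustrative.

```python
import numpy as np

def augment(W):
    """Append two checksum columns to W: a plain sum column and an
    index-weighted (weights 1..n) sum column. Both would be stored in
    the crossbar and multiplied along with the data columns."""
    n = W.shape[1]
    c1 = W.sum(axis=1, keepdims=True)
    c2 = (W * np.arange(1, n + 1)).sum(axis=1, keepdims=True)
    return np.hstack([W, c1, c2])

def correct_single_column(y_aug, n):
    """Syndrome decoding on the multiplication output y_aug = x @ augment(W)
    for one erroneous data column (a simplification of the weighted-
    checksum scheme)."""
    y, chk1, chk2 = y_aug[:n], y_aug[n], y_aug[n + 1]
    s1 = chk1 - y.sum()                              # plain syndrome
    s2 = chk2 - (y * np.arange(1, n + 1)).sum()      # weighted syndrome
    if abs(s1) > 1e-9:                               # error detected
        j = int(round(s2 / s1)) - 1                  # locate faulty column
        y = y.copy()
        y[j] += s1                                   # cancel the error
    return y
```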
  4. Dual learning has attracted much attention in the machine learning, computer vision, and natural language processing communities. The core idea of dual learning is to leverage the duality between the primal task (mapping from domain X to domain Y) and the dual task (mapping from domain Y to domain X) to boost the performance of both tasks. The existing dual learning framework forms a system with two agents (one primal model and one dual model) to utilize such duality. In this paper, we extend this framework by introducing multiple primal and dual models, and propose the multi-agent dual learning framework. Experiments on neural machine translation and image translation tasks demonstrate the effectiveness of the new framework. In particular, we set a new record on IWSLT 2014 German-to-English translation with a 35.44 BLEU score, achieve a 31.03 BLEU score on WMT 2014 English-to-German translation with over 2.6 BLEU improvement over the strong Transformer baseline, and set a new record of 49.61 BLEU score on the recent WMT 2018 English-to-German translation.
  5. Abstract: Quantum computing holds transformative promise, but its realization is hindered by the inherent susceptibility of quantum computers to errors. Quantum error mitigation has proved to be an enabling way to reduce computational error in present noisy intermediate-scale quantum computers. This research introduces an innovative approach to quantum error mitigation by leveraging machine learning, specifically adaptive neural networks. With experiments and simulations on a 127-qubit IBM superconducting quantum computer, we were able to develop and train a neural network architecture that dynamically adjusts output expectation values based on error characteristics. The model leverages the outcome of a prior classifier module on simulated quantum circuits with errors, and the subsequent neural network regression module adapts its parameters and response to each error characteristic. Results demonstrate the adaptive neural network's efficacy in mitigating errors across diverse quantum circuits and noise models, showcasing its potential to surpass traditional error mitigation techniques with an accuracy of 99% using the fully adaptive neural network for quantum error mitigation. This work presents a significant application of classical machine learning methods towards enhancing the robustness and reliability of quantum computations, providing a pathway for the practical realization of quantum computing technologies.