Search for: All records

Creators/Authors contains: "Amarnath, Chandramouli."


  1. Spiking Neural Networks (SNNs) can be implemented with power-efficient digital as well as analog circuitry. However, in Resistive RAM (RRAM) based SNN accelerators, synapse weights programmed into the crossbar can differ from their ideal values due to defects and programming errors, degrading inference accuracy. In addition, circuit nonidealities within analog spiking neurons that alter the neuron spiking rate (modeled by variations in neuron firing threshold) can degrade SNN inference accuracy when the number of inference time steps (ITSteps) is set to a critical minimum that maximizes network throughput. We first develop a recursive linearized check to detect synapse weight errors with high sensitivity. This triggers a correction methodology that sets out-of-range synapse values to zero. To correct the effects of firing threshold variations, we develop a test methodology that calibrates the extent of such variations. The calibrated extent is then used to proportionally increase the number of inference time steps for chips with higher variation. Experiments on a variety of SNNs demonstrate the viability of the proposed resilience methods. 
    Free, publicly-accessible full text available May 29, 2024
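The two corrections described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight bounds `w_min`/`w_max`, the `gain` factor, and the linear scaling rule are all assumptions chosen for illustration, and the recursive linearized check itself is not shown.

```python
import math

def zero_out_of_range(weights, w_min, w_max):
    """Set every synapse weight outside [w_min, w_max] to zero.

    The bounds are hypothetical; the abstract only states that
    out-of-range synapse values are zeroed.
    """
    return [[w if w_min <= w <= w_max else 0.0 for w in row] for row in weights]

def scaled_time_steps(base_steps, variation, gain=1.0):
    """Increase inference time steps in proportion to calibrated variation.

    `variation` is an assumed non-negative, normalized measure of firing
    threshold variation; `gain` is an assumed tuning factor. The linear
    rule here is an illustration, not the rule used in the paper.
    """
    if variation < 0:
        raise ValueError("variation must be non-negative")
    return math.ceil(base_steps * (1.0 + gain * variation))

# Example: a 2x2 crossbar with two corrupted weights, and a chip whose
# calibrated threshold variation is 25% of nominal.
crossbar = [[0.2, 7.5], [-0.4, -9.0]]
print(zero_out_of_range(crossbar, -1.0, 1.0))
print(scaled_time_steps(20, 0.25))
```

Zeroing a corrupted weight removes its contribution rather than trying to recover the ideal value, which keeps the correction cheap once the check has fired.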
  2. Transformer networks have achieved remarkable success in Natural Language Processing (NLP) and Computer Vision applications. However, the underlying large volumes of Transformer computations demand high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for the design of error-resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layer in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of identifying output neurons with out-of-range values and suppressing them to zero. For a Transformer with a nominal BLEU score of 52.7, such vulnerability-guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 at activation error probabilities as high as 0.001, incurring negligible memory and computation overheads. 
    Free, publicly-accessible full text available May 22, 2024
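Checksum-based detection for a linear layer, followed by out-of-range suppression, can be sketched as follows. This is a generic checksum over a matrix-vector product, offered as an assumption about the general idea rather than the paper's exact scheme: for y = Wx, the sum of the outputs must equal the dot product of the column sums of W with x. All function names, the tolerance, and the output range are hypothetical.

```python
def matvec(W, x):
    """Plain matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def column_checksum(W):
    """Precompute the checksum vector c = 1^T W (column sums of W)."""
    return [sum(col) for col in zip(*W)]

def checksum_ok(y, c, x, tol=1e-6):
    """Return True when sum(y) matches c . x within an assumed tolerance."""
    expected = sum(ci * xi for ci, xi in zip(c, x))
    return abs(sum(y) - expected) <= tol

def suppress_out_of_range(y, lo, hi):
    """Zero any output outside the assumed nominal range [lo, hi]."""
    return [v if lo <= v <= hi else 0.0 for v in y]

W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
c = column_checksum(W)          # computed once, offline
y = matvec(W, x)
y[1] = 1e6                      # inject a soft error into one output
if not checksum_ok(y, c, x):
    y = suppress_out_of_range(y, -100.0, 100.0)
print(y)                        # [3.0, 0.0]
```

The checksum costs one extra dot product per layer evaluation, and the suppression step runs only when the check fails.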
  3. The reliability of emerging neuromorphic compute fabrics is of great concern due to their widespread use in critical data-intensive applications. Ensuring such reliability is difficult due to the intensity of the underlying computations (billions of parameters), errors induced by low-power operation, and the complex relationship between errors in computations and their effect on network accuracy. We study the problem of designing error-resilient neuromorphic systems where errors can stem from: (a) soft errors in the computation of matrix-vector multiplications and neuron activations, (b) malicious trojan and adversarial security attacks, and (c) effects of manufacturing process variations on analog crossbar arrays that can affect DNN accuracy. The core principle of error detection relies on embedded predictive neuron checks using invariants derived from the statistics of nominal neuron activation patterns of the hidden layers of a neural network. Algorithmic encodings of hidden neuron function are also used to derive invariants for checking. A key contribution is designing checks that are robust to the inherent nonlinearity of neuron computations with minimal impact on error detection coverage. Once errors are detected, they are corrected using probabilistic methods, because exact error diagnosis is difficult in such complex systems. The technique is scalable across soft errors as well as a range of security attacks. The effects of manufacturing process variations are handled through compact tests from which DNN performance can be assessed using learning techniques. Experimental results are presented on a variety of neuromorphic test systems: DNNs, spiking networks and hyperdimensional computing. 
    Free, publicly-accessible full text available March 23, 2024
  4. Deep learning techniques have been widely adopted in daily life with applications ranging from face recognition to recommender systems. The substantial overhead of conventional error tolerance techniques precludes their widespread use, while approaches involving median filtering and invariant generation rely on alterations to DNN training that may be difficult to achieve for larger networks on larger datasets. To address this issue, this paper presents a novel approach that uses the statistics of neuron output gradients to identify and suppress erroneous neuron values. By using the statistics of neurons’ gradients with respect to their neighbors, tighter statistical thresholds are obtained than with neuron output values alone. This approach is modular and is combined with accurate, low-overhead error detection methods to ensure it is used only when needed, further reducing its cost. Deep learning models can be trained using standard methods, and our error correction module is then fit to the trained DNN. It achieves performance comparable or superior to baseline error correction methods at comparable hardware overhead, without modifying DNN training or requiring specialized hardware architectures. 
  5. The advent of pervasive autonomous systems such as self-driving cars and drones has raised questions about their safety and trustworthiness. This is particularly relevant in the event of on-board subsystem errors or failures. In this research, we show how an encoded Extended Kalman Filter can be used to detect anomalous behaviors of critical components of nonlinear autonomous systems: sensors, actuators, state estimation algorithms and control software. As opposed to prior work that is limited to linear systems or requires the use of cumbersome machine-learned checks with fixed detection thresholds, the proposed approach relies on time-varying checks with dynamically adaptive thresholds. The method is lightweight in comparison to existing methods (it does not rely on machine learning paradigms) and achieves high coverage as well as low error detection latency. A quadcopter and an automotive steer-by-wire system are used as test vehicles for the research; simulation and hardware results indicate the overhead, coverage and error detection latency benefits of the proposed approach. 
  6. In this paper we propose a framework for concurrent detection of soft computation errors in particle filters, which are finding increasing use in robotics applications. The particle filter works by sampling the multivariate probability distribution of the states of a system (the samples are called particles, each particle representing a vector of states) and projecting these into the future using appropriate nonlinear mappings. We propose the addition of a 'check' state to the system, formed as a linear combination of the system states, for error detection. The check state produces an error signal corresponding to each particle, whose statistics are tracked across a sliding time window. Shifts in the error statistics across all particles are used to detect soft computation errors as well as anomalous sensor measurements. Simulation studies indicate that errors in particle filter computations can be detected with high coverage and low latency. 
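The sliding-window tracking of check-state error statistics can be sketched roughly as below. This is an illustration under stated assumptions, not the paper's method: the class name `CheckStateMonitor`, the choice of mean absolute error as the tracked statistic, the window length, and the `k`-sigma decision rule are all hypothetical, and the sketch assumes the check value for each particle is produced by a separate computation path.

```python
from collections import deque
from statistics import mean, pstdev

class CheckStateMonitor:
    """Track per-step check-state error statistics over a sliding window."""

    def __init__(self, weights, window=20, k=4.0, floor=1e-9):
        self.weights = weights            # assumed linear-combination coefficients
        self.history = deque(maxlen=window)
        self.k = k                        # assumed detection threshold in sigmas
        self.floor = floor                # avoids a zero-width threshold

    def error_signal(self, particle, check_value):
        """Difference between the check state and the weighted sum of states."""
        return check_value - sum(w * s for w, s in zip(self.weights, particle))

    def step(self, particles, check_values):
        """Return True if this step's error statistic shifts out of the window."""
        errors = [self.error_signal(p, c) for p, c in zip(particles, check_values)]
        stat = mean(abs(e) for e in errors)
        flagged = False
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = max(pstdev(self.history), self.floor)
            flagged = abs(stat - mu) > self.k * sigma
        if not flagged:
            # Only nominal steps update the window, so an error does not
            # contaminate the reference statistics.
            self.history.append(stat)
        return flagged
```

In use, `step` would be called once per filter update with the current particles and their check values; no detection is attempted until the window has filled with nominal samples.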