Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Resistive random access memory (RRAM) based memristive crossbar arrays enable low-power, low-latency inference for convolutional neural networks (CNNs), making them suitable for deployment in IoT and edge devices. However, RRAM cells within a crossbar suffer from conductance variations, which make RRAM-based CNNs vulnerable to degraded classification accuracy. To address this, the classification accuracy of RRAM-based CNN chips can be estimated using predictive tests, in which a trained regressor predicts the accuracy of a CNN chip from the CNN's response to a compact test dataset. In this research, we present a framework for co-optimizing the pixels of the compact test dataset and the regressor. The novelty of the proposed approach lies in its ability to co-optimize individual image pixels, overcoming the computational complexity that state-of-the-art techniques face when optimizing the large number of pixels in an image. The co-optimization problem is solved in three steps: greedy image downselection, backpropagation-driven image optimization, and regressor fine-tuning. Experiments show that the proposed test approach reduces the CNN classification accuracy prediction error by 31% compared to the state of the art. A compact test dataset of only 2-4 images suffices for testing, making the scheme suitable for built-in test applications.
- 
            Time-to-first-spike (TTFS) encoded spiking neural networks (SNNs), implemented using memristive crossbar arrays (MCAs), achieve higher inference speed and energy efficiency than artificial neural networks (ANNs) and rate-encoded SNNs. However, memristive crossbar arrays are vulnerable to conductance variations in the embedded memristor cells. These variations degrade the classification accuracy of TTFS-encoded SNNs, with an adverse impact on the yield of manufactured chips. To combat this yield loss, we propose a post-manufacture testing and tuning framework for these SNNs. In the testing phase, a timing-encoded signature of the SNN, which is statistically correlated with SNN performance, is extracted. In the tuning phase, a trained regressor maps this signature to optimal values of the tuning knobs (gain parameters, one per layer), allowing very fast tuning (about 150 ms). To further reduce the tuning overhead, we rank hidden-layer neurons by their criticality and show that adding gain programmability to only 50% of the neurons is sufficient for performance recovery. Experiments show that the proposed framework can improve yield by up to 34% and the average accuracy of memristive SNNs by up to 9%.
- 
            Emerging brain-inspired hyperdimensional computing (HDC) algorithms are vulnerable to timing and soft errors in the associative memory used to store high-dimensional data representations. Such errors can significantly degrade HDC performance. A key challenge is error correction after an error in computation is detected. This work presents two novel error resilience frameworks for hyperdimensional computing systems. The first, called the checksum hypervector encoding (CHE) framework, relies on the creation of a single additional hypervector that is a checksum of all the class hypervectors of the HDC system. For error resilience, the checksum property is validated elementwise, and those elements across all class vectors for which the property fails are removed from consideration. For an HDC system with K class hypervectors of dimension D, the second framework, cross-hypervector clustering (CHC), clusters D K-dimensional vectors, each consisting of the i-th element of each of the K HDC class hypervectors, 1 ≤ i ≤ D. Statistical properties of these vector clusters are checked prior to each hypervector query, and all the elements of all K-dimensional vectors identified as statistical outliers are removed as before. The choice of framework is dictated by the complexity of the dataset to be classified. Up to three orders of magnitude better resilience to errors than the state of the art is demonstrated across multiple HDC high-dimensional encoding (representation) systems.
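The checksum hypervector idea in the abstract above can be illustrated with a minimal sketch. It assumes the checksum is a plain elementwise sum of integer class hypervectors and that classification uses a dot product; the abstract confirms neither detail, and all function names, sizes, and the injected fault are hypothetical.

```python
import random

def make_checksum(class_hvs):
    """Elementwise sum of all class hypervectors (assumed checksum definition)."""
    return [sum(col) for col in zip(*class_hvs)]

def valid_dimensions(class_hvs, checksum):
    """Indices whose stored elements still sum to the checksum element."""
    return [i for i, col in enumerate(zip(*class_hvs)) if sum(col) == checksum[i]]

def classify(query, class_hvs, dims):
    """Pick the class with the largest dot product over trusted dimensions only."""
    scores = [sum(hv[i] * query[i] for i in dims) for hv in class_hvs]
    return scores.index(max(scores))

random.seed(0)
D, K = 256, 3  # hypothetical dimension and class count
class_hvs = [[random.choice((-1, 1)) for _ in range(D)] for _ in range(K)]
checksum = make_checksum(class_hvs)  # computed once from error-free memory

# Simulate a memory error: corrupt element 10 of class 1.
corrupted = [hv[:] for hv in class_hvs]
corrupted[1][10] += 100

# Element 10 fails the checksum test and is dropped for every class.
dims = valid_dimensions(corrupted, checksum)
query = class_hvs[2][:]  # noise-free query identical to class 2
predicted = classify(query, corrupted, dims)
```

In this toy setting, dropping one of 256 dimensions barely changes the similarity scores, which is why removing suspect elements can be cheaper than trying to repair them.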
- 
            Brain-inspired hyperdimensional (HD) computing models mimic cognition through combinatorial bindings of biological neuronal data represented by high-dimensional vectors and related operations. However, the efficacy of HD computing depends strongly on the input signal and data features used to realize such bindings. In this paper, we propose a new HD-computing framework based on a co-trainable DNN-based feature extractor (pre-processor) and a hyperdimensional computing system. When trained with restrictions on the ranges of hypervector elements for resilience to memory access errors, the framework achieves up to 135% accuracy improvement over baseline HD computing for error-free operation and up to three orders of magnitude improvement in error resilience compared to the state of the art. Results are presented for a range of applications, including image classification, face recognition, human activity recognition, and medical diagnosis, and demonstrate the viability of the proposed ideas.
- 
            Variability-induced accuracy degradation of RRAM-based DNNs is of great concern because of their significant potential for use in future energy-efficient machine learning architectures. To address this, we propose a two-step process. First, an enhanced testing procedure is used to predict DNN accuracy from a set of compact test stimuli (images). The test response (signature) is simply the concatenation of the output-neuron vectors of intermediate and final DNN layers over the compact test images applied. DNNs with a predicted accuracy below a threshold are then tuned based on this signature vector. Using a clustering-based approach, the signature is mapped in a single step to the optimal tuning parameter values of the DNN (determined by off-line training of the DNN via backpropagation), eliminating any expensive post-manufacture training of the DNN weights. The tuning parameters consist of the gains and offsets of the ReLU activations of the DNN's neurons on a per-layer basis and can be tuned digitally. Tuning takes less than a second and yields improvements of over 45%, with a modest accuracy reduction of 4% compared to digital DNNs.
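The single-step, clustering-based mapping from signature to tuning parameters described above might look like the following nearest-centroid lookup. This is a sketch under stated assumptions, not the authors' method: the centroids, the per-layer (gain, offset) table, and the signature length are all hypothetical placeholders that would come from off-line training.

```python
import math

# Hypothetical cluster centroids in signature space (found off-line).
centroids = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.5, 0.5],
    [2.0, 0.0, 2.0, 0.0],
]

# Hypothetical per-cluster tuning parameters: one (gain, offset) pair per layer.
tuning_table = [
    [(1.00, 0.00), (1.00, 0.00)],
    [(1.10, -0.05), (0.95, 0.02)],
    [(1.25, -0.10), (0.90, 0.05)],
]

def nearest_cluster(signature, centroids):
    """Index of the centroid closest to the signature (Euclidean distance)."""
    dists = [math.dist(signature, c) for c in centroids]
    return dists.index(min(dists))

def lookup_tuning(signature):
    """Map a measured signature to per-layer (gain, offset) values in one step."""
    return tuning_table[nearest_cluster(signature, centroids)]

def tuned_relu(x, gain, offset):
    """ReLU with a digitally programmable gain and offset (assumed form)."""
    return max(0.0, gain * x + offset)

signature = [0.9, 1.1, 0.4, 0.6]  # measured response of one chip (made up)
cluster = nearest_cluster(signature, centroids)
params = lookup_tuning(signature)
```

Because the lookup is a handful of distance computations, no weight retraining is needed after manufacture, which is consistent with the sub-second tuning time the abstract reports.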
- 
            Online reinforcement learning (RL) based systems are increasingly deployed in a variety of safety-critical applications ranging from drone control to medical robotics. These systems typically run RL onboard rather than relying on remote operation from high-performance datacenters. Because of the dynamic nature of the environments they work in, onboard RL hardware is vulnerable to soft errors from radiation, thermal effects, and electrical noise that corrupt the results of computations. Existing approaches to online error resilience in machine learning systems rely on the availability of large training datasets to configure resilience parameters, which is not necessarily feasible for online RL systems. Similarly, other approaches involving specialized hardware or modifications to training algorithms are difficult to implement for onboard RL applications. In contrast, we present a novel error resilience approach for online RL that uses running statistics collected across the (real-time) RL training process to configure error detection thresholds without access to a reference training dataset. In this methodology, statistical concentration bounds that leverage the running statistics are used to diagnose neuron outputs as erroneous. These erroneous neuron outputs are then set to zero (suppressed). Our approach is compared against the state of the art and validated on several RL algorithms, using multiple concentration bounds, on CPU as well as GPU hardware.
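A minimal sketch of threshold configuration from running statistics follows. It assumes Welford's online algorithm for the mean and variance and Chebyshev's inequality as the concentration bound; the abstract does not say which bounds are used, and the warm-up length, the tolerated false-alarm rate, and the sample values are hypothetical.

```python
import math

class RunningStats:
    """Welford's online mean/variance; no stored dataset is required."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

def chebyshev_k(delta):
    """k with P(|X - mean| >= k*std) <= delta, from Chebyshev's inequality."""
    return 1.0 / math.sqrt(delta)

def check_and_suppress(value, stats, k, warmup=10):
    """Zero a neuron output outside the bound; otherwise learn from it."""
    if stats.n >= warmup and abs(value - stats.mean) > k * stats.std:
        return 0.0  # flagged as erroneous: suppressed, statistics untouched
    stats.update(value)
    return value

stats = RunningStats()
k = chebyshev_k(0.01)  # tolerate at most ~1% false alarms (assumed setting)
values = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3, 0.7, 1.05, 0.95, 1.1, 0.9]
values.append(1.0e6)   # simulated soft error, e.g. a flipped exponent bit
outputs = [check_and_suppress(v, stats, k) for v in values]
```

Chebyshev's bound holds for any distribution with finite variance but is loose, and here it is applied to estimated rather than true moments, so the false-alarm figure is only approximate. Tighter bounds trade that generality for distributional assumptions.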
- 
            Spiking neural networks (SNNs) can be implemented with power-efficient digital as well as analog circuitry. However, in resistive RAM (RRAM) based SNN accelerators, synapse weights programmed into the crossbar can differ from their ideal values because of defects and programming errors, degrading inference accuracy. In addition, circuit nonidealities within analog spiking neurons that alter the neuron spiking rate (modeled by variations in the neuron firing threshold) can degrade SNN inference accuracy when the number of inference time steps (ITSteps) of the SNN is set to a critical minimum that maximizes network throughput. We first develop a recursive linearized check to detect synapse weight errors with high sensitivity. This triggers a correction methodology that sets out-of-range synapse values to zero. To correct the effects of firing threshold variations, we develop a test methodology that calibrates the extent of such variations. The calibration result is then used to proportionally increase the number of inference time steps for chips with higher variation. Experiments on a variety of SNNs demonstrate the viability of the proposed resilience methods.
- 
            Transformer networks have achieved remarkable success in natural language processing (NLP) and computer vision applications. However, the large volume of underlying Transformer computations demands high reliability and resilience to soft errors in processor hardware. The objective of this research is to develop efficient techniques for the design of error-resilient Transformer architectures. To enable this, we first perform a soft error vulnerability analysis of every fully connected layer in Transformer computations. Based on this study, error detection and suppression modules are selectively introduced into datapaths to restore Transformer performance under anticipated error rate conditions. Memory access errors and neuron output errors are detected using checksums of linear Transformer computations. Correction consists of identifying output neurons with out-of-range values and suppressing them to zero. For a Transformer with a nominal BLEU score of 52.7, such vulnerability-guided selective error suppression can recover language translation performance from a BLEU score of 0 to 50.774 with an activation error probability as high as 0.001, incurring negligible memory and computation overheads.
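The checksum-based detection and range-based suppression described above can be sketched for a single fully connected layer. The sketch assumes a standard column-sum checksum for a matrix-vector product (the sum of the outputs must equal the column sums of the weights dotted with the input); the weights, tolerance, and valid output range are hypothetical, and the abstract does not specify the exact checksum used.

```python
def matvec(W, x):
    """Plain matrix-vector product y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def column_checksum(W):
    """Column sums of W, precomputed once from error-free weights."""
    return [sum(col) for col in zip(*W)]

def checksum_ok(y, c, x, tol=1e-6):
    """Linear invariant: sum(y) must equal c . x up to rounding error."""
    expected = sum(ci * xi for ci, xi in zip(c, x))
    return abs(sum(y) - expected) <= tol * max(1.0, abs(expected))

def suppress_out_of_range(y, lo, hi):
    """Set outputs outside the assumed nominal range [lo, hi] to zero."""
    return [v if lo <= v <= hi else 0.0 for v in y]

W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.5],
     [-2.0, 1.0, 0.75]]
x = [1.0, 2.0, -1.0]
c = column_checksum(W)

y = matvec(W, x)
clean_passes = checksum_ok(y, c, x)

faulty = y[:]
faulty[1] = 2.5e8  # simulated soft error in one output neuron
fault_detected = not checksum_ok(faulty, c, x)

# The range [-10, 10] stands in for bounds profiled from error-free runs.
repaired = suppress_out_of_range(faulty, -10.0, 10.0) if fault_detected else faulty
```

The checksum costs one extra dot product per layer instead of a duplicated computation, which fits the abstract's claim of negligible overhead. Zeroing the faulty output does not restore its true value; it only limits the damage a very large error would cause downstream.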
- 
            The reliability of emerging neuromorphic compute fabrics is of great concern because of their widespread use in critical data-intensive applications. Ensuring such reliability is difficult because of the intensity of the underlying computations (billions of parameters), errors induced by low-power operation, and the complex relationship between errors in computations and their effect on network accuracy. We study the problem of designing error-resilient neuromorphic systems in which errors can stem from: (a) soft errors in the computation of matrix-vector multiplications and neuron activations, (b) malicious trojan and adversarial security attacks, and (c) manufacturing process variations in analog crossbar arrays that can affect DNN accuracy. The core principle of error detection relies on embedded predictive neuron checks using invariants derived from the statistics of nominal neuron activation patterns in the hidden layers of a neural network. Algorithmic encodings of hidden neuron function are also used to derive invariants for checking. A key contribution is the design of checks that are robust to the inherent nonlinearity of neuron computations with minimal impact on error detection coverage. Once errors are detected, they are corrected using probabilistic methods because exact error diagnosis is difficult in such complex systems. The technique is scalable across soft errors as well as a range of security attacks. The effects of manufacturing process variations are handled through compact tests from which DNN performance can be assessed using learning techniques. Experimental results are presented for a variety of neuromorphic test systems: DNNs, spiking networks, and hyperdimensional computing.
- 
            Analog crossbar arrays have recently attracted significant attention because of their usefulness for deep neural network (DNN) computations with ultra-low power consumption. However, recent studies have shown that DNNs implemented with such crossbar arrays suffer performance degradation of as much as 30% due to manufacturing process variability, which in turn degrades their functional safety. One way to test these DNNs is to apply an exhaustive set of test images to each device to ascertain its performance. This is expensive and time-consuming. We propose an alternative test scheme in which a small subset of test images is applied to each DNN and the classification accuracy of the DNN is predicted directly from observation of the final-layer outputs of the network. This saves test cost while allowing DNNs to be binned by performance. Experimental results for a variety of test cases are presented and show test efficiency improvements of 3X over testing with the exhaustive test image set.
 An official website of the United States government