Title: Highly Efficient Layered Syndrome-based Double Error Correction Utilizing Current Summing in RRAM Cells to Simplify Decoder
Applications involving machine learning and neural networks have become increasingly essential in the AI revolution. Emerging Resistive RAM (RRAM) technologies provide high-speed, low-cost, scalable solutions for such applications, offering efficient and sophisticated memory hardware structures for machine learning. However, it is difficult to achieve reliable multilevel-cell storage in these memory technologies due to the occurrence of soft and hard errors. Because these memories can store multiple bits per cell, exploring limited-magnitude symbol (multi-bit) error correction in RRAM is important. This paper proposes a new syndrome-based double-error-correcting code that divides the syndromes into groups and uses addition and XOR operations to correct double limited-magnitude errors in RRAM cells. The key idea is to use the built-in current-summing capability of RRAM cells to perform the additions required for error correction, thereby greatly reducing the overhead of the decoding logic needed to implement the ECC. This avoids the need for explicit adder hardware in the decoder, making it smaller and faster than conventional ECC codes with similar error-correcting capability. Experimental results show that the proposed code reduces the number of check symbols and significantly reduces decoder area and power by using the RRAM cells to perform the addition.
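The published grouped-syndrome construction is not reproduced here, but the flavor of syndrome-based limited-magnitude correction can be illustrated with a minimal sketch. The toy Python below corrects a single limited-magnitude symbol error using a plain-sum check (the kind of addition an RRAM array's current summing could supply) and a position-weighted check; the function names, the reliable-check assumption, and the single-error restriction are illustrative simplifications, not the paper's double-error scheme.

```python
# Illustrative sketch only: single limited-magnitude symbol correction with a
# plain-sum check (the addition an RRAM current-summing array could provide)
# and a position-weighted check. The cited paper corrects DOUBLE errors with
# grouped syndromes; this toy shows the underlying idea, not that construction.

def encode(symbols):
    """Append a plain-sum check and a position-weighted sum check."""
    s_sum = sum(symbols)                                   # plain addition (current summing)
    s_wgt = sum(i * v for i, v in enumerate(symbols, 1))   # weighted addition
    return list(symbols), (s_sum, s_wgt)

def decode(received, checks):
    """Correct one limited-magnitude error, assuming the checks themselves are error-free."""
    s_sum, s_wgt = checks
    d_sum = sum(received) - s_sum                                   # syndrome 0 = error magnitude
    d_wgt = sum(i * v for i, v in enumerate(received, 1)) - s_wgt   # syndrome 1 = position * magnitude
    if d_sum == 0 and d_wgt == 0:
        return received                       # no error detected
    pos = d_wgt // d_sum                      # locate the erroneous symbol (1-based)
    corrected = list(received)
    corrected[pos - 1] -= d_sum               # subtract the error magnitude
    return corrected

# Example: the third symbol drifts by +2 (a limited-magnitude error in a multilevel cell).
data, checks = encode([2, 3, 1, 0, 2])
assert decode([2, 3, 3, 0, 2], checks) == data
```

In the paper's setting the sum-type syndrome would come from the memory array itself, which is what lets the decoder avoid explicit adder hardware.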
Award ID(s):
2113914
PAR ID:
10553435
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems
ISSN:
2765-933X
ISBN:
979-8-3503-1500-4
Page Range / eLocation ID:
1 to 4
Format(s):
Medium: X
Location:
Juan-Les-Pins, France
Sponsoring Org:
National Science Foundation
More Like this
  1. Artificial Intelligence (AI) has permeated various domains but is limited by the bottlenecks imposed by data-transfer latency inherent in contemporary memory technologies. Matrix multiplication, crucial for neural network training and inference, can be significantly expedited with a complexity of O(1) using Resistive RAM (RRAM) technology instead of the conventional complexity of O(n²). This positions RRAM as a promising candidate for the efficient hardware implementation of machine learning and neural networks through in-memory computation. However, RRAM manufacturing technology remains in its infancy, rendering it susceptible to soft errors that can compromise neural network accuracy and reliability. In this paper, we propose a syndrome-based error correction scheme that employs selective weighted checksums to correct double adjacent column errors in RRAM. The error correction is performed on the output of the matrix multiplication, ensuring correct operation for any number of errors in two adjacent columns. The proposed codes have low redundancy and low decoding latency, making them suitable for high-throughput applications. The scheme uses a repeating-weight structure that makes it scalable to large RRAM matrix sizes.
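For intuition, the sketch below shows the classic ABFT-style weighted-checksum idea that such schemes build on: two checksum columns appended to B let every row of C = A·B be verified and a single erroneous output entry per row located and corrected. The double-adjacent-column correction and repeating-weight structure described in the abstract are not implemented here; the matrices, weights, and function names are illustrative assumptions.

```python
import numpy as np

# ABFT-style sketch (not the paper's selective-checksum construction): append a
# plain-sum column and a position-weighted column to B, so each row of C = A @ B
# carries two syndromes that locate and fix one erroneous entry in that row.

def encode_checks(B):
    n = B.shape[1]
    w = np.arange(1, n + 1)
    return np.hstack([B, (B @ np.ones(n))[:, None], (B @ w)[:, None]])

def correct_row(row, n):
    data, c_sum, c_wgt = row[:n], row[n], row[n + 1]
    w = np.arange(1, n + 1)
    d1 = data.sum() - c_sum           # syndrome 1: error magnitude
    d2 = (data * w).sum() - c_wgt     # syndrome 2: magnitude * column index
    if d1 != 0:
        col = int(round(d2 / d1)) - 1
        data = data.copy()
        data[col] -= d1               # subtract the error from the located column
    return data

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
C = A @ encode_checks(B)              # the crossbar would compute this product in-memory
C[0, 1] += 9.0                        # inject a single output error
fixed = np.vstack([correct_row(r, B.shape[1]) for r in C])
assert np.allclose(fixed, A @ B)
```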
  2. The Von Neumann bottleneck, a fundamental challenge in conventional computer architecture, arises from the inability to execute fetch and data operations simultaneously due to a shared bus linking processing and memory units. This bottleneck significantly limits system performance, increases energy consumption, and exacerbates computational complexity. Emerging technologies such as Resistive Random Access Memories (RRAMs), leveraging crossbar arrays, offer promising alternatives for addressing the demands of data-intensive computational tasks through in-memory computing of analog vector-matrix multiplication (VMM) operations. However, the propagation of errors due to device- and circuit-level imperfections remains a significant challenge. In this study, we introduce MELISO (In-Memory Linear Solver), a comprehensive end-to-end VMM benchmarking framework tailored for RRAM-based systems. MELISO evaluates the error propagation in VMM operations, analyzing the impact of RRAM device metrics on error magnitude and distribution. This paper introduces the MELISO framework and demonstrates its utility in characterizing and mitigating VMM error propagation using state-of-the-art RRAM device metrics.
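A toy version of such an error-propagation study (not the MELISO framework itself) might perturb the stored conductances with a relative Gaussian device variation and record the resulting VMM error statistics, as sketched below; the matrix size, variation level, and reported metric are assumptions for illustration only.

```python
import numpy as np

# Toy VMM error-propagation experiment in the spirit of the study above (this is
# NOT MELISO): perturb the conductance matrix with Gaussian device variation and
# compare the "analog" result against the ideal matrix-vector product.

rng = np.random.default_rng(0)
n, trials, sigma = 64, 1000, 0.02            # matrix size, runs, relative device variation (assumed)

G = rng.uniform(0.0, 1.0, size=(n, n))       # ideal conductances (weights)
errs = []
for _ in range(trials):
    v = rng.uniform(-1.0, 1.0, size=n)       # input voltage vector
    ideal = G @ v
    noisy = (G * (1.0 + sigma * rng.standard_normal(G.shape))) @ v
    errs.append(np.abs(noisy - ideal).max())

print(f"max abs VMM error over {trials} runs: "
      f"mean={np.mean(errs):.4f}, 95th percentile={np.percentile(errs, 95):.4f}")
```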
  3. Progress in high-performance computing (HPC) systems has led to complex applications that stress the I/O subsystem by creating vast amounts of data. Lossy compression reduces data size considerably, but a single error renders lossy-compressed data unusable. This sensitivity stems from the high information content per bit in compressed data and is a critical issue as soft errors that cause bit flips have become increasingly commonplace in HPC systems. While many works have improved lossy compressor performance, few have sought to address this critical weakness. This paper presents ARC: Automated Resiliency for Compression. Given user-defined constraints on storage, throughput, and resiliency, ARC automatically determines the optimal error-correcting code (ECC) configuration before encoding data. We conduct an extensive fault injection study to fully understand the effects of soft errors on lossy-compressed data and how best to protect it. We evaluate ARC's scalability, performance, resiliency, and ease of use. On a 40-core node, we find that encoding and decoding achieve throughputs of up to 3730 MB/s and 3602 MB/s, respectively. ARC also detects and corrects multi-bit errors with a tunable overhead in terms of storage and throughput. Finally, we demonstrate the ease of using ARC and show how to take a system's failure rate into account when determining the constraints.
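The fragility claim is easy to reproduce. The snippet below uses zlib as a stand-in for a lossy scientific compressor (ARC itself and its ECC selection are not shown): a single flipped bit in the compressed stream typically makes the whole buffer undecodable.

```python
import zlib

# Demonstration of the sensitivity of compressed data to soft errors, with zlib
# standing in for a lossy compressor: flip one bit mid-stream and try to decode.

original = bytes(range(256)) * 64
compressed = bytearray(zlib.compress(original, level=9))
compressed[len(compressed) // 2] ^= 0x01       # a single bit flip (soft error)

try:
    zlib.decompress(bytes(compressed))
    print("decompressed despite the bit flip (possible, but the checksum usually catches it)")
except zlib.error as exc:
    print("decompression failed after one bit flip:", exc)
```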
  4. The constituent parts of a quantum computer are inherently vulnerable to errors. To this end, we have developed quantum error-correcting codes to protect quantum information from noise. However, discovering codes that are capable of a universal set of computational operations with the minimal cost in quantum resources remains an important and ongoing challenge. One proposal of significant recent interest is the gauge color code. Notably, this code may offer a reduced resource cost over other well-studied fault-tolerant architectures by using a new method, known as gauge fixing, for performing the non-Clifford operations that are essential for universal quantum computation. Here we examine the gauge color code when it is subject to noise. Specifically, we make use of single-shot error correction to develop a simple decoding algorithm for the gauge color code, and we numerically analyse its performance. Remarkably, we find threshold error rates comparable to those of other leading proposals. Our results thus provide the first steps of a comparative study between the gauge color code and other promising computational architectures.
  5. Iterative decoders for finite-length quantum low-density parity-check (QLDPC) codes are attractive because their hardware complexity scales only linearly with the number of physical qubits. However, they are impacted by short cycles, by detrimental graphical configurations known as trapping sets (TSs) present in a code graph, and by the symmetric degeneracy of errors. These factors significantly degrade the probability of successful decoding and cause the so-called error floor. In this paper, we establish a systematic methodology by which one can identify and classify quantum trapping sets (QTSs) according to their topological structure and the decoder used. The conventional definition of a TS from classical error correction is generalized to address the syndrome decoding scenario for QLDPC codes. We show that the knowledge of QTSs can be used to design better QLDPC codes and decoders. Frame error rate improvements of two orders of magnitude in the error-floor regime are demonstrated for some practical finite-length QLDPC codes without requiring any post-processing.
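For readers unfamiliar with the failure mode, the sketch below shows a plain classical bit-flip decoder for a binary parity-check code, i.e. the classical analogue of the syndrome-driven iterative loop discussed above (it is not a QLDPC decoder and does not model degenerate errors). When such a loop stalls instead of converging, which is the behavior trapping sets induce, the result is the error-floor effect described in the abstract. The tiny parity-check matrix and error pattern are purely illustrative.

```python
import numpy as np

# Minimal CLASSICAL bit-flip decoder sketch (binary code, not QLDPC): illustrates
# the iterative syndrome-driven loop whose non-convergence on trapping sets
# produces error floors. All values below are toy examples.

def bit_flip_decode(H, syndrome, max_iters=50):
    """Search for an error estimate e with H @ e = syndrome (mod 2) by repeatedly
    flipping the bit that participates in the most unsatisfied checks."""
    m, n = H.shape
    e = np.zeros(n, dtype=int)
    for _ in range(max_iters):
        unsat = (H @ e + syndrome) % 2       # parity checks still violated
        if not unsat.any():
            return e, True                   # converged: syndrome explained
        votes = H.T @ unsat                  # violated checks touching each bit
        e[np.argmax(votes)] ^= 1             # flip the worst offender
    return e, False                          # stalled (e.g. trapped in a cycle)

# Tiny example: 3x6 parity-check matrix, single bit error at position 1.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
true_e = np.array([0, 1, 0, 0, 0, 0])
e_hat, converged = bit_flip_decode(H, (H @ true_e) % 2)
print(converged, e_hat)
```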