skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Experimental demonstration of magnetic tunnel junction-based computational random-access memory
Abstract The conventional computing paradigm struggles to fulfill the rapidly growing demands from emerging applications, especially those for machine intelligence because much of the power and energy is consumed by constant data transfers between logic and memory modules. A new paradigm, called “computational random-access memory (CRAM),” has emerged to address this fundamental limitation. CRAM performs logic operations directly using the memory cells themselves, without having the data ever leave the memory. The energy and performance benefits of CRAM for both conventional and emerging applications have been well established by prior numerical studies. However, there is a lack of experimental demonstration and study of CRAM to evaluate its computational accuracy, which is a realistic and application-critical metric for its technological feasibility and competitiveness. In this work, a CRAM array based on magnetic tunnel junctions (MTJs) is experimentally demonstrated. First, basic memory operations, as well as 2-, 3-, and 5-input logic operations, are studied. Then, a 1-bit full adder with two different designs is demonstrated. Based on the experimental results, a suite of models has been developed to characterize the accuracy of CRAM computation. Scalar addition, multiplication, and matrix multiplication, which are essential building blocks for many conventional and machine intelligence applications, are evaluated and show promising accuracy performance. With the confirmation of MTJ-based CRAM’s accuracy, there is a strong case that this technology will have a significant impact on power- and energy-demanding applications of machine intelligence.  more » « less
Award ID(s):
2230124
PAR ID:
10526675
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
npj Unconventional Computing
Volume:
1
Issue:
1
ISSN:
3004-8672
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract--Spin switch (SS) is a promising spintronic device which exhibits compactness, low power, non-volatility, input-output isolation leveraging giant spin Hall effect, spin transfer torque, and dipolar coupling. In this paper, we propose a novel device-to-architecture co-design for an in-memory computing platform using coterminous SS (IMCS2), which could simultaneously work as non-volatile memory and reconfigurable in-memory logic (AND/NAND, OR/NOR, and XOR/XNOR) without add-on logic circuits to memory chip. The computed logic output could be simply read out like a normal magnetic random access memory bit cell using the shared memory peripheral circuits. Such intrinsic in-memory logic could be used to process data within memory to greatly reduce power-hungry and long distance data communication in the conventional von Neumann computing system. The IMCS2-based in-memory bulk bitwise Boolean vector operation shows ~9x energy saving and ~3x speedup compared with that of DRAM-based in-memory computing platform. We further employ in-memory multiplication to evaluate the performance of the proposed in-memory computing platform for vector-vector multiplication with different vector sizes. 
    more » « less
  2. In-memory processing offers a promising solution for enhancing the performance of data-intensive applications. While analog in-memory computing demonstrates remarkable efficiency, its limited precision is suitable only for approximate computing tasks. In contrast, digital in-memory computing delivers the deterministic precision necessary to accelerate high-assurance applications. Current digital in-memory computing methods typically involve manually breaking down arithmetic operations into in-memory compute kernels. In contrast, traditional digital circuits are synthesized through intricate and automated design workflows. In this article, we introduce a logic synthesis framework called LOGIC, which facilitates the translation of high-level applications into digital in-memory compute kernels that can be executed using non-volatile memory. We propose techniques for decomposing element-wise arithmetic operations into in-memory kernels while minimizing the number of in-memory operations. Additionally, we optimize the sequence of in-memory operations to reduce non-volatile memory utilization. To address the NP-hard execution sequencing optimization problem, we have developed twolook-aheadalgorithms that offer practical solutions. Additionally, we leverage data layout reorganization to efficiently accelerate applications that heavily rely on sparse matrix-vector multiplication operations. Our experimental evaluations demonstrate that our proposed synthesis approach improves the area and latency of fixed-point multiplication by 84% and 20% compared to the state-of-the-art, respectively. Moreover, when applied to scientific computing applications sourced from the SuiteSparse Matrix Collection, our design achieves remarkable improvements in area, latency, and energy efficiency by factors of 4.8×, 2.6×, and 11×, respectively. 
    more » « less
  3. Nowadays, research topics on AI accelerator designs have attracted great interest, where accelerating Deep Neural Network (DNN) using Processing-in-Memory (PIM) platforms is an actively-explored direction with great potential. PIM platforms, which simultaneously aims to address power- and memory-wall bottlenecks, have shown orders of performance enhancement in comparison to the conventional computing platforms with Von-Neumann architecture. As one direction of accelerating DNN in PIM, resistive memory array (aka. crossbar) has drawn great research interest owing to its analog current-mode weighted summation operation which intrinsically matches the dominant Multiplication-and-Accumulation (MAC) operation in DNN, making it one of the most promising candidates. An alternative direction for PIM-based DNN acceleration is through bulk bit-wise logic operations directly performed on the content in digital memories. Thanks to the high fault-tolerant characteristic of DNN, the latest algorithmic progression successfully quantized DNN parameters to low bit-width representations, while maintaining competitive accuracy levels. Such DNN quantization techniques essentially convert MAC operation to much simpler addition/subtraction or comparison operations, which can be performed by bulk bit-wise logic operations in a highly parallel fashion. In this paper, we build a comprehensive evaluation framework to quantitatively compare and analyze aforementioned PIM based analog and digital approaches for DNN acceleration. 
    more » « less
  4. Latest algorithmic development has brought competitive classification accuracy for neural networks despite constraining the network parameters to ternary or binary representations. These findings show significant optimization opportunities to replace computationally-intensive convolution operations (based on multiplication) with more efficient and less complex operations such as addition. In hardware implementation domain, processing-in-memory architecture is becoming a promising solution to alleviate enormous energy-hungry data communication between memory and processing units, bringing considerable improvement for system performance and energy efficiency while running such large networks. In this paper, we review several of our recent works regarding Processing-in-Memory (PIM) accelerator based on Magnetic Random Access Memory computational sub-arrays to accelerate the inference mode of quantized neural networks using digital non-volatile memory rather than using analog crossbar operation. In this way, we investigate the performance of two distinct in-memory addition schemes compared to other digital methods based on processing-in-DRAM/GPU/ASIC design to tackle DNN power and memory wall bottleneck. 
    more » « less
  5. Applications involving machine learning and neural networks have become increasingly essential in the AI revolution. Emerging trends in Resistive RAM technologies provide high-speed, low-cost, scalable solutions for such applications. These RRAM cells provide efficient and sophisticated memory hardware structures for machine-learning applications. However, it is difficult to achieve reliable multilevel cell storage capacity in these memory technologies due to the occurrence of soft and hard errors. As these memories can store multi-bits per cell, exploring limited magnitude symbols(multi-bit) error correction in RRAM is important. This paper proposes a new syndrome-based double error correcting code that divides the syndromes into groups and, uses addition and XOR operations to correct double limited magnitude errors in the RRAM cells. The key idea is to use the built-in current summing capability of RRAM cells to perform the addition operations that are used for the error correction thereby greatly reducing the overhead of the decoding logic needed to implement the ECC. This effectively avoids the need for explicit adder hardware in the decoding logic making it smaller and faster than conventional ECC codes with similar error-correcting capability. Experimental results show that the proposed code reduces the number of check symbols and significantly reduces the decoder area and power by using the RRAM cells to perform the addition. 
    more » « less