- Award ID(s): 2114200
- PAR ID: 10339227
- Date Published:
- Journal Name: IEEE Transactions on Computers
- ISSN: 0018-9340
- Page Range / eLocation ID: 1 to 1
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Magnetic Random-Access Memory (MRAM) based p-bit neuromorphic computing devices are garnering increasing interest as a means to compactly and efficiently realize machine learning operations in Restricted Boltzmann Machines (RBMs). When embedded within an RBM resistive crossbar array, the p-bit based neuron realizes a tunable sigmoidal activation function. Since the stochasticity of activation depends on the energy barrier of the MRAM device, it is essential to assess the impact of process variation on the voltage-dependent behavior of the sigmoid function. Energy-barrier variation also affects power consumption, motivating a simulation environment that supports multi-objective optimization of device and network parameters. Herein, transportable Python scripts are developed to analyze how variations in device dimensions affect output behavior and the accuracy of machine learning applications. Evaluation with RBM circuits on the MNIST dataset reveals the impacts and limits of device-fabrication process variation in terms of the resulting energy vs. accuracy tradeoffs, and the resulting simulation framework is available under a Creative Commons license.
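As a rough illustration of the variation analysis described above, the Python sketch below estimates a p-bit's firing probability under ±10% energy-barrier (i.e., device-dimension) variation. It is a minimal stand-in, not the released scripts: the assumption that the sigmoid gain scales linearly with E_b/kT, and all device values, are hypothetical.

```python
import numpy as np

# Hypothetical parameters -- a minimal sketch, not the paper's released scripts.
KT = 4.14e-21          # thermal energy at 300 K (J)
EB_NOM = 10 * KT       # assumed nominal barrier of the low-barrier free layer

def pbit_activation(v_bias, eb, n_samples=10_000, rng=None):
    """Estimate the p-bit firing probability at a given bias voltage.

    Assumption: the effective sigmoid gain scales with barrier height, so a
    taller barrier (larger device) gives a steeper, more deterministic
    transfer curve, while a shorter barrier flattens it.
    """
    rng = rng or np.random.default_rng(0)
    slope = eb / KT                            # assumed barrier-dependent gain
    p_one = 1.0 / (1.0 + np.exp(-slope * v_bias))
    samples = rng.random(n_samples) < p_one    # stochastic binary output
    return samples.mean()

# Sweep +/-10% dimension (hence barrier) variation and compare transfer curves.
for scale in (0.9, 1.0, 1.1):
    curve = [pbit_activation(v, scale * EB_NOM) for v in np.linspace(-0.5, 0.5, 5)]
    print(f"barrier x{scale:.1f}:", [f"{p:.2f}" for p in curve])
```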
-
In-Memory Computing (IMC) technology is considered a promising approach to the well-known memory-wall challenge for data-intensive applications. In this paper, we are the first to propose MnM, a novel IMC system with innovative architecture/circuit designs for fast and efficient Min/Max searching in emerging Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM). Our proposed SOT-MRAM based in-memory logic circuits are specially optimized to perform the parallel, one-cycle XNOR logic that is heavily used in the Min/Max searching-in-memory algorithm. Our novel in-memory XNOR circuit incurs an overhead of just two transistors per row, compared to most prior methodologies, which typically use multiple sense amplifiers or complex CMOS logic gates. We also design all other peripheral circuits required to implement the complete Min/Max searching-in-MRAM computation. Our cross-layer, comprehensive experiments on Dijkstra's algorithm and other sorting algorithms over real-world datasets show that MnM achieves significant performance improvements over CPUs, GPUs, and other competing IMC platforms based on RRAM/MRAM/DRAM.
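A functional emulation of the bit-serial Max search that such one-cycle XNOR matching enables may clarify the algorithm. This Python sketch models the parallel in-array comparison in ordinary software; the function name and loop structure are illustrative, not the MnM circuit.

```python
def imc_max_search(values, n_bits):
    """Software emulation of bit-serial Max search.

    What this mimics: at each bit position (MSB first), every row XNORs its
    stored bit against a broadcast search bit in one cycle; rows whose prefix
    matches the search pattern so far remain candidates.
    """
    candidates = list(range(len(values)))
    prefix = 0
    for b in reversed(range(n_bits)):
        trial = prefix | (1 << b)   # try search bit = 1 at position b
        # Emulate the parallel XNOR match across all candidate rows.
        matches = [i for i in candidates if (values[i] >> b) == (trial >> b)]
        if matches:                 # some row has a 1 here: keep bit = 1
            candidates, prefix = matches, trial
        # else: no row matched, the bit stays 0 and candidates are unchanged.
    return prefix, candidates

vals = [13, 7, 13, 2]
print(imc_max_search(vals, n_bits=4))   # -> (13, [0, 2])
```

Min search is symmetric: try search bit = 0 first and keep bit = 1 only when no candidate row holds a 0 at that position.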
-
Systems on chips (SoCs) are now designed with their own artificial intelligence (AI) accelerator segment to accommodate the ever-increasing demand of deep learning (DL) applications. With powerful multiply-and-accumulate (MAC) engines for matrix multiplications, these accelerators show high computing performance. However, because of limited memory resources (i.e., bandwidth and capacity), they fail to achieve optimum system performance during large-batch training and inference. In this work, we propose a memory system with high on-chip capacity and bandwidth to shift AI accelerators from memory-bound operation to system-level peak performance. We develop the memory system with design technology co-optimization (DTCO)-enabled customized spin-orbit torque (SOT)-MRAM as large on-chip memory through system technology co-optimization (STCO) and detailed characterization of the DL workloads. Our workload-aware memory system achieves 8× energy and 9× latency improvements on computer vision (CV) training benchmarks and 8× energy and 4.5× latency improvements on natural language processing (NLP) training benchmarks, while consuming only around 50% of the SRAM area at iso-capacity.
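A simple roofline estimate illustrates why higher on-chip bandwidth can move an accelerator out of the memory-bound regime. All numbers in this Python sketch are hypothetical placeholders, not the paper's workload characterization or DTCO results.

```python
def attainable_perf(peak_tops, bandwidth_gbs, arithmetic_intensity):
    """Roofline model: delivered performance is capped by either the compute
    ceiling or by memory traffic (bandwidth * ops-per-byte)."""
    return min(peak_tops, bandwidth_gbs * arithmetic_intensity / 1000.0)

AI = 50  # ops/byte for a MAC-heavy large-batch layer (hypothetical)
# With limited SRAM bandwidth the accelerator is memory-bound (50 TOPS)...
print("baseline  :", attainable_perf(peak_tops=100, bandwidth_gbs=1000, arithmetic_intensity=AI))
# ...while a higher-bandwidth on-chip memory lets it hit the compute peak.
print("wide MRAM :", attainable_perf(peak_tops=100, bandwidth_gbs=4000, arithmetic_intensity=AI))
```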
-
Because memory and computation units are separate in the traditional von Neumann architecture, massive data transfer dominates the overall computing system's power and latency, known as the 'memory-wall' issue. Especially with ever-increasing deep-learning model sizes and computing complexity, it becomes the bottleneck for state-of-the-art AI computing systems. To address this challenge, In-Memory Computing (IMC) based neural network accelerators have been widely investigated to support AI computing within memory. However, most of those works focus only on inference; on-device training and continual learning have not yet been well explored. In this work, for the first time, we introduce on-device continual learning with an STT-assisted-SOT (SAS) Magnetic Random Access Memory (MRAM) based IMC system. On the hardware side, we fabricated a SAS-MRAM device prototype with four Magnetic Tunnel Junctions (MTJs, each 100 nm × 50 nm) sharing a common heavy-metal layer, achieving significantly improved memory-write and area efficiency compared to traditional SOT-MRAM. Next, we designed fully digital IMC circuits with our SAS-MRAM to support both neural network inference and on-device learning. To enable efficient on-device continual learning on new task data, we present an 8-bit integer (INT8) continual learning algorithm that uses our SAS-MRAM IMC-supported bit-serial digital in-memory convolution operations to train a small parallel reprogramming network (Rep-Net) while freezing the major backbone model. Extensive studies are presented based on our fabricated SAS-MRAM device prototype, cross-layer device-circuit benchmarking and simulation, and evaluation of the on-device continual learning system.
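The freeze-the-backbone, train-the-small-network scheme can be sketched in a few lines of PyTorch. This is a simplified float approximation: the layer shapes, the Rep-Net topology, and the use of ordinary float operations in place of the paper's INT8 bit-serial in-MRAM convolutions are all assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical backbone standing in for a pretrained model; frozen throughout.
backbone = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(4), nn.Flatten())
for p in backbone.parameters():
    p.requires_grad = False          # the major backbone model stays frozen
backbone.eval()

# Small trainable network (a simplified stand-in for the paper's Rep-Net).
rep_net = nn.Sequential(nn.Linear(16 * 4 * 4, 64), nn.ReLU(),
                        nn.Linear(64, 10))

opt = torch.optim.SGD(rep_net.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 28, 28)        # stand-in batch for a new task
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    feats = backbone(x)              # inference-only pass through frozen weights
loss = loss_fn(rep_net(feats), y)
opt.zero_grad()
loss.backward()                      # only Rep-Net parameters receive gradients
opt.step()
```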
-
Abstract: This article discusses the current state of development, open research opportunities, and application perspectives of electric-field-controlled magnetic tunnel junctions that use the voltage-controlled magnetic anisotropy effect to control their magnetization. The integration of embedded magnetic random-access memory (MRAM) into mainstream semiconductor foundry manufacturing opens new possibilities for the development of energy-efficient, high-performance, and intelligent computing systems. The current generation of MRAM, which uses the current-controlled spin-transfer torque (STT) effect to write information, has gained traction due to its nonvolatile data retention and lower integration cost compared to embedded Flash. However, scaling MRAM to high bit densities will likely require a transition from current-controlled to voltage-controlled operation. In this perspective, an overview of voltage-controlled magnetic anisotropy (VCMA) as a promising beyond-STT write mechanism for MRAM devices is provided, and recent advances in developing VCMA-MRAM devices with perpendicular magnetization are highlighted. Starting from the fundamental mechanisms, the key remaining challenges of VCMA-MRAM are discussed, such as increasing the VCMA coefficient, controlling the write error rate, and achieving field-free VCMA switching; potential solutions and open research questions are then highlighted. Lastly, prospective uses of voltage-controlled magnetic tunnel junctions (VC-MTJs) in security, extending beyond their traditional role as memory devices, are explored.
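For a sense of scale, the first-order VCMA picture above amounts to a barrier reduction ΔE_b = ξVA/t_ox, where ξ is the VCMA coefficient, A the junction area, and t_ox the barrier thickness. The following Python sketch uses assumed, textbook-range device values (not figures from the article) to show how an applied voltage lowers the retention barrier toward zero.

```python
import numpy as np

# Illustrative first-order VCMA estimate; all device values are assumptions.
XI   = 100e-15                 # VCMA coefficient xi, J/(V*m)  (~100 fJ/(V*m))
T_OX = 1e-9                    # MgO barrier thickness (m)
AREA = np.pi * (25e-9) ** 2    # 50 nm diameter junction
KT   = 4.14e-21                # thermal energy at 300 K (J)
EB0  = 60 * KT                 # assumed zero-bias retention barrier

def barrier(v):
    """First-order model: the applied voltage reduces the interfacial
    anisotropy, lowering the barrier by xi * V * A / t_ox."""
    return EB0 - XI * v * AREA / T_OX

for v in (0.0, 0.5, 1.0, 1.2):
    print(f"V = {v:.1f} V -> E_b = {barrier(v) / KT:5.1f} kT")
```

With these assumed numbers, roughly 1.2 V nearly nulls a 60 kT barrier, which is why VCMA writing can be far more energy-efficient than driving an STT current through the junction.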