This paper presents a novel transposed MRAM architecture (WinEdge) specifically optimized for Winograd convolution acceleration in edge computing devices. Leveraging Magnetic Tunnel Junctions (MTJs) with Spin Hall Effect (SHE)-assisted Spin-Transfer Torque (STT) writing, the proposed design enables a single SHE current to simultaneously write data to four MTJs, substantially reducing power consumption. Additionally, the integration of stacked MTJs significantly improves storage density. The proposed WinEdge efficiently supports both standard and transposed data access modes regardless of bit-width, achieving up to 36% lower power, 47% reduced energy consumption, and 28% faster processing speed compared to existing designs. Simulations conducted in 45 nm CMOS technology validate its superiority over conventional SRAM-based solutions for convolutional neural network (CNN) acceleration in resource-constrained edge environments.
Clockless Spin-based Look-Up Tables with Wide Read Margin
In this paper, we develop a 6-input fracturable non-volatile Clockless LUT (C-LUT) using spin Hall effect (SHE)-based Magnetic Tunnel Junctions (MTJs) and provide a detailed comparison between the SHE-MTJ-based C-LUT and the Spin Transfer Torque (STT)-MTJ-based C-LUT. The proposed C-LUT offers an attractive alternative to previous spin-based LUT designs in the literature for implementing combinational as well as sequential logic. Foremost, the C-LUT eliminates the sense amplifier that is typically employed by using a differential-polarity dual-MTJ design, as opposed to a static-reference-resistance MTJ design. This realizes a much wider read margin, and Monte Carlo simulation of the proposed fracturable C-LUT indicates no read or write errors across a variety of process-variation scenarios involving MOS transistors as well as MTJs. Additionally, simulation results indicate that the proposed C-LUT reduces standby power dissipation by 5.4-fold compared to the SRAM-based LUT. Furthermore, the proposed SHE-MTJ-based C-LUT reduces area by 1.3-fold and 2-fold compared to the SRAM-based LUT and the STT-MTJ-based C-LUT, respectively.
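As a purely behavioral illustration of what a 6-input fracturable LUT computes, the sketch below models the stored configuration as a 64-entry truth table and shows one common way such a table can be split into two 5-input LUTs that share their inputs. It is a minimal sketch under stated assumptions: the function names (`build_lut`, `lut_read`, `fractured_read`) are hypothetical, the lower/upper-half split is an assumed fracturing scheme rather than the one used in the paper, and nothing here models the MTJ devices, the clockless operation, or the read circuitry.

```python
def lut_read(config_bits, inputs):
    """Return the stored bit addressed by the inputs.

    inputs[0] is treated as the least-significant address bit (an assumption).
    """
    address = sum(bit << i for i, bit in enumerate(inputs))
    return config_bits[address]


def build_lut(func, n_inputs):
    """Precompute the 2**n_inputs configuration bits for a Boolean function."""
    table = []
    for address in range(2 ** n_inputs):
        inputs = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(func(inputs) & 1)
    return table


def fractured_read(config_bits, inputs5):
    """Use a 64-entry table as two independent 5-input LUTs with shared inputs.

    Assumed scheme: entries 0..31 hold the first function, 32..63 the second.
    """
    lower, upper = config_bits[:32], config_bits[32:]
    return lut_read(lower, inputs5), lut_read(upper, inputs5)


if __name__ == "__main__":
    # 6-input mode: a single parity function over all six inputs.
    parity6 = build_lut(lambda bits: sum(bits) % 2, 6)
    print(lut_read(parity6, [1, 0, 1, 1, 0, 0]))

    # Fractured mode: 5-input parity and 5-input AND stored in one table.
    config = build_lut(lambda bits: sum(bits) % 2, 5) + build_lut(
        lambda bits: int(all(bits)), 5
    )
    print(fractured_read(config, [1, 1, 1, 1, 1]))
```

In this model the non-volatile storage is just the `config_bits` list; in hardware each entry would correspond to a stored MTJ state.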
- Award ID(s):
- 1810256
- PAR ID:
- 10092048
- Date Published:
- Journal Name:
- ACM Great Lakes Symposium on VLSI 2019
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Spin transfer torque magnetic random access memory (STT-MRAM) offers a promising solution for low-power, high-density memory due to its CMOS compatibility, scalability, and non-volatility. However, the higher energy required to write bit cells has remained a key challenge for its adoption in battery-operated smart handheld devices. Existing low-energy writing solutions require additional complex control logic, further constraining the available area. In this research, we propose energy-efficient write circuits that combine two techniques. First, we propose a sinusoidal power clocking mechanism that replaces the DC power supply of the conventional CMOS design. Second, we propose three lookup table (LUT)-based control logic circuits and one write circuit to reduce the area and further minimize energy dissipation. The experimental results are verified on case-study implementations of a 4×4 STT-MRAM macro designed using two bit-cell configurations: i) one transistor and one magnetic tunnel junction (MTJ) (1T-1MTJ) and ii) four transistors and two MTJs (4T-2MTJ). Post-layout simulation over the frequency range from 250 kHz to 6.25 MHz shows that the write circuit, which uses the proposed LUT-based control logic circuits and a write driver with a sinusoidal power supply, achieves more than 65.05% average energy saving compared to the CMOS counterpart. Furthermore, the write circuit that uses the proposed 6T write driver with the sinusoidal power supply achieves an energy saving of more than 70.60% compared to the CMOS counterpart. We also verified that the energy-saving performance remains relatively consistent under changes in temperature and tunneling magnetoresistance (TMR) ratio.
-
With the rapid advancement of DNNs, numerous Processing-in-Memory (PIM) architectures based on various memory technologies (non-volatile memory (NVM) and volatile memory) have been developed to accelerate AI workloads. Magnetic Random Access Memory (MRAM) is highly promising among NVMs due to its zero standby leakage, fast write/read speeds, CMOS compatibility, and high memory density. However, existing MRAM technologies, such as spin-transfer torque MRAM (STT-MRAM) and spin-orbit torque MRAM (SOT-MRAM), have inherent limitations. STT-MRAM faces high write current requirements, while SOT-MRAM introduces significant area overhead due to additional access transistors. The new STT-assisted-SOT (SAS) MRAM provides an area-efficient alternative by sharing one write access transistor among multiple magnetic tunnel junctions (MTJs). This work presents the first fully digital processing-in-SAS-MRAM system to enable 8-bit floating-point (FP8) neural network inference, with an application in an on-device session-based recommender system. A SAS-MRAM device prototype is fabricated with 4 MTJs sharing the same SOT metal line. The proposed SAS-MRAM-based PIM macro is designed in TSMC 28 nm technology. It achieves 15.31 TOPS/W energy efficiency and 269 GOPS performance for FP8 operations at 700 MHz. Compared to state-of-the-art recommender systems on the same popular YooChoose dataset, it demonstrates 86×, 1.8×, and 1.12× higher energy efficiency than GPU, SRAM-PIM, and ReRAM-PIM, respectively.
-
Many IoT applications require high computational performance and flexibility, and the FPGA is a promising candidate. However, increased computation power results in higher energy dissipation, and energy efficiency is one of the key concerns for IoT applications. In this paper, we explore adiabatic logic for designing an energy-efficient configurable logic block (CLB) and compare it to the CMOS counterpart. The simulation results show that the proposed adiabatic-logic-based look-up table (LUT) has significant energy savings over the frequency range of 1 MHz to 40 MHz; the smallest saving, a 92.94% energy reduction compared to its CMOS counterpart, occurs at 40 MHz. Further, the three proposed adiabatic-logic-based memory cells are 14T, 16T, and 12T designs with at least 88.2%, 84.2%, and 87.2% energy savings, respectively. We also evaluated the performance of the proposed CLBs using an adiabatic-logic-based LUT (AL-LUT) interfacing with adiabatic-logic-based memory cells. The proposed design shows significant energy reduction compared to a CMOS LUT interfacing with SRAM cells at different frequencies; the energy savings are at least 91.6% for AL-LUT 14T, 89.7% for AL-LUT 16T, and 91.3% for AL-LUT 12T.
-
Probabilistic spin logic (PSL) has recently been proposed as a novel computing paradigm that leverages random thermal fluctuations of interacting bodies in a system rather than deterministic switching of binary bits. A PSL circuit is an interconnected network of thermally unstable units called probabilistic bits (p-bits), whose outputs randomly fluctuate between 0 and 1. While the fluctuations generated by p-bits are thermally driven, and therefore inherently stochastic, the output probability is tunable with an external source. Information is therefore encoded through the probabilities of various configurations of states in the network. Recent studies have shown that these systems can efficiently solve various types of combinatorial optimization problems and Bayesian inference problems for which modern computers are poorly suited. Previous experimental studies have demonstrated that a single magnetic tunnel junction (MTJ) designed to be thermally unstable can operate as a tunable random number generator, making it an ideal hardware solution for p-bits. Most proposals for designing an MTJ to operate as a p-bit involve patterning the MTJ as a circular nano-pillar to make the device thermally unstable and then using spin transfer torque (STT) as a tuning mechanism. However, the practical realization of such devices is very challenging since the fluctuation rate of these devices is very sensitive to device variations or defects introduced during fabrication. Despite this challenge, MTJs are still the most promising hardware solution for p-bits because they can be tuned by multiple other mechanisms, such as spin orbit torque, magneto-electric coupling, and voltage-controlled exchange coupling. Furthermore, multiple forces can be used simultaneously to drive stochastic switching signals in MTJs. This means there are a large number of methods to tune, or bias, MTJs that can be implemented in p-bit circuits and that can alleviate the current challenges of conventional STT-driven p-bits. This article serves as a review of the different methods that have been proposed to drive random fluctuations in MTJs so that they operate as probabilistic bits. Not only will we review the single-biasing mechanisms, but we will also review all the proposed dual-biasing methods, where two independent mechanisms are employed simultaneously. These dual-biasing methods have been shown to have certain advantages, such as alleviating the negative effects of device variations, and some biasing combinations have a unique capability called ‘two-degrees of tunability’, which increases the information capacity of the signals generated.
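The idea of a bit whose output fluctuates randomly while its output probability is tuned by an external input can be sketched in software. The example below is a minimal behavioral model under stated assumptions: the tanh-shaped response, the dimensionless `bias` parameter, and the names `p_bit_sample` and `estimate_probability` are illustrative choices (a tanh response is a common modeling convention for p-bits), not details taken from the abstract, and no device physics or biasing mechanism is modeled.

```python
import math
import random


def p_bit_sample(bias, rng):
    """Draw one p-bit output: 1 with probability (1 + tanh(bias)) / 2, else 0.

    bias is a hypothetical dimensionless tuning input; bias = 0 gives a fair coin.
    """
    p_one = 0.5 * (1.0 + math.tanh(bias))
    return 1 if rng.random() < p_one else 0


def estimate_probability(bias, n_samples=20000, seed=0):
    """Estimate the fraction of 1 outputs for a fixed bias by repeated sampling."""
    rng = random.Random(seed)
    ones = sum(p_bit_sample(bias, rng) for _ in range(n_samples))
    return ones / n_samples


if __name__ == "__main__":
    # Sweep the bias to see the output probability move from mostly 0 to mostly 1.
    for bias in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"bias={bias:+.1f}  estimated P(1)={estimate_probability(bias):.3f}")
```

Each individual sample is random, but the time-averaged output follows the bias, which is the sense in which information is encoded in probabilities rather than in deterministic bit values.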

