- Award ID(s):
- 1740126
- PAR ID:
- 10059774
- Date Published:
- Journal Name:
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- ISSN:
- 0278-0070
- Page Range / eLocation ID:
- 1 to 1
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
In this paper, we propose MRIMA, as a novel MRAM-based In-Memory Accelerator for non-volatile, flexible, and efficient in-memory computing. MRIMA transforms current Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) arrays to massively parallel computational units capable of working as both non-volatile memory and in-memory logic. Instead of integrating complex logic units in cost-sensitive memory, MRIMA exploits hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a single clock cycle, overcoming the multi-cycle logic issue in contemporary Processing-In-Memory (PIM) platforms. We present practical case studies to demonstrate MRIMA’s acceleration for binary-weight and low bit-width Convolutional Neural Networks (CNN) as well as data encryption. Our device-to-architecture co-simulation results on CNN acceleration demonstrate that MRIMA can obtain 1.7× better energy-efficiency and 11.2× speed-up compared to ASICs, and, 1.8× better energy-efficiency and 2.4× speed-up over the best DRAM-based PIM solutions. As an AES in-memory encryption engine, MRIMA shows 77% and 21% lower energy consumption compared to CMOS-ASIC and recent domain wall-based design, respectively.more » « less
-
In this paper we propose a Highly Flexible InMemory (HieIM) computing platform using STT MRAM, which can be leveraged to implement Boolean logic functions without sacrificing memory functionality. It could pre-process data within memory to further reduce power hungry long distance communication between memory and processing units as in Von-Neumann computing system. HieIM can implement all the Boolean logic functions (AND/NAND, OR/NOR, XOR/XNOR) between any two cells in the same memory array, thus overcoming the `operand locality' problem in contemporary in-memory computing platform designs. To investigate the performance of HieIM, we test in-memory bulk bit-wise Boolean logic operations using different vector datasets, which shows ~ 8x energy saving and ~ 5x speedup compared to recent DRAM based in-memory computing platform. We further implement an in-memory data encryption engine design based on HieIM as another case study. With AES algorithm, it shows 51.5% and 68.9% lower energy consumption compared to CMOS-ASIC and CMOL based implementations, respectively.more » « less
-
In this paper, we propose PIMA-Logic, as a novel Processing-in-Memory Architecture for highly flexible and efficient Logic computation. Insteadof integrating complex logic units in cost-sensitive memory, PIMA-Logic exploits a hardware-friendly approach to implement Boolean logic functions between operands either located in the same row or the same column within entire memory arrays. Furthermore, it can efficiently process more complex logic functions between multiple operands to further reduce the latency and power-hungry data movement. The proposed architecture is developed based on Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array and it can simultaneously work as a non-volatile memory and a reconfigurable in-memory logic. The device-to-architecture co-simulation results show that PIMA-Logic can achieve up to 56% and 31.6% improvements with respect to overall energy and delay on combinational logic benchmarks compared to recent Pinatubo architecture. We further implement an in-memory data encryption engine based on PIMA-Logic as a case study. With AES application, it shows 77.2% and 21% lower energy consumption compared to CMOS-ASIC and recent RIMPA implementation, respectively.more » « less
-
Latest algorithmic development has brought competitive classification accuracy for neural networks despite constraining the network parameters to ternary or binary representations. These findings show significant optimization opportunities to replace computationally-intensive convolution operations (based on multiplication) with more efficient and less complex operations such as addition. In hardware implementation domain, processing-in-memory architecture is becoming a promising solution to alleviate enormous energy-hungry data communication between memory and processing units, bringing considerable improvement for system performance and energy efficiency while running such large networks. In this paper, we review several of our recent works regarding Processing-in-Memory (PIM) accelerator based on Magnetic Random Access Memory computational sub-arrays to accelerate the inference mode of quantized neural networks using digital non-volatile memory rather than using analog crossbar operation. In this way, we investigate the performance of two distinct in-memory addition schemes compared to other digital methods based on processing-in-DRAM/GPU/ASIC design to tackle DNN power and memory wall bottleneck.more » « less
-
Many IoT applications require high computational performance and flexibility, and FPGA is a promising candidate. However, increased computation power results in higher energy dissipation, and energy efficiency is one of the key concerns for IoT applications. In this paper, we explore adiabatic logic for designing an energy efficient configurable logic block (CLB) and compare it to the CMOS counterpart. The simulation results show that the proposed adiabatic-logic-based look-up table (LUT) has significant energy savings for the frequency range of 1 MHz to 40 MHz, and the least energy savings is at 40 MHz, which is 92.94% energy reduction compared to its CMOS counterpart. Further, the three proposed adiabatic-logic-based memory cells are 14T, 16T, and 12T designs with at least 88.2%, 84.2%, and 87.2% energy savings. Also, we evaluated the performance of the proposed CLBs using an adiabatic-logic-based LUT (AL-LUT) interfacing with adiabatic-logic-based memory cells. The proposed design shows significant energy reduction compared to a CMOS LUT interface with SRAM cells for different frequencies; the energy savings are at least 91.6% for AL-LUT 14T, 89.7% for AL-LUT 16T, and 91.3% AL-LUT 12T.