skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, May 17 until 8:00 AM ET on Saturday, May 18 due to maintenance. We apologize for the inconvenience.


Title: Energy Efficient In-Memory Binary Deep Neural Network Accelerator with Dual-Mode SOT-MRAM
In this paper, we explore potentials of leveraging spin-based in-memory computing platform as an accelerator for Binary Convolutional Neural Networks (BCNN). Such platform can implement the dominant convolution computation based on presented Spin Orbit Torque Magnetic Random Access Memory (SOT-MRAM) array. The proposed array architecture could simultaneously work as non-volatile memory and a reconfigurable in-memory logic (AND, OR) without add-on logic circuits to memory chip as in conventional logic-in-memory designs. The computed logic output could be also simply read out like a normal MRAM bit-cell using the shared memory peripheral circuits. We employ such intrinsic in-memory computing architecture to efficiently process data within memory to greatly reduce power hungry and omit long distance data communication concerning state-of-the-art BCNN hardware.  more » « less
Award ID(s):
1740126
NSF-PAR ID:
10059771
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2017 IEEE International Conference on Computer Design (ICCD)
Page Range / eLocation ID:
609 to 612
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In-Memory Computing (IMC) technology has been considered to be a promising approach to solve well-known memory-wall challenge for data intensive applications. In this paper, we are the first to propose MnM, a novel IMC system with innovative architecture/circuit designs for fast and efficient Min/Max searching computation in emerging Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM). Our proposed SOT-MRAM based in-memory logic circuits are specially optimized to perform parallel, one-cycle XNOR logic that are heavily used in the Min/Max searching-in-memory algorithm. Our novel in-memory XNOR circuit also has an overhead of just two transistors per row when compared to most prior methodologies which typically use multiple sense amplifiers or complex CMOS logic gates. We also design all other required peripheral circuits for implementing complete Min/Max searching-in-MRAM computation. Our cross-layer comprehensive experiments on Dijkstra's algorithm and other sorting algorithms in real word datasets show that our MnM could achieve significant performance improvement over CPUs, GPUs, and other competing IMC platforms based on RRAM/MRAM/DRAM. 
    more » « less
  2. Abstract--Spin switch (SS) is a promising spintronic device which exhibits compactness, low power, non-volatility, input-output isolation leveraging giant spin Hall effect, spin transfer torque, and dipolar coupling. In this paper, we propose a novel device-to-architecture co-design for an in-memory computing platform using coterminous SS (IMCS2), which could simultaneously work as non-volatile memory and reconfigurable in-memory logic (AND/NAND, OR/NOR, and XOR/XNOR) without add-on logic circuits to memory chip. The computed logic output could be simply read out like a normal magnetic random access memory bit cell using the shared memory peripheral circuits. Such intrinsic in-memory logic could be used to process data within memory to greatly reduce power-hungry and long distance data communication in the conventional von Neumann computing system. The IMCS2-based in-memory bulk bitwise Boolean vector operation shows ~9x energy saving and ~3x speedup compared with that of DRAM-based in-memory computing platform. We further employ in-memory multiplication to evaluate the performance of the proposed in-memory computing platform for vector-vector multiplication with different vector sizes. 
    more » « less
  3. In this paper, we propose MRIMA, as a novel MRAM-based In-Memory Accelerator for non-volatile, flexible, and efficient in-memory computing. MRIMA transforms current Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) arrays to massively parallel computational units capable of working as both non-volatile memory and in-memory logic. Instead of integrating complex logic units in cost-sensitive memory, MRIMA exploits hardware-friendly bit-line computing methods to implement complete Boolean logic functions between operands within a memory array in a single clock cycle, overcoming the multi-cycle logic issue in contemporary Processing-In-Memory (PIM) platforms. We present practical case studies to demonstrate MRIMA’s acceleration for binary-weight and low bit-width Convolutional Neural Networks (CNN) as well as data encryption. Our device-to-architecture co-simulation results on CNN acceleration demonstrate that MRIMA can obtain 1.7× better energy-efficiency and 11.2× speed-up compared to ASICs, and, 1.8× better energy-efficiency and 2.4× speed-up over the best DRAM-based PIM solutions. As an AES in-memory encryption engine, MRIMA shows 77% and 21% lower energy consumption compared to CMOS-ASIC and recent domain wall-based design, respectively. 
    more » « less
  4. The HSC-FPGA offers an intriguing feasible architecture for the next generation of configurable fabrics, which allows embracing the advantages of both CMOS and beyond-CMOS technologies without requiring significant modification to the routing structure, programming paradigms, and synthesis tool-chain of the commercial FPGAs. In the HSC-FPGA, the intrinsic characteristics of magnetic random access memory (MRAM)-look-up table (LUT) circuits are used to implement sequential logic, while combinational logic circuits are implemented by static random access memory (SRAM)-LUTs. Fabric-level simulation results for the developed HSC-FPGA show that it can achieve at least 18%, 70%, and 15% reduction in terms of area, standby power, and read power consumption, respectively, for various ISCAS-89 and ITC-99 benchmark circuits compared to conventional SRAM-based FPGAs. The power consumption values can be further decreased by the power-gating allowed by the non-volatility feature of MRAM-LUTs. Moreover, the benefits of increased heterogeneity for reconfigurable computing is extended along realizing probabilistic computing paradigms within a fabric, which is enabled by probabilistic spin logic devices. The cooperating strengths of technology-heterogeneity and heterogeneity in computing paradigm in the proposed HSC-FPGA are leveraged to develop energy-efficient and reliability-aware training and evaluation circuits for deep belief networks with memristive crossbar arrays and p-bit based probabilistic neurons. 
    more » « less
  5. In this paper we propose a Highly Flexible InMemory (HieIM) computing platform using STT MRAM, which can be leveraged to implement Boolean logic functions without sacrificing memory functionality. It could pre-process data within memory to further reduce power hungry long distance communication between memory and processing units as in Von-Neumann computing system. HieIM can implement all the Boolean logic functions (AND/NAND, OR/NOR, XOR/XNOR) between any two cells in the same memory array, thus overcoming the `operand locality' problem in contemporary in-memory computing platform designs. To investigate the performance of HieIM, we test in-memory bulk bit-wise Boolean logic operations using different vector datasets, which shows ~ 8x energy saving and ~ 5x speedup compared to recent DRAM based in-memory computing platform. We further implement an in-memory data encryption engine design based on HieIM as another case study. With AES algorithm, it shows 51.5% and 68.9% lower energy consumption compared to CMOS-ASIC and CMOL based implementations, respectively. 
    more » « less