skip to main content

This content will become publicly available on September 1, 2024

Title: A 65nm RRAM Compute-in-Memory Macro for Genome Sequencing Alignment
In genomic analysis, the major computation bottle- neck is the memory- and compute-intensive DNA short reads alignment due to memory-wall challenge. This work presents the first Resistive RAM (RRAM) based Compute-in-Memory (CIM) macro design for accelerating state-of-the-art BWT based genome sequencing alignment. Our design could support all the core instructions, i.e., XNOR based match, count, and addition, required by alignment algorithm. The proposed CIM macro implemented in integration of HfO2 RRAM and 65nm CMOS demonstrates the best energy efficiency to date with 2.07 TOPS/W and 2.12G suffixes/J at 1.0V.  more » « less
Award ID(s):
2144751 2003749 2342726
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of ESSCIRC
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Realizing increasingly complex artificial intelligence (AI) functionalities directly on edge devices calls for unprecedented energy efficiency of edge hardware. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) 1 promises to meet such demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory 2–5 . Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware 6–17 , it remains a goal for a RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements on any single abstraction level of the design. Here, by co-optimizing across all hierarchies of the design from algorithms and architecture to circuits and devices, we present NeuRRAM—a RRAM-based CIM chip that simultaneously delivers versatility in reconfiguring CIM cores for diverse model architectures, energy efficiency that is two-times better than previous state-of-the-art RRAM-CIM chips across various computational bit-precisions, and inference accuracy comparable to software models quantized to four-bit weights across various AI tasks, including accuracy of 99.0 percent on MNIST 18 and 85.7 percent on CIFAR-10 19 image classification, 84.7-percent accuracy on Google speech command recognition 20 , and a 70-percent reduction in image-reconstruction error on a Bayesian image-recovery task. 
    more » « less
  2. Combinatorial optimization problems prevail in engineering and industry. Some are NP-hard and thus become difficult to solve on edge devices due to limited power and computing resources. Quadratic Unconstrained Binary Optimization (QUBO) problem is a valuable emerging model that can formulate numerous combinatorial problems, such as Max-Cut, traveling salesman problems, and graphic coloring. QUBO model also reconciles with two emerging computation models, quantum computing and neuromorphic computing, which can potentially boost the speed and energy efficiency in solving combinatorial problems. In this work, we design a neuromorphic QUBO solver composed of a swarm of spiking neural networks (SNN) that conduct a population-based meta-heuristic search for solutions. The proposed model can achieve about x20 40 speedup on large QUBO problems in terms of time steps compared to a traditional neural network solver. As a codesign, we evaluate the neuromorphic swarm solver on a 40nm 25mW Resistive RAM (RRAM) Compute-in-Memory (CIM) SoC with a 2.25MB RRAM-based accelerator and an embedded Cortex M3 core. The collaborative SNN swarm can fully exploit the specialty of CIM accelerator in matrix and vector multiplications. Compared to previous works, such an algorithm-hardware synergized solver exhibits advantageous speed and energy efficiency for edge devices. 
    more » « less
  3. Event and frame cameras capture the complemen-tary spatial and temporal details of a scene providing an accuracy vs. latency trade-off. Fusing these processing modalities using convolutional (CNN) and spiking neural networks (SNN) respectively has been shown for target tracking. We present our heterogeneous RRAM compute-in-memory (CIM) and SRAM compute-near-memory (CNM) SoC for simultaneous processing of CNN and SNN. We will show the advantage of using fused vision over frame-only vision and demonstrate python programmable data streaming. The visitors will be able to see the processing-dependent dynamic power gating of non-volatile RRAM and in-memory error correction capability. 
    more » « less
  4. RRAM-based in-memory computing (IMC) effectively accelerates deep neural networks (DNNs) and other machine learning algorithms. On the other hand, in the presence of RRAM device variations and lower precision, the mapping of DNNs to RRAM-based IMC suffers from severe accuracy loss. In this work, we propose a novel hybrid IMC architecture that integrates an RRAM-based IMC macro with a digital SRAM macro using a programmable shifter to compensate for the RRAM variations and recover the accuracy. The digital SRAM macro consists of a small SRAM memory array and an array of multiply-and-accumulate (MAC) units. The non-ideal output from the RRAM macro, due to device and circuit non-idealities, is compensated by adding the precise output from the SRAM macro. In addition, the programmable shifter allows for different scales of compensation by shifting the SRAM macro output relative to the RRAM macro output. On the algorithm side, we develop a framework for the training of DNNs to support the hybrid IMC architecture through ensemble learning. The proposed framework performs quantization (weights and activations), pruning, RRAM IMC-aware training, and employs ensemble learning through different compensation scales by utilizing the programmable shifter. Finally, we design a silicon prototype of the proposed hybrid IMC architecture in the 65nm SUNY process to demonstrate its efficacy. Experimental evaluation of the hybrid IMC architecture shows that the SRAM compensation allows for a realistic IMC architecture with multi-level RRAM cells (MLC) even though they suffer from high variations. The hybrid IMC architecture achieves up to 21.9%, 12.65%, and 6.52% improvement in post-mapping accuracy over state-of-the-art techniques, at minimal overhead, for ResNet-20 on CIFAR-10, VGG-16 on CIFAR-10, and ResNet-18 on ImageNet, respectively. 
    more » « less
  5. In the past decades, memory devices have been playing catch-up to the improving performance of processors. Although memory performance can be improved by the introduction of various configurations of a memory cache hierarchy, memory remains the performance bottleneck at a system level for big-data analytics and machine learning applications. An emerging solution for this problem is the use of a complementary compute cache architecture, using Compute-in-Memory (CiM) technologies, to bring computation close to memory. CiM implements compute primitives (e.g., arithmetic ops, data-ordering ops) which are simple enough to be embedded in the logic layers of emerging memory devices. Analogous to in-core memory caches, the CiM primitives provide low functionality but high performance by reducing data transfers. In this abstract, we describe a novel methodology to perform design space exploration (DSE) through system-level performance modeling and simulation (MODSIM) of CiM architectures for big-data analytics and machine learning applications. 
    more » « less