skip to main content

Title: CSrram: Area-Efficient Low-Power Ex-Situ Training Framework for Memristive Neuromorphic Circuits Based on Clustered Sparsity
Artificial Neural Networks (ANNs) play a key role in many machine learning (ML) applications but poses arduous challenges in terms of storage and computation of network parameters. Memristive crossbar arrays (MCAs) are capable of both computation and storage, making them promising for in-memory computing enabled neural network accelerators. At the same time, the presence of a significant amount of zero weights in ANNs has motivated research in a variety of parameter reduction techniques. However, for crossbar based architectures, the study of efficient methods to take advantage of network sparsity is still in the early stage. This paper presents CSrram, an efficient ex-situ training framework for hybrid CMOS-memristive neuromorphic circuits. CSrram includes a pre-defined block diagonal clustered (BDC) sparsity algorithm to significantly reduce area and power consumption. The proposed framework is verified on a wide range of datasets including MNIST handwritten recognition, fashion MNIST, breast cancer prediction (BCW), IRIS, and mobile health monitoring. Compared to state of the art fully connected memristive neuromorphic circuits, our CSrram with only 25% density of weights in the first junction, provides a power and area efficiency of 1.5x and 2.6x (averaged over five datasets), respectively, without any significant test accuracy loss.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Page Range / eLocation ID:
465 to 470
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The superior density of passive analog-grade memristive crossbar circuits enables storing large neural network models directly on specialized neuromorphic chips to avoid costly off-chip communication. To ensure efficient use of such circuits in neuromorphic systems, memristor variations must be substantially lower than those of active memory devices. Here we report a 64 × 64 passive crossbar circuit with ~99% functional nonvolatile metal-oxide memristors. The fabrication technology is based on a foundry-compatible process with etch-down patterning and a low-temperature budget. The achieved <26% coefficient of variance in memristor switching voltages is sufficient for programming a 4K-pixel gray-scale pattern with a <4% relative tuning error on average. Analog properties are also successfully verified via experimental demonstration of a 64 × 10 vector-by-matrix multiplication with an average 1% relative conductance import accuracy to model the MNIST image classification by ex-situ trained single-layer perceptron, and modeling of a large-scale multilayer perceptron classifier based on more advanced conductance tuning algorithm.

    more » « less
  2. Abstract

    The progress in the field of neural computation hinges on the use of hardware more efficient than the conventional microprocessors. Recent works have shown that mixed-signal integrated memristive circuits, especially their passive (0T1R) variety, may increase the neuromorphic network performance dramatically, leaving far behind their digital counterparts. The major obstacle, however, is immature memristor technology so that only limited functionality has been reported. Here we demonstrate operation of one-hidden layer perceptron classifier entirely in the mixed-signal integrated hardware, comprised of two passive 20 × 20 metal-oxide memristive crossbar arrays, board-integrated with discrete conventional components. The demonstrated network, whose hardware complexity is almost 10× higher as compared to previously reported functional classifier circuits based on passive memristive crossbars, achieves classification fidelity within 3% of that obtained in simulations, when using ex-situ training. The successful demonstration was facilitated by improvements in fabrication technology of memristors, specifically by lowering variations in theirI–Vcharacteristics.

    more » « less
  3. As the number of weight parameters in deep neural networks (DNNs) continues growing, the demand for ultra-efficient DNN accelerators has motivated research on non-traditional architectures with emerging technologies. Resistive Random-Access Memory (ReRAM) crossbar has been utilized to perform insitu matrix-vector multiplication of DNNs. DNN weight pruning techniques have also been applied to ReRAM-based mixed-signal DNN accelerators, focusing on reducing weight storage and accelerating computation. However, the existing works capture very few peripheral circuits features such as Analog to Digital converters (ADCs) during the neural network design. Unfortunately, ADCs have become the main part of power consumption and area cost of current mixed-signal accelerators, and the large overhead of these peripheral circuits is not solved efficiently. To address this problem, we propose a novel weight pruning framework for ReRAM-based mixed-signal DNN accelerators, named TINYADC, which effectively reduces the required bits for ADC resolution and hence the overall area and power consumption of the accelerator without introducing any computational inaccuracy. Compared to state-of-the-art pruning work on the ImageNet dataset, TINYADC achieves 3.5× and 2.9× power and area reduction, respectively. TINYADC framework optimizes the throughput of state-of-the-art architecture design by 29% and 40% in terms of the throughput per unit of millimeter square and watt (GOPs/s×mm 2 and GOPs/w), respectively. 
    more » « less
  4. The high computation and memory storage of large deep neural networks (DNNs) models pose intensive challenges to the conventional Von-Neumann architecture, incurring substantial data movements in the memory hierarchy. The memristor crossbar array has emerged as a promising solution to mitigate the challenges and enable low-power acceleration of DNNs. Memristor-based weight pruning and weight quantization have been separately investigated and proven effectiveness in reducing area and power consumption compared to the original DNN model. However, there has been no systematic investigation of memristor-based neuromorphic computing (NC) systems considering both weight pruning and weight quantization. In this paper, we propose an unified and systematic memristor-based framework considering both structured weight pruning and weight quantization by incorporating alternating direction method of multipliers (ADMM) into DNNs training. We consider hardware constraints such as crossbar blocks pruning, conductance range, and mismatch between weight value and real devices, to achieve high accuracy and low power and small area footprint. Our framework is mainly integrated by three steps, i.e., memristor- based ADMM regularized optimization, masked mapping and retraining. Experimental results show that our proposed frame- work achieves 29.81× (20.88×) weight compression ratio, with 98.38% (96.96%) and 98.29% (97.47%) power and area reduction on VGG-16 (ResNet-18) network where only have 0.5% (0.76%) accuracy loss, compared to the original DNN models. We share our models at anonymous link . 
    more » « less
  5. Wang, Huan (Ed.)
    Defect identification has been a significant task in various fields to prevent the potential problems caused by imperfection. There is great attention for developing technology to accurately extract defect information from the image using a computing system without human error. However, image analysis using conventional computing technology based on Von Neumann structure is facing bottlenecks to efficiently process the huge volume of input data at low power and high speed. Herein efficient defect identification is demonstrated via a morphological image process with minimal power consumption using an oxide transistor and a memristor‐based crossbar array that can be applied to neuromorphic computing. Using a hardware and software codesigned neuromorphic system combined with a dynamic Gaussian blur kernel operation, an enhanced defect detection performance is successfully demonstrated with about 104 times more power‐efficient computation compared to the conventional complementary metal‐oxide semiconductor (CMOS)‐based digital implementation. It is believed the back end of line (BEOL)‐compatible all‐oxide‐based memristive crossbar array provides the unique potential toward universal artificial intelligence of things (AIoT) applications where conventional hardware can hardly be used. 
    more » « less