The high computation and memory demands of large deep neural network (DNN) models pose severe challenges to the conventional von Neumann architecture, incurring substantial data movement in the memory hierarchy. The memristor crossbar array has emerged as a promising solution to mitigate these challenges and enable low-power acceleration of DNNs. Memristor-based weight pruning and weight quantization have been investigated separately and proven effective in reducing area and power consumption compared to the original DNN model. However, there has been no systematic investigation of memristor-based neuromorphic computing (NC) systems that considers both weight pruning and weight quantization. In this paper, we propose a unified and systematic memristor-based framework that considers both structured weight pruning and weight quantization by incorporating the alternating direction method of multipliers (ADMM) into DNN training. We consider hardware constraints such as crossbar block pruning, conductance range, and the mismatch between weight values and real devices, to achieve high accuracy, low power, and a small area footprint. Our framework consists of three main steps: memristor-based ADMM-regularized optimization, masked mapping, and retraining. Experimental results show that our proposed framework achieves a 29.81× (20.88×) weight compression ratio, with 98.38% (96.96%) power reduction and 98.29% (97.47%) area reduction on the VGG-16 (ResNet-18) network, with only 0.5% (0.76%) accuracy loss compared to the original DNN models. We share our models at the anonymous link http://bit.ly/2Jp5LHJ .
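As a rough illustration of the two constraints named in the abstract, the sketch below shows the kind of Euclidean projection steps that ADMM-based compression typically alternates with ordinary training. It is not the authors' implementation: the function names, the column-wise pruning granularity, and the uniform quantization levels in a symmetric range are all assumptions made for this example.

```python
import math

def prune_columns(weights, keep):
    """Structured-pruning projection (hypothetical helper): keep the `keep`
    columns with the largest L2 norm and zero out every other column."""
    n_cols = len(weights[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in weights)) for j in range(n_cols)]
    kept = set(sorted(range(n_cols), key=lambda j: norms[j], reverse=True)[:keep])
    return [[w if j in kept else 0.0 for j, w in enumerate(row)] for row in weights]

def quantize(weights, levels, w_max):
    """Quantization projection (assumed level layout): clip each weight to
    [-w_max, w_max] and snap it to the nearest of `levels` uniform values."""
    step = 2 * w_max / (levels - 1)

    def snap(w):
        clipped = max(-w_max, min(w_max, w))
        return -w_max + round((clipped + w_max) / step) * step

    return [[snap(w) for w in row] for row in weights]

def compression_ratio(original, pruned):
    """Ratio of total weights to weights that remain nonzero after pruning."""
    total = sum(len(row) for row in original)
    nonzero = sum(1 for row in pruned for w in row if w != 0.0)
    return total / nonzero

W = [[0.9, 0.1, -0.5, 0.02],
     [-0.8, 0.05, 0.4, 0.01]]
pruned = prune_columns(W, keep=2)
quantized = quantize(pruned, levels=5, w_max=1.0)
ratio = compression_ratio(W, pruned)
```

Pruning whole columns rather than individual weights is what lets entire crossbar columns be removed; the pruning granularity and conductance levels used in the paper may differ from this toy setup.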
Robust Deep Reservoir Computing through Reliable Memristor with Improved Heat Dissipation Capability
Deep Neural Networks (DNNs), a brain-inspired learning methodology, require tremendous amounts of data for training before performing inference tasks. Recent studies demonstrate a strong positive correlation between inference accuracy and the size of the DNNs and datasets, which leads to an inevitable demand for large DNNs. However, conventional memory techniques are not adequate to deal with the drastic growth of dataset and neural network size. Recently, the resistive memristor has been widely considered as the next-generation memory device owing to its high density and low power consumption. Nevertheless, its high cycle-to-cycle switching resistance variation restricts its feasibility in deep learning. In this work, a novel memristor configuration with an enhanced heat dissipation feature is fabricated and evaluated to address this challenge. Our experimental results demonstrate that our memristor reduces the resistance variation by 30% and that the inference accuracy increases correspondingly in a similar range. The accuracy increment is evaluated with our Deep Delay-feedback (Deep-DFR) reservoir computing model. The design area, power consumption, and latency are reduced by 48%, 42%, and 67%, respectively, compared to the conventional SRAM memory technique (6T). The performance of our memristor is improved to varying degrees (13%-73%) compared to state-of-the-art memristors.
- PAR ID: 10209059
- Date Published:
- Journal Name: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- ISSN: 0278-0070
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Deep learning that utilizes large-scale deep neural networks (DNNs) is effective in automatic high-level feature extraction but is also computation and memory intensive. Constructing DNNs using block-circulant matrices can simultaneously achieve hardware acceleration and model compression while maintaining high accuracy. This paper proposes HSIM-DNN, an accurate hardware simulator on the C++ platform, to simulate the exact behavior of DNN hardware implementations and thereby facilitate the block-circulant matrix-based design of DNN training and inference procedures in hardware. Real FPGA implementations validate the simulator with various circulant block sizes and data bit lengths, taking into account accuracy, compression ratio, and power consumption, which provides useful insights for hardware design.
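To make the compression idea concrete, here is a minimal sketch of a block-circulant matrix-vector product. It is not taken from HSIM-DNN; the function names and the convention that each block is defined by its first column are assumptions for illustration. Each b-by-b block is stored as b values instead of b squared values.

```python
def circulant_matvec(first_col, x):
    """Multiply the circulant matrix defined by `first_col` with vector `x`.
    Entry (i, j) of the matrix is first_col[(i - j) % n]."""
    n = len(first_col)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def block_circulant_matvec(blocks, x, b):
    """`blocks[p][q]` is the length-b defining vector of block (p, q);
    `x` has length b * (number of block columns)."""
    out = []
    for block_row in blocks:
        acc = [0.0] * b
        for q, vec in enumerate(block_row):
            part = circulant_matvec(vec, x[q * b:(q + 1) * b])
            acc = [a + p for a, p in zip(acc, part)]
        out.extend(acc)
    return out

# One block row with two 2x2 circulant blocks: 4 stored values instead of 8.
y = block_circulant_matvec([[[1, 2], [0, 1]]], [1, 0, 1, 1], b=2)
```

Practical implementations usually compute each block product with FFTs rather than the direct sums used here; the direct form is kept only because it makes the index pattern easy to see.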
-
Spiking Neural Networks (SNNs) are energy-efficient artificial neural network models that can carry out data-intensive applications. Energy consumption, latency, and the memory bottleneck are some of the major issues that arise in machine learning applications due to their data-demanding nature. Memristor-enabled Computing-In-Memory (CIM) architectures have been able to tackle the memory wall issue, eliminating the energy- and time-consuming movement of data. In this work we develop a scalable CIM-based SNN architecture with our fabricated two-layer memristor crossbar array. In addition to having an enhanced heat dissipation capability, our memristor exhibits a substantial enhancement of 10% to 66% in design area, power, and latency compared to state-of-the-art memristors. This design incorporates an inter-spike interval (ISI) encoding scheme, chosen for its high information density, to convert the incoming input signals into spikes. Furthermore, we include a time-to-first-spike (TTFS) based output processing stage, chosen for its energy efficiency, to carry out the final classification. With the combination of ISI, CIM, and TTFS, this network has a competitive inference speed of 2 μs/image and can successfully classify handwritten digits with 2.9 mW of power and 2.51 pJ of energy per spike. The proposed architecture with the ISI encoding scheme can achieve ∼10% higher accuracy than other encoding schemes on the MNIST dataset.
-
When deep neural networks (DNNs) are extensively utilized for edge AI (Artificial Intelligence), for example in the Internet of Things (IoT) and autonomous vehicles, conventional CMOS (Complementary Metal Oxide Semiconductor)-based computers suffer from overly large computing loads. Memristor-based devices are emerging as an option to conduct computing in memory for DNNs to make them faster, much more energy efficient, and accurate. Despite having excellent properties, memristor-based DNNs are not yet commercially available because of Stuck-At-Fault (SAF) defects. A Mapping Transformation (MT) method is proposed in this paper to mitigate SAF defects in memristor-based DNNs. First, the weight distribution for the VGG8 model with the CIFAR10 dataset is presented and analyzed. Then, the MT method is used to recover inference accuracies at 0.1% to 50% SAFs in two typical cases, SA1 (Stuck-At-One) : SA0 (Stuck-At-Zero) = 5:1 and 1:5, respectively. The experimental results show that the MT method can recover DNNs to their original inference accuracies (90%) when the ratio of SAFs is smaller than 2.5%. Moreover, even when the SAF ratio is at the extreme condition of 50%, it can still recover the inference accuracy to 80% and 21%. In addition, the MT method acts as a regulator to avoid the energy and latency overhead generated by SAFs. Finally, the immunity of the MT method against non-linearity is investigated, and we conclude that the MT method can benefit accuracy, energy, and latency even with high non-linearity (LTP = 4 and LTD = −4).
-
With the rapid development of Deep Neural Networks (DNNs), numerous Process-In-Memory (PIM) designs have emerged to accelerate DNN models with exceptional throughput and energy efficiency. PIM accelerators based on Non-Volatile Memory (NVM) or volatile memory offer distinct advantages for computational efficiency and performance. NVM-based PIM accelerators, which have demonstrated success in DNN inference, face limitations in on-device learning due to high write energy, latency, and instability. Conversely, fast volatile memories, like SRAM, offer rapid read/write operations for DNN training but suffer from significant leakage currents and large memory footprints. In this paper, for the first time, we present a fully-digital sparse processing design in hybrid NVM-SRAM that synergistically combines the strengths of NVM and SRAM and is tailored for on-device continual learning. Our NVM- and SRAM-based PIM circuit macros support both storage and processing of the N:M structured sparsity pattern, significantly improving storage and computing efficiency. Extensive experiments demonstrate that our hybrid system effectively reduces area and power consumption while maintaining high accuracy, offering a scalable and versatile solution for on-device continual learning.
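For readers unfamiliar with the N:M structured sparsity pattern mentioned above, the following sketch shows the general idea: in every group of M consecutive weights, at most N are kept. The magnitude-based selection rule and the function names are assumptions for illustration and are not taken from the paper's circuit design.

```python
def nm_prune(weights, n, m):
    """Keep the n largest-magnitude weights in each group of m consecutive
    weights and zero the rest (hypothetical helper)."""
    if len(weights) % m != 0:
        raise ValueError("length of weights must be a multiple of m")
    out = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        out.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return out

def nm_compress(pruned, m):
    """Compact storage: kept values plus their position inside each group."""
    values, indices = [], []
    for start in range(0, len(pruned), m):
        for i, w in enumerate(pruned[start:start + m]):
            if w != 0.0:
                values.append(w)
                indices.append(i)
    return values, indices

w = [0.1, -0.9, 0.3, 0.05, 0.7, 0.0, -0.2, 0.6]
sparse = nm_prune(w, n=2, m=4)          # 2:4 sparsity
values, indices = nm_compress(sparse, m=4)
```

Because every group holds the same number of kept values, the compact form has a fixed size per group, which is the property that makes this pattern convenient for hardware storage and processing.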

