

Title: MLP+NeuroSimV3.0: Improving On-chip Learning Performance with Device to Algorithm Optimizations
On-chip learning with the compute-in-memory (CIM) paradigm has become popular in machine learning hardware design in recent years. However, achieving high on-chip learning accuracy is difficult due to the strong nonlinearity in the weight-update curves of emerging nonvolatile memory (eNVM) based analog synapse devices. Digital synapse devices offer good learning accuracy, but their row-by-row partial-sum accumulation leads to high latency. In this paper, methods to solve these issues are presented through device-to-algorithm optimization. For analog synapses, novel hybrid-precision synapses with good linearity and more advanced training algorithms are introduced to increase on-chip learning accuracy. The latency issue for digital synapses is solved by a parallel partial-sum read-out scheme. All of these features are included in the recently released MLP+NeuroSimV3.0, an in-house device-to-system evaluation framework for neuro-inspired accelerators based on the CIM paradigm.
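To make the nonlinearity problem concrete, here is a minimal Python sketch of the exponential pulse-to-conductance model commonly used to characterize analog eNVM synapses. The function and parameter names (`potentiation_curve`, `nonlinearity`) are illustrative assumptions, not NeuroSim's actual API.

```python
import numpy as np

def potentiation_curve(num_pulses=64, nonlinearity=0.25,
                       g_min=0.0, g_max=1.0):
    """Conductance after each potentiation pulse for an analog synapse.

    Illustrative exponential saturation model (hypothetical parameter
    names, not NeuroSim's code): a smaller `nonlinearity` value gives
    stronger saturation; a larger value approaches the ideal linear
    staircase that on-chip learning algorithms implicitly assume.
    """
    x = np.arange(num_pulses + 1) / num_pulses   # normalized pulse count
    a = nonlinearity
    return g_min + (g_max - g_min) * (1 - np.exp(-x / a)) / (1 - np.exp(-1 / a))

# An ideal (linear) device gains (g_max - g_min) / num_pulses per pulse;
# the exponential curve instead front-loads most of the conductance change,
# which is what degrades on-chip training accuracy.
ideal = np.linspace(0.0, 1.0, 65)
real = potentiation_curve()
print(f"max deviation from linear update: {np.abs(real - ideal).max():.3f}")
```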
Award ID(s):
1740225
PAR ID:
10110027
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Neuromorphic Systems (ICONS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The increasingly central role of speech-based human-computer interaction necessitates on-device, low-latency, low-power, high-accuracy keyword spotting (KWS). State-of-the-art accuracies on speech-related tasks have been achieved by long short-term memory (LSTM) neural network (NN) models. Such models are typically computationally intensive because of their heavy use of matrix-vector multiplication (MVM) operations. Compute-in-memory (CIM) architectures, while well suited to MVM operations, have not seen widespread adoption for LSTMs. In this paper we adapt resistive random-access memory based CIM architectures for KWS using LSTMs. We find that a hybrid system composed of CIM cores and digital cores achieves 90% test accuracy on the Google speech dataset at a cost of 25 uJ per decision. Our optimized architecture uses 5-bit inputs and analog weights to produce 6-bit outputs, as sketched below. All digital computations are performed with 8-bit precision, leading to a 3.7× improvement in computational efficiency compared to equivalent digital systems at that accuracy.
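A minimal sketch of that quantized datapath, assuming simple uniform quantizers for the 5-bit input DAC and 6-bit output ADC; the scaling policy and the name `cim_mvm` are assumptions, not the paper's implementation.

```python
import numpy as np

def cim_mvm(x, W, in_bits=5, out_bits=6):
    """Model one CIM matrix-vector multiply with quantized I/O.

    Inputs are quantized to `in_bits` (the DAC), weights stay analog
    (full precision here), and outputs are quantized to `out_bits`
    (the ADC). Hypothetical sketch, not the paper's circuit.
    """
    # Quantize inputs to a signed in_bits grid over the observed range.
    x_scale = np.abs(x).max() / (2 ** (in_bits - 1) - 1)
    x_q = np.round(x / x_scale).clip(-(2 ** (in_bits - 1)),
                                     2 ** (in_bits - 1) - 1)
    # Analog MVM in the crossbar: weights are not quantized.
    y = W @ (x_q * x_scale)
    # Quantize outputs to a signed out_bits grid (the ADC readout).
    y_scale = np.abs(y).max() / (2 ** (out_bits - 1) - 1)
    return np.round(y / y_scale).clip(-(2 ** (out_bits - 1)),
                                      2 ** (out_bits - 1) - 1) * y_scale

rng = np.random.default_rng(0)
x, W = rng.standard_normal(128), rng.standard_normal((64, 128)) * 0.1
print(np.abs(cim_mvm(x, W) - W @ x).mean())   # mean quantization error
```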
  2. Abstract: The constant drive to achieve higher performance in deep neural networks (DNNs) has led to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor-based compute-in-memory (CIM) modules can perform vector-matrix multiplication (VMM) in place and in parallel, and have shown great promise in DNN inference applications. However, CIM-based model training faces challenges due to non-linear weight updates, device variations, and low precision. In this work, a mixed-precision training scheme is experimentally implemented to mitigate these effects using a bulk-switching memristor-based CIM module. Low-precision CIM modules are used to accelerate the expensive VMM operations, while high-precision weight updates are accumulated in digital units. Memristor devices are only reprogrammed when the accumulated weight-update value exceeds a pre-defined threshold, as sketched below. The proposed scheme is implemented with a system-on-chip of fully integrated analog CIM modules and digital sub-systems, showing fast convergence of LeNet training to 97.73% accuracy. The efficacy of training larger models is evaluated using realistic hardware parameters, verifying that CIM modules can enable efficient mixed-precision DNN training with accuracy comparable to full-precision software-trained models. Additionally, models trained on chip are inherently robust to hardware variations, allowing direct mapping to CIM inference chips without additional re-training.
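The threshold-commit idea translates directly into code. Below is a minimal sketch, assuming a single-step commit granularity and the hypothetical names `accumulate_and_commit`, `g_step`, and `threshold`; the paper's actual update policy may differ.

```python
import numpy as np

def accumulate_and_commit(residual, update, g_step=0.05, threshold=0.05):
    """One step of a threshold-commit mixed-precision training scheme.

    High-precision weight updates accumulate digitally in `residual`;
    a memristor is only reprogrammed (by +/- g_step) once the
    accumulated update crosses `threshold`. Names and the single-step
    commit granularity are illustrative assumptions.
    """
    residual += update                         # digital, high precision
    commit = np.where(np.abs(residual) >= threshold,
                      np.sign(residual) * g_step, 0.0)
    residual -= commit                         # keep the uncommitted remainder
    return commit                              # analog conductance change to apply

residual = np.zeros(4)
for update in ([0.02, -0.01, 0.04, 0.0], [0.02, -0.05, 0.04, 0.01]):
    delta = accumulate_and_commit(residual, np.asarray(update))
    print("committed:", delta, "residual:", residual)
```

Small updates never touch the devices; only the second batch above pushes two of the four accumulators past the threshold, which is what limits costly and noisy memristor writes.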
  3. Neuromorphic computing has recently emerged as a promising paradigm to overcome the von Neumann bottleneck and enable orders-of-magnitude improvements in bandwidth and energy efficiency. However, existing complementary metal-oxide-semiconductor (CMOS) digital devices, the building blocks of our computing systems, are fundamentally different from analog synapses, the building blocks of biological neural networks, rendering hardware implementations of artificial neural networks (ANNs) not scalable in area and power with existing CMOS devices. In addition, spatiotemporal dynamics, a crucial component of cognitive function in neural networks, have been difficult to replicate with CMOS devices. Here, we present the first topological insulator (TI) based electrochemical synapse with programmable spatiotemporal dynamics, where long-term and short-term plasticity in the TI synapse are achieved through charge-transfer doping and ionic gating effects, respectively. We also demonstrate basic neuronal functions such as potentiation/depression and paired-pulse facilitation with high precision (>500 states per device), as well as a linear and symmetric weight update, illustrated in the sketch below. We envision that the dynamic TI synapse, which shows promising scaling potential in terms of energy and speed, can lead to hardware acceleration of truly neurorealistic ANNs with superior cognitive capabilities and excellent energy efficiency.
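For illustration, a minimal sketch of what a linear, symmetric multi-state synapse implies behaviorally, assuming evenly spaced conductance levels and single-state pulses; the class and parameter names are hypothetical, not the device model from the paper.

```python
import numpy as np

class LinearSynapse:
    """Idealized linear, symmetric synapse with many conductance states.

    Hypothetical model: >500 evenly spaced levels, with potentiation
    and depression pulses moving the same step size in either
    direction, so +n pulses followed by -n pulses return the device
    exactly to its starting conductance.
    """
    def __init__(self, num_states=512, g_min=0.0, g_max=1.0):
        self.levels = np.linspace(g_min, g_max, num_states)
        self.state = num_states // 2

    def pulse(self, n):
        # n > 0 potentiates, n < 0 depresses; clip at the device limits.
        self.state = int(np.clip(self.state + n, 0, len(self.levels) - 1))
        return self.levels[self.state]

syn = LinearSynapse()
g0 = syn.levels[syn.state]
syn.pulse(+50)
g1 = syn.pulse(-50)
print(np.isclose(g0, g1))   # symmetric update returns to the start state
```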
  4. In-memory computing is an effective method for modeling complex physical systems that are typically challenging for conventional computing architectures, but it has been hindered by read noise and write variability that restrict scalability, accuracy, and precision in high-performance computations. We propose and demonstrate a circuit architecture and programming protocol that converts the analog computing result to digital only at the last step, enabling low-precision analog devices to perform high-precision computing. We use a weighted sum of multiple devices to represent one number, in which subsequently programmed devices compensate for preceding programming errors, as sketched below. With a memristor system-on-chip, we experimentally demonstrate high-precision solutions for multiple scientific computing tasks while maintaining a substantial power-efficiency advantage over conventional digital approaches.
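A minimal sketch of the compensation idea, assuming a geometric weighting of devices (`base**i`) and Gaussian write noise; the names and noise model are illustrative, not the paper's programming protocol.

```python
import numpy as np

def program_weighted_devices(target, num_devices=3, base=0.1,
                             write_noise=0.02, rng=None):
    """Represent one high-precision number as a weighted sum of noisy devices.

    Device i carries weight base**i, and each device is programmed to
    cancel the residual error left by the devices before it, so later
    (lower-weighted) devices correct earlier write errors. Weighting
    scheme and noise model are assumptions for illustration.
    """
    rng = rng or np.random.default_rng(0)
    residual, devices = target, []
    for i in range(num_devices):
        ideal = residual / base ** i                   # value this device should hold
        actual = ideal + rng.normal(0, write_noise)    # imperfect analog write
        devices.append(actual)
        residual -= actual * base ** i                 # error left for the next device
    return devices, residual

devices, err = program_weighted_devices(0.7312894)
print(f"residual error after compensation: {err:.2e}")
```

Each added device shrinks the representation error by roughly a factor of `base`, which is how low-precision analog writes compound into a high-precision stored number.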
  5. Abstract: Realizing increasingly complex artificial intelligence (AI) functionalities directly on edge devices calls for unprecedented energy efficiency in edge hardware. Compute-in-memory (CIM) based on resistive random-access memory (RRAM) [1] promises to meet this demand by storing AI model weights in dense, analogue and non-volatile RRAM devices, and by performing AI computation directly within RRAM, thus eliminating power-hungry data movement between separate compute and memory [2–5]. Although recent studies have demonstrated in-memory matrix-vector multiplication on fully integrated RRAM-CIM hardware [6–17], it remains a goal for an RRAM-CIM chip to simultaneously deliver high energy efficiency, versatility to support diverse models, and software-comparable accuracy. Although efficiency, versatility and accuracy are all indispensable for broad adoption of the technology, the inter-related trade-offs among them cannot be addressed by isolated improvements at any single abstraction level of the design. Here, by co-optimizing across all hierarchies of the design, from algorithms and architecture to circuits and devices, we present NeuRRAM, an RRAM-based CIM chip that simultaneously delivers versatility in reconfiguring CIM cores for diverse model architectures, energy efficiency twice that of previous state-of-the-art RRAM-CIM chips across various computational bit-precisions, and inference accuracy comparable to software models quantized to four-bit weights across various AI tasks, including 99.0% accuracy on MNIST [18] and 85.7% on CIFAR-10 [19] image classification, 84.7% accuracy on Google speech command recognition [20], and a 70% reduction in image-reconstruction error on a Bayesian image-recovery task.