Title: An ultra-low energy internally analog, externally digital vector-matrix multiplier based on NOR flash memory technology
Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work showed that analog multipliers based on nonvolatile memories offer superior energy efficiency compared to their digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. As in some previous work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is an ultra-low-power sensing circuit based on a translinear Gilbert cell in topological combination with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM operation, while a current-mode algorithmic analog-to-digital converter is employed at the circuit back end. These implementations of conversion and sensing allow the circuit to operate entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400×400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm² computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells used for offset compensation.
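For reference, the external behavior described above (digital codes in, analog multiply-accumulate, digital codes out) can be captured by a simple functional model. The sketch below is illustrative only: the 5-bit precision and 400×400 size come from the abstract, while the weight range and output full scale are assumptions, and none of the analog circuit details (Gilbert-cell sensing, current-mode conversion) are modeled.

```python
# Behavioral model of an "internally analog, externally digital" VMM:
# digital input codes are converted to analog levels, multiplied and
# accumulated against analog (floating-gate-stored) weights, and the
# result is re-digitized at the output. Illustrative sketch only; the
# weight range and output full scale are assumptions.
import numpy as np

BITS = 5                     # configurable input/output precision
LEVELS = 2 ** BITS

def digital_to_analog(codes, full_scale=1.0):
    """Map digital codes 0..LEVELS-1 to analog input levels."""
    return codes.astype(float) / (LEVELS - 1) * full_scale

def analog_vmm(x_analog, weights):
    """Ideal analog multiply-accumulate: one dot product per output line."""
    return x_analog @ weights

def analog_to_digital(y_analog, full_scale):
    """Re-digitize the accumulated analog outputs to BITS-bit codes."""
    codes = np.round(y_analog / full_scale * (LEVELS - 1))
    return np.clip(codes, 0, LEVELS - 1).astype(int)

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 1.0, size=(400, 400))   # assumed analog weight range
x = rng.integers(0, LEVELS, size=400)        # 5-bit digital input vector

y = analog_to_digital(analog_vmm(digital_to_analog(x), W),
                      full_scale=400.0)      # assumed full-scale output
print(y[:10])
```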
Award ID(s):
1740352
PAR ID:
10113029
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM/IEEE Design Automation Conference (DAC'18)
Page Range / eLocation ID:
22
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. Low-to-medium resolution analog vector-by-matrix multipliers (VMMs) offer remarkable energy/area efficiency compared to their digital counterparts. Still, the maximum attainable performance of analog VMMs is often bounded by the overhead of the peripheral circuits. The main contribution of this paper is the design of a novel sensing circuit that improves the energy efficiency and density of analog multipliers. The proposed circuit is based on a translinear Gilbert cell, topologically combined with a floating nonlinear resistor and a low-gain amplifier. Several compensation techniques are employed to ensure reliability with respect to process, temperature, and supply voltage variations. As a case study, we consider the implementation of a gate-coupled current-mode VMM with embedded split-gate NOR flash memory. Our simulation results show that a 4-bit 100×100 VMM circuit designed in 55 nm CMOS technology achieves a record-breaking performance of 3.63 POps/J.
2. The success of artificial neural networks (ANNs) in machine vision has driven hardware researchers to explore more efficient computing elements for energy-expensive operations such as vector-matrix multiplication (VMM). In this work, InP-based floating-gate photo-field-effect transistors (FG-PFETs) are demonstrated as computing elements that integrate both photodetection and initial signal processing at the sensor level. These devices are fabricated from semiconductor channels grown via a back-end CMOS-compatible templated liquid phase (TLP) approach. Individual devices exhibit programmable responsivity, mimicking the effect of a synapse connecting the photodetector to a neuron. A simulated optical neural network (ONN) that takes the experimentally measured FG-PFET performance as input shows excellent image-recognition accuracy for color-mixed handwritten digits.
3. We propose an extremely dense, energy-efficient mixed-signal vector-by-matrix multiplication (VMM) circuit based on existing 3D-NAND flash memory blocks, without any need for their modification. Such compatibility is achieved using a time-domain-encoded VMM design (a behavioral sketch of this encoding is given after this list). We have performed rigorous simulations of such a circuit, taking into account non-idealities such as drain-induced barrier lowering, capacitive coupling, charge injection, parasitics, process variations, and noise. Our results show, for example, that a 4-bit VMM of 200-element vectors, using commercially available 64-layer gate-all-around macaroni-type 3D-NAND memory blocks designed in the 55-nm technology node, may provide an unprecedented area efficiency of 0.14 µm²/byte and energy efficiency of ~11 fJ/Op, including the input/output and other peripheral circuitry overheads.
4. Abstract: Solving linear systems, often accomplished by iterative algorithms, is a ubiquitous task in science and engineering. To accommodate the dynamic range and precision requirements, these iterative solvers are carried out on floating-point processing units, which are not efficient at handling large-scale matrix multiplications and inversions. Low-precision, fixed-point digital or analog processors consume only a fraction of the energy per operation compared with their floating-point counterparts, yet their current usage excludes iterative solvers due to the cumulative computational errors arising from fixed-point arithmetic. In this work, we show that for a simple iterative algorithm, such as Richardson iteration, a fixed-point processor can provide the same convergence rate and achieve solutions beyond its native precision when combined with residual iteration (a minimal sketch of this scheme follows the list). These results indicate that power-efficient computing platforms consisting of analog computing devices can be used to solve a broad range of problems without compromising speed or precision.
5. This paper presents a Ferroelectric FET (FeFET) based processing-in-memory (PIM) architecture to accelerate inference of deep neural networks (DNNs). We propose a digital in-memory vector-matrix multiplication (VMM) engine utilizing the FeFET crossbar to enable bit-parallel computation and eliminate the analog-to-digital conversion required in prior mixed-signal PIM designs (a bit-plane decomposition sketch of such digital VMM appears after this list). A dedicated hierarchical network-on-chip (H-NoC) is developed for input broadcasting and on-the-fly processing of partial results, reducing data transmission volume and latency. Simulations in 28 nm CMOS technology show 115x and 6.3x higher computing efficiency (GOPs/W) over a desktop GPU (Nvidia GTX 1080Ti) and a ReRAM-based design, respectively.
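Related result 3 relies on time-domain encoding: each digital input is represented by a pulse duration, each memory cell sources a current while its pulse is high, and the charge accumulated on an output line equals the vector dot product. The sketch below is a behavioral illustration under assumed sizes and current levels, not the published 3D-NAND design.

```python
# Behavioral sketch of time-domain-encoded VMM: inputs become pulse widths,
# weights are cell currents, and the charge accumulated on each output line
# is the dot product Q_j = sum_i t_i * I_ij. Sizes and units are assumed.
import numpy as np

def time_domain_vmm(x_codes, cell_currents, t_unit=1e-9):
    """x_codes: integer input codes (pulse width = code * t_unit seconds).
    cell_currents: weight matrix in amperes. Returns charge per column."""
    pulse_widths = x_codes * t_unit
    return pulse_widths @ cell_currents

rng = np.random.default_rng(1)
x = rng.integers(0, 16, size=200)                 # 4-bit inputs, 200-element vector
I = rng.uniform(0.0, 100e-9, size=(200, 64))      # assumed cell currents (<= 100 nA)

charge = time_domain_vmm(x, I)
# Digitizing the accumulated charge (e.g., by timing its discharge with a
# reference current) recovers the digital VMM result without a per-cell ADC.
print(charge[:5])
```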
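Related result 4 combines a low-precision Richardson solver with residual iteration. A minimal sketch of that idea follows: the "fixed-point processor" is emulated by rounding to a fixed number of fractional bits, and the outer loop rescales the residual so each low-precision solve works at full dynamic range; the bit width, test matrix, and relaxation factor are assumptions for illustration.

```python
# Sketch of residual iteration wrapped around a fixed-point Richardson solver.
# The inner solver emulates fixed-point hardware by rounding every value to
# FRAC_BITS fractional bits; the outer loop rescales the residual and
# accumulates corrections in float64, pushing the solution past the solver's
# native precision. All parameters here are assumed for illustration.
import numpy as np

FRAC_BITS = 8
def fx(v):
    """Round to FRAC_BITS fractional bits (emulated fixed-point arithmetic)."""
    return np.round(v * 2.0**FRAC_BITS) / 2.0**FRAC_BITS

def richardson_fixed_point(A, b, omega, iters=100):
    """Richardson iteration x <- x + omega*(b - A@x) with quantized values."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = fx(x + omega * fx(b - fx(A @ x)))
    return x

def residual_refined_solve(A, b, omega, outer_iters=15):
    """Residual iteration: scale the residual to full range, solve at low
    precision, then rescale and accumulate the correction in float64."""
    x = np.zeros_like(b)
    for _ in range(outer_iters):
        r = b - A @ x                                 # full-precision residual
        s = np.max(np.abs(r))
        if s == 0.0:
            break
        x = x + s * richardson_fixed_point(A, r / s, omega)
    return x

rng = np.random.default_rng(2)
A = np.eye(50) + 0.01 * rng.standard_normal((50, 50))  # well-conditioned test case
b = rng.standard_normal(50)

x = residual_refined_solve(A, b, omega=0.9)
print("fixed-point step size:", 2.0**-FRAC_BITS)
print("final residual norm  :", np.linalg.norm(b - A @ x))
```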
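Related result 5 performs VMM digitally inside the memory, avoiding analog-to-digital converters. One standard way to do this, sketched below, is to decompose inputs and weights into bit planes, compute AND plus population count for each bit-plane pair, and combine the partial sums with shifts. The bit widths and array sizes are assumptions, and the FeFET crossbar and H-NoC specifics are not modeled.

```python
# Sketch of a fully digital, bit-plane-decomposed VMM of the kind used by
# ADC-free processing-in-memory designs: operands are split into bits, the
# array evaluates AND + popcount per bit-plane pair, and the partial sums
# are combined with digital shift-adds. Bit widths and sizes are assumed.
import numpy as np

X_BITS, W_BITS = 4, 4

def bit_planes(values, nbits):
    """Split unsigned integers into an (nbits, ...) array of 0/1 planes (LSB first)."""
    return np.stack([(values >> k) & 1 for k in range(nbits)])

def bitplane_vmm(x, W):
    """Compute x @ W using only bitwise AND, popcount, and shift-add."""
    xb = bit_planes(x, X_BITS)            # shape (X_BITS, n)
    wb = bit_planes(W, W_BITS)            # shape (W_BITS, n, m)
    acc = np.zeros(W.shape[1], dtype=np.int64)
    for i in range(X_BITS):
        for j in range(W_BITS):
            partial = (xb[i][:, None] & wb[j]).sum(axis=0)   # AND + popcount
            acc += partial << (i + j)                        # shift-add
    return acc

rng = np.random.default_rng(3)
x = rng.integers(0, 2**X_BITS, size=128)
W = rng.integers(0, 2**W_BITS, size=(128, 32))

assert np.array_equal(bitplane_vmm(x, W), x @ W)  # matches the exact product
print(bitplane_vmm(x, W)[:5])
```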