Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work showed that analog multipliers based on nonvolatile memories have superior energy efficiency compared to their digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. Similar to some previous work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is ultra-low-power sensing circuitry, which is based on a translinear Gilbert cell topologically combined with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM, while a current-mode algorithmic analog-to-digital circuit is employed at the circuit backend. Such implementations of conversion and sensing allow the circuit to operate entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400×400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm² computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells that are utilized for offset compensation.
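To make "low-to-medium computing precision" concrete, the following is a minimal numerical sketch of a VMM whose inputs, weights, and outputs are each rounded to a fixed number of bits. It is a toy model only, not a model of the floating-gate circuit described above: the uniform quantizer, the choice of output full scale, the random test data, and all function names are assumptions made for illustration.

```python
import random


def quantize(value, bits, full_scale=1.0):
    """Clip to [0, full_scale] and round to one of 2**bits uniform levels."""
    steps = (1 << bits) - 1
    clipped = min(max(value, 0.0), full_scale)
    return round(clipped / full_scale * steps) / steps * full_scale


def vmm_exact(x, w):
    """Reference product: y[j] = sum over i of x[i] * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def vmm_quantized(x, w, bits):
    """Quantize inputs, weights, and outputs to `bits` bits each."""
    xq = [quantize(v, bits) for v in x]
    wq = [[quantize(v, bits) for v in row] for row in w]
    # Assumption: the output converter spans the largest possible sum, len(x).
    full_scale = float(len(x))
    return [quantize(v, bits, full_scale) for v in vmm_exact(xq, wq)]


def rms_error(bits, n=100, seed=0):
    """RMS difference between the exact and quantized products on random data."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    w = [[rng.random() for _ in range(n)] for _ in range(n)]
    exact = vmm_exact(x, w)
    approx = vmm_quantized(x, w, bits)
    return (sum((a - b) ** 2 for a, b in zip(exact, approx)) / n) ** 0.5


if __name__ == "__main__":
    for bits in (4, 5, 8):
        print(bits, "bits -> RMS error", rms_error(bits))
```

In this toy model the error shrinks as the bit width grows, which is the trade-off a configurable-precision design exposes.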
Breaking POps/J Barrier with Analog Multiplier Circuits Based on Nonvolatile Memories
Low-to-medium resolution analog vector-by-matrix multipliers (VMMs) offer remarkable energy and area efficiency compared to their digital counterparts. Still, the maximum attainable performance of analog VMMs is often bounded by the overhead of the peripheral circuits. The main contribution of this paper is the design of novel sensing circuitry that improves the energy efficiency and density of analog multipliers. The proposed circuit is based on a translinear Gilbert cell, which is topologically combined with a floating nonlinear resistor and a low-gain amplifier. Several compensation techniques are employed to ensure reliability with respect to process, temperature, and supply voltage variations. As a case study, we consider the implementation of a gate-coupled current-mode VMM with embedded split-gate NOR flash memory. Our simulation results show that a 4-bit 100×100 VMM circuit designed in 55 nm CMOS technology achieves a record-breaking energy efficiency of 3.63 POps/J.
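Energy efficiency on this page is quoted both as operations per joule (POps/J, TOp/J) and as energy per operation (fJ/Op). The two are reciprocals, so 3.63 POps/J corresponds to roughly 0.28 fJ per operation. A minimal conversion sketch follows; the function name is hypothetical, and the input figures are those quoted in the abstracts on this page.

```python
PETA = 1e15
TERA = 1e12
FEMTO = 1e-15


def fj_per_op(ops_per_joule):
    """Convert an efficiency in operations per joule to femtojoules per operation."""
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op / FEMTO


if __name__ == "__main__":
    quoted = {
        "3.63 POps/J": 3.63 * PETA,
        "1.68 POps/J": 1.68 * PETA,
        "113.3 TOp/J": 113.3 * TERA,
    }
    for label, value in quoted.items():
        print(f"{label} = {fj_per_op(value):.3f} fJ/Op")
```

Such a conversion only puts the figures on a common scale. The quoted numbers come from different circuit sizes, precisions, and overhead assumptions, so they are not directly comparable.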
- Award ID(s):
- 1740352
- PAR ID:
- 10113027
- Date Published:
- Journal Name:
- ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'18)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We propose extremely dense, energy-efficient mixed-signal vector-by-matrix-multiplication (VMM) circuits based on existing 3D-NAND flash memory blocks, without any need for their modification. Such compatibility is achieved using a time-domain-encoded VMM design. We have performed rigorous simulations of such a circuit, taking into account non-idealities such as drain-induced barrier lowering, capacitive coupling, charge injection, parasitics, process variations, and noise. Our results, for example, show that the 4-bit VMM of 200-element vectors, using the commercially available 64-layer gate-all-around macaroni-type 3D-NAND memory blocks designed in the 55-nm technology node, may provide an unprecedented area efficiency of 0.14 µm²/byte and energy efficiency of ~11 fJ/Op, including the input/output and other peripheral circuitry overheads.
-
The von Neumann bottleneck, a fundamental challenge in conventional computer architecture, arises from the inability to execute fetch and data operations simultaneously due to a shared bus linking processing and memory units. This bottleneck significantly limits system performance, increases energy consumption, and exacerbates computational complexity. Emerging technologies such as Resistive Random Access Memories (RRAMs), leveraging crossbar arrays, offer promising alternatives for addressing the demands of data-intensive computational tasks through in-memory computing of analog vector-matrix multiplication (VMM) operations. However, the propagation of errors due to device and circuit-level imperfections remains a significant challenge. In this study, we introduce MELISO (In-Memory Linear Solver), a comprehensive end-to-end VMM benchmarking framework tailored for RRAM-based systems. MELISO evaluates the error propagation in VMM operations, analyzing the impact of RRAM device metrics on error magnitude and distribution. This paper introduces the MELISO framework and demonstrates its utility in characterizing and mitigating VMM error propagation using state-of-the-art RRAM device metrics.
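The kind of error-propagation question described above can be illustrated with a small Monte Carlo sketch: perturb each stored matrix entry by random multiplicative noise and measure how far the VMM output drifts from the ideal result. This is not the MELISO framework or its API. The Gaussian noise model, the random data, and every function name are assumptions chosen for illustration.

```python
import random


def vmm(x, g):
    """Ideal product: y[j] = sum over i of x[i] * g[i][j]."""
    return [sum(x[i] * g[i][j] for i in range(len(x))) for j in range(len(g[0]))]


def perturb(g, sigma, rng):
    """Scale every entry by (1 + e), with e drawn from a zero-mean Gaussian."""
    return [[v * (1.0 + rng.gauss(0.0, sigma)) for v in row] for row in g]


def relative_error(y_ref, y):
    """Euclidean norm of the output error divided by the norm of the reference."""
    num = sum((a - b) ** 2 for a, b in zip(y_ref, y)) ** 0.5
    den = sum(a * a for a in y_ref) ** 0.5
    return num / den


def sweep(n=64, sigmas=(0.0, 0.01, 0.05), trials=20, seed=0):
    """Mean relative output error for each assumed device noise level."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    g = [[rng.random() for _ in range(n)] for _ in range(n)]
    y_ref = vmm(x, g)
    result = {}
    for sigma in sigmas:
        errors = [relative_error(y_ref, vmm(x, perturb(g, sigma, rng)))
                  for _ in range(trials)]
        result[sigma] = sum(errors) / trials
    return result


if __name__ == "__main__":
    for sigma, err in sweep().items():
        print(f"sigma = {sigma:.2f} -> mean relative error = {err:.4f}")
```

A real benchmarking flow would replace the Gaussian perturbation with measured device metrics and would also account for circuit-level effects that this sketch ignores.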
-
We first propose an ultra-compact energy-efficient time-domain vector-by-matrix multiplier (VMM) based on commercial 3D-NAND flash memory structure. The proposed 3D-VMM uses a novel resistive successive integrate and re-scaling (RSIR) scheme to eliminate the stringent requirement of a bulky load capacitor which otherwise dominates the area- and energy-landscape of the conventional time-domain VMMs. Our rigorous analysis, performed at the 55 nm technology node, shows that RSIR-3D-VMM achieves a record-breaking area efficiency of ∼0.02 μm²/Byte and the energy efficiency of ∼6 fJ/Op for a 500 × 500 4-bit VMM, representing 5× and 1.3× improvements over the previously reported 3D-VMM approach. Moreover, unlike the previous approach, the proposed VMM can be efficiently tailored to work in a smaller current output range. Our second major contribution is the development of 3D-aCortex, a multi-purpose neuromorphic inference processor that utilizes the proposed 3D-VMM block as its core processing unit. Rigorous performance modeling of the 3D-aCortex targeting several state-of-the-art neural network benchmarks has shown that it may provide a record-breaking 30.7 MB mm⁻² storage efficiency, 113.3 TOp/J peak energy efficiency, and 10.66 TOp/s computational throughput. The system-level analysis indicates that the gain in the area-efficiency of RSIR leads to a smaller data transfer delay, which compensates for the reduction in the VMM throughput due to an increased input time window.
-
Magnetic straintronics made its debut more than a decade ago as an extremely energy-efficient paradigm for implementing a digital switch for digital information processing. The switch consists of a slightly elliptical nano-sized magnetostrictive disk in elastic contact with a poled ultrathin piezoelectric layer (forming a two-phase multiferroic system). Because of the elliptical shape, the nanomagnet’s magnetization has two stable (mutually antiparallel) orientations along the major axis, which can encode the binary bits 0 and 1. A voltage pulse of sub-ns duration and an amplitude of a few to a few tens of mV applied across the piezoelectric generates enough strain in the nanomagnet to switch its magnetization from one stable state to the other by virtue of the inverse magnetostriction (or Villari) effect, with an energy expenditure that is roughly an order of magnitude smaller than what it takes to switch a modern-day electronic transistor. That possibility, along with the fact that such a switch is non-volatile unlike the conventional transistor, generated significant excitement. However, it was later tempered by the realization that straintronic switching is also extremely error-prone, which may preclude many digital applications, particularly in Boolean logic. In this perspective, we offer the view that there is plenty of room for magnetic straintronics in the analog domain, which is much more forgiving of switching errors, and where the excellent energy-efficiency and non-volatility are a boon.
Analog straintronics can have intriguing applications in many areas, such as a new genre of aggressively miniaturized electromagnetic antennas that defy the Harrington limits on the gain and radiation efficiency of conventional antennas, analog arithmetic multipliers (and ultimately vector matrix multipliers) for non-volatile deep learning networks with very small footprint and excellent energy-efficiency, and relatively high-power microwave oscillators with output frequency in the X-band. When combined with spintronics, analog straintronics can also implement a new type of spin field effect transistor employing quantum materials such as topological insulators, and they have unusual transfer characteristics which can be exploited for analog tasks such as frequency multiplication using just a single transistor. All this hints at a world of new possibilities in the analog domain that deserves serious attention.