Title: Majority-Based Spin-CMOS Primitives for Approximate Computing
Approximate computing, which is promising for digital signal processing applications, has been extensively studied as a way to trade off limited accuracy for improvements in other circuit metrics such as area, power, and performance. In this paper, approximate arithmetic circuits are proposed using emerging nanoscale spintronic devices. Leveraging the intrinsic current-mode thresholding operation of spintronic devices, we first present a hybrid spin-CMOS majority gate design based on a composite spintronic device structure consisting of a magnetic domain wall motion stripe and a magnetic tunnel junction. We further propose a compact and energy-efficient accuracy-configurable adder design based on this majority gate. Unlike most previous approximate circuit designs, which hardwire a constant degree of approximation, this design can adapt its degree of accuracy to the inherent error resilience of different applications. Subsequently, we propose two new approximate compressors for use in fast multiplier designs. Device-circuit SPICE simulations show 34.58% and 66% improvements in power consumption for the accurate and approximate modes of the accuracy-configurable adder, respectively, compared with the recently reported domain wall motion-based full adder design. In addition, the proposed accuracy-configurable adder and approximate compressors can be efficiently utilized in the discrete cosine transform (DCT), a widely used digital image processing algorithm. The results indicate that the DCT and inverse DCT (IDCT) using the approximate multiplier achieve ~2× energy savings and a 3× speed-up compared with an exactly designed circuit, while achieving comparable output quality.
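The abstract describes a majority gate built on a thresholding device and a full adder whose accuracy can be switched between an accurate and an approximate mode. The following is a purely behavioral Python sketch of that idea, not the paper's circuit. Assumptions: the carry-out is a 3-input majority, the exact sum is written as a 5-input majority MAJ(A, B, Cin, ¬Cout, ¬Cout) (a common majority-logic formulation), the approximate sum is simply ¬Cout, and only the k least-significant bit positions of a ripple-carry adder run in approximate mode. The names `maj`, `full_adder`, and `configurable_add` are hypothetical.

```python
# Behavioral model only: gate forms and the approximate Sum are illustrative assumptions.


def maj(*bits: int) -> int:
    """Majority of an odd number of 0/1 inputs, mimicking a current-threshold device."""
    return int(sum(bits) > len(bits) // 2)


def full_adder(a: int, b: int, cin: int, approximate: bool = False) -> tuple[int, int]:
    """Return (sum, carry_out) for one bit position."""
    cout = maj(a, b, cin)  # 3-input majority is exactly the carry-out
    if approximate:
        # Hypothetical approximate mode: Sum ~ NOT(Cout), wrong only when a == b == cin.
        s = 1 - cout
    else:
        # 5-input majority with NOT(Cout) weighted twice yields the exact sum bit.
        s = maj(a, b, cin, 1 - cout, 1 - cout)
    return s, cout


def configurable_add(x: int, y: int, width: int, k: int) -> int:
    """Ripple-carry add of two width-bit operands; the k least-significant positions are approximate."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry, approximate=(i < k))
        result |= s << i
    return result | (carry << width)  # append the final carry-out as the top bit


if __name__ == "__main__":
    # Exhaustive 4-bit comparison against exact addition for k = 0..4.
    for k in range(5):
        worst = max(
            abs(configurable_add(x, y, 4, k) - (x + y))
            for x in range(16)
            for y in range(16)
        )
        print(f"k={k}: worst-case absolute error = {worst}")
```

Because the carry is always computed exactly in this sketch, errors stay confined to the k approximate sum bits, so the worst-case absolute error is 2^k − 1 (the loop prints 0, 1, 3, 7, 15). Raising k at run time is what makes the accuracy configurable; the real device-level trade-offs reported above are not modeled here.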
Award ID(s): 1740126
NSF-PAR ID: 10094214
Author(s) / Creator(s):
Date Published:
Journal Name: IEEE Transactions on Nanotechnology
Volume: 17
Issue: 4
ISSN: 1536-125X
Page Range / eLocation ID: 795 to 806
Sponsoring Org: National Science Foundation
More Like this
  1. This paper presents a configurable binary design library of fundamental arithmetic circuits such as full adders, full subtractors, binary multipliers, and shifters. The Chisel Hardware Construction Language (HCL) is employed to build parameterizable designs at different precisions, including half-word, word, double-word, and quad-word. Chisel HCL is an open-source embedded domain-specific language that inherits the object-oriented and functional programming features of Scala for constructing hardware. Experimental results show that the proposed designs achieve the same accuracy as Verilog HDL implementations. Hardware cost in terms of slice count, power consumption, and maximum clock frequency is also estimated. Compared with traditional design intellectual property (IP) cores provided by IP vendors, the proposed library is configurable and extensible to other arithmetic implementations and projects.
  2. From signal processing to emerging deep neural networks, a range of applications exhibit intrinsic error resilience. For such applications, approximate computing opens up new possibilities for energy-efficient computing by producing slightly inaccurate results using greatly simplified hardware. Following this approach, a variety of basic arithmetic units, such as adders and multipliers, have been effectively redesigned to generate approximate results for many error-resilient applications. In this work, we propose SECO, an approximate exponential function unit (EFU). Exponentiation is a key operation in many signal processing applications and, more importantly, in spiking neuron models, but its energy-efficient implementation has been inadequately explored. We also introduce a cross-layer design method for SECO to optimize the energy-accuracy trade-off. At the algorithm level, SECO offers runtime scaling between energy efficiency and accuracy based on approximate Taylor expansion, where the error is minimized by optimizing parameters using discrete gradient descent at design time. At the circuit level, our error analysis method efficiently explores the design space to select the energy-accuracy-optimal approximate multiplier at design time. Together, the cross-layer design and runtime optimization methods generate energy-efficient and accurate approximate EFU designs that are up to 99.7% accurate at an energy consumption of 3.73 pJ per exponential operation. SECO is also evaluated on the adaptive exponential integrate-and-fire neuron model, yielding only 0.002% timing error and 0.067% value error compared with the precise neuron model.
  3. The domain wall-magnetic tunnel junction (DW-MTJ) is a spintronic device that enables efficient logic circuit design because of its low energy consumption, small size, and non-volatility. Furthermore, the DW-MTJ is one of the few spintronic devices for which a direct cascading mechanism has been experimentally demonstrated without any extra buffers; this could enable the design and fabrication of large-scale DW-MTJ logic systems. However, DW-MTJ logic relies on the conversion between electrical signals and magnetic states, which is sensitive to process imperfections. Therefore, it is important to analyze the robustness of such DW-MTJ devices to anticipate system reliability before fabrication. Here we propose a new DW-MTJ model that integrates the impacts of process variation to enable the analysis and optimization of DW-MTJ logic. This will allow circuit and device designs that enhance the robustness of DW-MTJ logic and advance the development of energy-efficient spintronic computing systems.
  4. In this work, we propose a new approximate logarithm multiplier (ALM) based on a novel error compensation scheme. The proposed hardware-efficient ALM, named HEALM, first determines the truncation width for mantissa summation in ALM. Error compensation or reduction is then performed via a lookup table that stores reduction factors for different regions of the input operands. This contrasts with an existing approach in which error reduction is performed independently of the width truncation of mantissa summation. As a result, the new design yields more accurate results with reduced area and power. Furthermore, unlike existing approaches, which either introduce resource overhead to reduce error or lose accuracy to save area and power, HEALM improves accuracy and resource consumption at the same time. Our study shows that 8-bit HEALM achieves up to 2.92%, 9.30%, 16.08%, and 17.61% improvement in mean error, peak error, area, and power consumption, respectively, over REALM, the state-of-the-art design with the same number of truncated bits. We also propose a single-error-coefficient mode named HEALM-TA-S, which improves the ALM design with a truncation adder (TA) for mantissa summation. We then evaluate the proposed HEALM design in a discrete cosine transform (DCT) application. The results show that, for different values of k, HEALM-TA improves image quality over the ALM baseline by 7.8 to 17.2 dB on average, and HEALM-SOA by 2.9 to 15.8 dB on average. Moreover, HEALM-TA and HEALM-SOA outperform all state-of-the-art works in image quality for k = 2, 3, 4. The single-coefficient mode, HEALM-TA-S, improves image quality over the baseline by up to 4.1 dB on average with extremely low resource consumption.
  5. Vector-matrix multiplication (VMM) is a core operation in many signal and data processing algorithms. Previous work showed that analog multipliers based on nonvolatile memories have superior energy efficiency compared with digital counterparts at low-to-medium computing precision. In this paper, we propose an extremely energy-efficient analog-mode VMM circuit with a digital input/output interface and configurable precision. As in some previous work, the computation is performed by a gate-coupled circuit utilizing embedded floating-gate (FG) memories. The main novelty of our approach is ultra-low-power sensing circuitry based on a translinear Gilbert cell in topological combination with a floating resistor and a low-gain amplifier. Additionally, the digital-to-analog input conversion is merged with the VMM, while a current-mode algorithmic analog-to-digital circuit is employed at the circuit back end. Implementing conversion and sensing this way allows the circuit to operate entirely in the current domain, resulting in high performance and energy efficiency. For example, post-layout simulation results for a 400×400 5-bit VMM circuit designed in a 55 nm process with embedded NOR flash memory show up to 400 MHz operation, 1.68 POps/J energy efficiency, and 39.45 TOps/mm² computing throughput. Moreover, the circuit is robust against process-voltage-temperature variations, in part due to the inclusion of additional FG cells that are utilized for offset compensation.
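The fourth related abstract above compares approximate logarithm multipliers (ALM, REALM, HEALM). Multipliers of this family typically start from Mitchell's logarithmic multiplication, which writes each operand as N = 2^k(1 + x) with 0 ≤ x < 1 and approximates log2(1 + x) ≈ x, so that a multiplication becomes an addition of approximate logarithms. The sketch below is a minimal unsigned-integer version of that classic baseline only; it does not reproduce HEALM's truncation width selection or its lookup-table error compensation, and the function name `mitchell_multiply` is hypothetical.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Mitchell's approximate product of two non-negative integers (baseline ALM sketch)."""
    if a == 0 or b == 0:
        return 0
    # Characteristics: k = floor(log2(N)), so N = 2^k * (1 + x) with x = m / 2^k.
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    m1, m2 = a - (1 << k1), b - (1 << k2)  # mantissa bits below the leading one
    # (x1 + x2) scaled by 2^(k1 + k2), kept in integers so no rounding occurs.
    frac_sum = m1 * (1 << k2) + m2 * (1 << k1)
    base = 1 << (k1 + k2)
    if frac_sum < base:
        # x1 + x2 < 1: antilog of (k1 + k2 + x1 + x2) ~ 2^(k1+k2) * (1 + x1 + x2)
        return base + frac_sum
    # x1 + x2 >= 1: carry into the characteristic, ~ 2^(k1+k2+1) * (x1 + x2)
    return 2 * frac_sum


if __name__ == "__main__":
    for a, b in [(3, 3), (2, 5), (100, 200)]:
        approx = mitchell_multiply(a, b)
        print(a, b, a * b, approx, f"{(a * b - approx) / (a * b):.2%}")
```

For example, `mitchell_multiply(3, 3)` returns 8 instead of 9. Mitchell's approximation never overestimates, and its relative error is at most 1/9 (about 11.1%, reached when both mantissa fractions equal 0.5); this systematic underestimate is the kind of error that compensation schemes such as the lookup table described in that abstract aim to reduce.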