Title: A Generalized Residue Number System Design Approach for Ultralow-Power Arithmetic Circuits Based on Deterministic Bit-Streams
Peak power consumption has become an important concern in the hardware design of some of today’s applications, such as energy harvesting (EH) and bio-implantable (BI) electronic devices. The limited peak harvested power in EH devices and heating concerns in BI devices are the main reasons power control matters in these devices. This article proposes a generalized design approach for ultralow-power arithmetic circuits. The proposed circuits are based on the residue number system (RNS) combined with deterministic bit-streams. The resulting circuits can be used in systems with a restricted power budget. We suggest several approaches to design generic hardware-efficient adders, multipliers, multiply-accumulate (MAC) units, forward converters (FCs), and reverse converters (RCs). Using the proposed approach, these components can be designed for any moduli of the RNS through simple bit-width adjustments in the circuits. The synthesis results show that the proposed adder achieves, on average, 69% and 2% lower area compared to the bit-serial and a state-of-the-art RNS adder, respectively. Furthermore, the proposed multiplier outperforms the bit-serial, interleaved, and a state-of-the-art design for multiplying RNS numbers by, on average, 57%, 60%, and 77% in terms of power consumption, respectively. The efficiency of our approach is shown via two essential applications: digital signal processing and machine learning. We implement an FFT engine using the proposed method. Compared to prior RNS implementations, our design achieves 47% lower power consumption. We also implement a CNN accelerator’s processing element (PE) with the proposed computation elements. Our design provides considerable speedup and lower power consumption compared to a state-of-the-art ultralow-power design.
Award ID(s):
2019511
PAR ID:
10529822
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume:
42
Issue:
11
ISSN:
0278-0070
Page Range / eLocation ID:
3787 to 3800
Subject(s) / Keyword(s):
Addition, forward conversion, low power, multiplication, residue number system (RNS), reverse conversion
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Using a binarized neural network (BNN) as an alternative to the conventional convolutional neural network is a promising way to meet the demand for human-brain-inspired computing in applications with limited hardware and power resources, such as biomedical devices, IoT edge sensors, and other battery-operated devices. Using nonvolatile memory elements such as magnetic tunnel junction (MTJ) devices in a logic-in-memory (LiM) architecture can eliminate the need to access external memory, which can significantly reduce power consumption and area overhead. In addition, with adiabatic-based designs, a significant part of the consumed power can be recovered to the power source, which greatly reduces power consumption; this is vital in applications with limited power and hardware resources. In this paper, an XNOR/XOR synapse and neuron is proposed using nonvolatile MTJ devices in a LiM architecture together with adiabatic-based circuits. The proposed design offers a 97% improvement in power consumption over its state-of-the-art counterparts. It also achieves at least 7% lower area than other counterparts, which makes the proposed design a promising candidate for hardware implementation of BNNs.
  2. Sorting is a fundamental function in many applications, from data processing to database systems. For high performance, hardware-based sorting designs are implemented with conventional binary or emerging stochastic computing (SC) approaches. Binary designs are fast and energy-efficient but costly to implement. SC-based designs, on the other hand, are area- and power-efficient but slow and energy-hungry. Previous hardware-based sorting designs have therefore also faced scalability issues. In this work, we propose a novel scalable low-cost design for implementing sorting networks. We borrow the concept of SC for its area and power efficiency but use weighted stochastic bit-streams to address the high latency and energy consumption of SC designs. A new lock and swap (LAS) unit is proposed to sort weighted bit-streams. The LAS-based sorting network can determine the result of comparing different input values early and then map the inputs to the corresponding outputs based on shorter weighted bit-streams. Experimental results show that the proposed design approach achieves much better hardware scalability than prior work. In particular, as the number of inputs increases, the proposed scheme can reduce energy consumption by about 3.8% to 93% compared to prior binary and SC-based designs.
  3. In this work, we propose a new approximate logarithm multiplier (ALM) based on a novel error compensation scheme. The proposed hardware-efficient ALM, named HEALM, first determines the truncation width for mantissa summation in the ALM. Error compensation or reduction is then performed via a lookup table, which stores reduction factors for different regions of the input operands. This is in contrast to an existing approach, in which error reduction is performed independently of the width truncation of mantissa summation. As a result, the new design leads to more accurate results with both reduced area and power. Furthermore, unlike existing approaches, which either introduce resource overheads when improving error or lose accuracy when saving area and power, HEALM can improve accuracy and resource consumption at the same time. Our study shows that 8-bit HEALM can achieve up to 2.92%, 9.30%, 16.08%, and 17.61% improvement in mean error, peak error, area, and power consumption, respectively, over REALM, which is the state-of-the-art work with the same number of bits truncated. We also propose a single error coefficient mode named HEALM-TA-S, which improves the ALM design with a truncation adder (TA) for mantissa summation. Furthermore, we evaluate the proposed HEALM design in a discrete cosine transform (DCT) application. The results show that, with different values of k, HEALM-TA can improve the image quality over the ALM baseline by 7.8 to 17.2 dB on average and HEALM-SOA by 2.9 to 15.8 dB on average. In addition, HEALM-TA and HEALM-SOA outperform all the state-of-the-art works with k=2,3,4 on image quality. The single coefficient mode, HEALM-TA-S, can improve the image quality over the baseline by up to 4.1 dB on average with extremely low resource consumption.
  4. Modern applications such as Internet of Things (IoT) devices, AI, and automotive applications widely use field-programmable gate arrays (FPGAs). However, many of these applications have limited power resources. In addition, existing FPGAs are vulnerable to side-channel attacks (SCAs) such as correlation-based power analysis (CPA) attacks. Therefore, designing low-power, CPA-resistant, and secure-by-design FPGAs is required. In this article, two low-power and CPA-resistant hybrid CMOS/magnetic tunnel junction (MTJ) logic-in-memory-based configurable logic blocks (CLBs) are proposed and compared to a state-of-the-art counterpart. The first proposed design is single output, and the second one is multioutput. The simulation results show that compared to the state-of-the-art secure CLB counterpart [secured CLB (sCLB) by Zooker et al. (2020)], the proposed CLB designs have 42% and 33% lower delay, 85% and 18% lower power consumption, and 86% and 63% fewer equivalent transistors. To implement one round of the PRESENT algorithm, the first and second designs have 85% and 77% fewer transistors, 42% and 33% lower delay, and 86% and 50% lower power consumption compared to their silicon-proven secure counterpart. To implement the convolution layers of a binarized neural network (BNN), compared to this counterpart, the first and second proposed designs have 85% and 90% fewer equivalent transistors, 42% and 33% lower delay, and 86% and 79% lower power consumption. The resiliency of the proposed designs against power analysis attacks has been investigated through exhaustive simulations and by performing CPA attacks on the PRESENT and Advanced Encryption Standard (AES) SBOX. This resiliency has also been investigated for different tunnel magnetoresistance ratios (TMRs) and supply voltages.
  5. This paper presents a configurable binary design library including fundamental arithmetic circuits such as a full-adder, full-subtractor, binary multiplier, shifter, and more. The Chisel Hardware Construction Language (HCL) is employed to build the parameterizable designs with different precisions, including half-word, word, double-word, and quad-word. Chisel HCL is an open-source embedded domain-specific language that inherits the object-oriented and functional programming aspects of Scala for constructing hardware. Experimental results show that our proposed designs achieve the same accuracy as the Verilog HDL implementations. The hardware cost in terms of slice count, power consumption, and maximum clock frequency is further estimated. Compared with traditional design intellectual properties (IPs) provided by IP vendors, our proposed work is configurable and expandable to other arithmetic implementations and projects.