Title: Case Studies of Configurable Binary Design Library on FPGA
This paper presents a configurable binary design library of fundamental arithmetic circuits, including a full adder, full subtractor, binary multiplier, shifter, and more. The Chisel Hardware Construction Language (HCL) is used to build parameterizable designs at different precisions: half-word, word, double-word, and quad-word. Chisel HCL is an open-source domain-specific language embedded in Scala that inherits Scala's object-oriented and functional programming features for constructing hardware. Experimental results show that the proposed designs achieve the same accuracy as Verilog HDL implementations. The hardware cost in terms of slice count, power consumption, and maximum clock frequency is also estimated. Compared with traditional design intellectual properties (IPs) provided by IP vendors, the proposed work is configurable and expandable to other arithmetic implementations and projects.
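To make the idea of a width-parameterized arithmetic library concrete, here is a minimal software sketch in Python. It is not the paper's Chisel code. It is a bit-level model that builds a ripple-carry adder and subtractor from one-bit full-adder and full-subtractor cells, with the width passed as a parameter. The function names and the `WIDTHS` table are hypothetical, and the mapping of half-word, word, double-word, and quad-word to 16, 32, 64, and 128 bits is an assumption, since the abstract does not state the bit widths.

```python
def full_adder(a, b, cin):
    """One-bit full adder. Returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def full_subtractor(a, b, bin_):
    """One-bit full subtractor computing a - b - bin_. Returns (diff, borrow_out)."""
    d = a ^ b ^ bin_
    bout = ((a ^ 1) & b) | (((a ^ b) ^ 1) & bin_)
    return d, bout


def ripple_add(x, y, width):
    """Chain `width` full adders. Returns (sum mod 2**width, carry_out)."""
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def ripple_sub(x, y, width):
    """Chain `width` full subtractors. Returns (difference mod 2**width, borrow_out)."""
    result, borrow = 0, 0
    for i in range(width):
        d, borrow = full_subtractor((x >> i) & 1, (y >> i) & 1, borrow)
        result |= d << i
    return result, borrow


# Assumed precision names and widths (not given in the abstract).
WIDTHS = {"half-word": 16, "word": 32, "double-word": 64, "quad-word": 128}

if __name__ == "__main__":
    for name, w in WIDTHS.items():
        # Adding 1 to the all-ones value wraps to 0 with a carry out.
        print(name, ripple_add((1 << w) - 1, 1, w))
```

In a hardware construction language the same structure would typically be a module class taking the width as a constructor argument. The model above only mirrors that parameterization in software so that each precision can be checked against ordinary integer arithmetic.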
Award ID(s):
1928622
NSF-PAR ID:
10430804
Journal Name:
IEEE Intl. Symposium on Measurement and Control of Robotics, 2022
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent advances in post-quantum cryptography (PQC) have gradually shifted from the theory to the implementation of cryptosystems, especially on hardware platforms. Following this trend, in this paper, we aim to present efficient implementations of the finite field arithmetic (key component) for the binary Ring-Learning-with-Errors (Ring-LWE) PQC through a novel lookup-table (LUT)-like method. In total, we have carried out four stages of interdependent efforts: (i) the proposed LUT-like method for the key arithmetic of the BRLWE scheme is derived in detail through algorithm-hardware co-design; (ii) the proposed hardware architecture is then presented along with a description of its internal structure; (iii) we have also presented a novel hybrid size structure suitable for flexible operation, which is the first report in the literature; (iv) the final implementation and comparison processes have also been given, demonstrating that our proposed structures deliver significantly improved performance over the state-of-the-art solutions. The proposed designs are highly efficient and are expected to be employed in many emerging applications.
  2. Promising for digital signal processing applications, approximate computing has been extensively considered to trade off limited accuracy for improvements in other circuit metrics such as area, power, and performance. In this paper, approximate arithmetic circuits are proposed by using emerging nanoscale spintronic devices. Leveraging the intrinsic current-mode thresholding operation of spintronic devices, we initially present a hybrid spin-CMOS majority gate design based on a composite spintronic device structure consisting of a magnetic domain wall motion stripe and a magnetic tunnel junction. We further propose a compact and energy-efficient accuracy-configurable adder design based on the majority gate. Unlike most previous approximate circuit designs that hardwire a constant degree of approximation, this design is adaptive to the inherent resilience in various applications to different degrees of accuracy. Subsequently, we propose two new approximate compressors for utilization in fast multiplier designs. The device-circuit SPICE simulation shows 34.58% and 66% improvement in power consumption, respectively, for the accurate and approximate modes of the accuracy-configurable adder, compared to the recently reported domain wall motion-based full adder design. In addition, the proposed accuracy-configurable adder and approximate compressors can be efficiently utilized in the discrete cosine transform (DCT), a widely used digital image processing algorithm. The results indicate that the DCT and inverse DCT (IDCT) using the approximate multiplier achieve ~2x energy saving and 3x speed-up compared to an exactly-designed circuit, while achieving comparable quality in its output result.
  3. Multiply-accumulate (MAC) operations are common in data processing and machine learning but costly in terms of hardware usage. Stochastic Computing (SC) is a promising approach for low-cost hardware design of complex arithmetic operations such as multiplication. Computing with deterministic unary bit-streams (defined as bit-streams with all 1s grouped together at the beginning or end of a bit-stream) has been recently suggested to improve the accuracy of SC. Conventionally, SC designs use multiplexer (MUX) units or OR gates to accumulate data in the stochastic domain. MUX-based addition suffers from scaling of data and OR-based addition from inaccuracy. This work proposes a novel technique for MAC operation on unary bit-streams that allows exact, non-scaled addition of multiplication results. By introducing a relative delay between the products, we control correlation between bit-streams and eliminate OR-based addition error. We evaluate the accuracy of the proposed technique compared to the state-of-the-art MAC designs. After quantization, the proposed technique demonstrates at least 37% and up to 100% decrease of the mean absolute error for uniformly distributed random input values, compared to traditional OR-based MAC designs. Further, we demonstrate that the proposed technique is practical and evaluate area, power and energy of three possible implementations.
  4. Modern applications for DSP systems are increasingly constrained by tight area and power requirements. Therefore, it is imperative to analyze effective strategies that work within these requirements. This paper studies the impact of finite word-length arithmetic on the signal to quantization noise ratio (SQNR), power and area for a real-valued serial FFT implementation. An experiment is set up using a hardware description language (HDL) to empirically determine the tradeoffs associated with the following parameters: (i) the input word-length, (ii) the word-length of the rotation coefficients, and (iii) length of the FFT on performance (SQNR), power and area. The results of this paper can be used to make design decisions by careful selection of word-length to achieve a reduction in area and power for an acceptable loss in SQNR. 
  5. A virtual firewall based on Network Function Virtualization (NFV) with Software Defined Networking (SDN) provides high scalability and flexibility for low-cost monitoring of legacy networks by dynamically deploying virtual network appliances rather than traditional hardware-based appliances. However, full utilization of virtual firewalls requires efficient management of computer virtualization resources and on-demand placement of virtual firewalls by steering traffic to the correct routing path using an SDN controller. In this paper, we design P4Guard, a software-based configurable firewall that uses P4, a high-level domain-specific language, to specify packet processing logic. P4Guard is a protocol-independent and platform-agnostic software-based firewall that can be incorporated into software switches, making it highly usable and deployable. We evaluate the efficiency of P4Guard in processing traffic, compared to our previous virtual firewall in NFV.