skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Case Studies of Configurable Binary Design Library on FPGA
This paper presents a configurable binary design library including fundamental arithmetic circuits like full-adder, full-subtractor, binary multiplier, shifter, and more. The Chisel Hardware Construction Language (HCL) is employed to build the parameterizable designs with different precision including half-word, word, double-word, and quad-word. Chisel HCL is an open-source embedded domain-specific language that inherits the object-oriented and functional programming aspects of Scala for constructing hardware. Experimental results show the same accuracy achieved by our proposed work compared with the Verilog HDL implementations. The hardware cost in terms of slice count, power consumption, and the maximum clock frequency is further estimated. Compared with traditional design intellectual properties (IPs) provided by IP vendors, our proposed work is configurable and expandable to the other arithmetic implementations and projects.  more » « less
Award ID(s):
1928622
PAR ID:
10430804
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
IEEE Intl. Symposium on Measurement and Control of Robotics, 2022
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The recent advance in the post-quantum cryptography (PQC) field has gradually shifted from the theory to the implementation of the cryptosystem, especially on the hardware platforms. Following this trend, in this paper, we aim to present efficient implementations of the finite field arithmetic (key component) for the binary Ring-Learning-with-Errors (Ring-LWE) PQC through a novel lookup-table (LUT)-like method. In total, we have carried out four stages of interdependent efforts: (i) an algorithm-hardware co-design driven derivation of the proposed LUT-like method is provided detailedly for the key arithmetic of the BRLWE scheme; (ii) the proposed hardware architecture is then presented along with the internal structural description; (iii) we have also presented a novel hybrid size structure suitable for flexible operation, which is the first report in the literature; (iv) the final implementation and comparison processes have also been given, demonstrating that our proposed structures deliver significant improved performance over the state-of-the-art solutions. The proposed designs are highly efficient and are expected to be employed in many emerging applications. 
    more » « less
  2. null (Ed.)
    We present Fleet, a framework that offers a massively parallel streaming model for FPGAs and is effective in a number of domains well-suited for FPGA acceleration, including parsing, compression, and machine learning. Fleet requires the user to specify RTL for a processing unit that serially processes every input token in a stream, a far simpler task than writing a parallel processing unit. It then takes the user’s processing unit and generates a hardware design with many copies of the unit as well as memory controllers to feed the units with separate streams and drain their outputs. Fleet includes a Chisel-based processing unit language. The language maintains Chisel’s low-level performance control while adding a few productivity features, including automatic handling of ready-valid signaling and a native and automatically pipelined BRAM type. We evaluate Fleet on six different applications, including JSON parsing and integer compression, fitting hundreds of Fleet processing units on the Amazon F1 FPGA and outperforming CPU implementations by over 400× and GPU implementations by over 9× in performance per watt while requiring a similar number of lines of code. 
    more » « less
  3. Modern applications for DSP systems are increasingly constrained by tight area and power requirements. Therefore, it is imperative to analyze effective strategies that work within these requirements. This paper studies the impact of finite word-length arithmetic on the signal to quantization noise ratio (SQNR), power and area for a real-valued serial FFT implementation. An experiment is set up using a hardware description language (HDL) to empirically determine the tradeoffs associated with the following parameters: (i) the input word-length, (ii) the word-length of the rotation coefficients, and (iii) length of the FFT on performance (SQNR), power and area. The results of this paper can be used to make design decisions by careful selection of word-length to achieve a reduction in area and power for an acceptable loss in SQNR. 
    more » « less
  4. Agile hardware design techniques take the best of software engineering methods and apply them to improve hardware design productivity. Agile approaches not only reduce the time to solution, but they can also produce solutions which are better tailored to their target problems. Chisel provides the perfect vehicle to teach these techniques as it allows for the creation of reusable hardware generators. In this work, we outline our experiences creating an agile hardware design course using Chisel, and the lessons learned from teaching it four times. All of the course materials are available as open source. 
    more » « less
  5. Promising for digital signal processing applications, approximate computing has been extensively considered to tradeoff limited accuracy for improvements in other circuit metrics such as area, power, and performance. In this paper, approximate arithmetic circuits are proposed by using emerging nanoscale spintronic devices. Leveraging the intrinsic current-mode thresholding operation of spintronic devices, we initially present a hybrid spin-CMOS majority gate design based on a composite spintronic device structure consisting of a magnetic domain wall motion stripe and a magnetic tunnel junction. We further propose a compact and energy-efficient accuracy-configurable adder design based on the majority gate. Unlike most previous approximate circuit designs that hardwire a constant degree of approximation, this design is adaptive to the inherent resilience in various applications to different degrees of accuracy. Subsequently, we propose two new approximate compressors for utilization in fast multiplier designs. The device-circuit SPICE simulation shows 34.58% and 66% improvement in power consumption, respectively, for the accurate and approximate modes of the accuracy-configurable adder, compared to the recently reported domain wall motion-based full adder design. In addition, the proposed accuracy-configurable adder and approximate compressors can be efficiently utilized in the discrete cosine transform (DCT) as a widely-used digital image processing algorithm. The results indicate that the DCT and inverse DCT (IDCT) using the approximate multiplier achieve ~2x energy saving and 3x speed-up compared to an exactly-designed circuit, while achieving comparable quality in its output result. 
    more » « less