skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Embedding Binary Perceptrons in FPGA to improve Area, Power and Performance
For the flexibility of implementing any given Boolean function(s), the FPGA uses re-configurable building blocks called LUTs. The price for this reconfigurability is a large number of registers and multiplexers required to construct the FPGA. While researchers have been working on complex LUT structures to reduce the area and power for several years, most of these implementations come at the cost of performance penalty. This paper demonstrates simultaneous improvement in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs) (also known as binary perceptrons). The TLCs are capable of implementing a complex threshold function, which if implemented using conventional gates would require several levels of logic gates. The TLCs only require 7 SRAM cells and are significantly faster than the conventional LUTs. The implementation of the proposed FPGA architecture has been done using 28nm FDSOI standard cells and has been evaluated using ISCAS-85, ISCAS-89, and a few large industrial designs. Experiments demonstrate that the proposed architecture can be used to get an average reduction of 18.1% in configuration registers, 18.1% reduction in multiplexer count, 12.3% in Basic Logic Element (BLE) area, 16.3% in BLE power, 5.9% improvement in operating frequency, with a slight reduction in track count, routing area and routing power. The improvements are also demonstrated on the physically designed version of the architecture.  more » « less
Award ID(s):
1701241
PAR ID:
10131452
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
International Conference on Computer-Aided Design
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes an alternative FPGA tile struc- ture that consists of three traditional LUTs combined with a new reconfigurable threshold logic cell (TLC). The TLC requires only 7 SRAM cells and can be configured to implement one of several threshold functions. The proposed architecture is implemented in a 28nm FDSOI process, and is evaluated on standard benchmark circuits and several large complex function blocks. The results demonstrate an average reduction of 8.9% in register count, 15.4% in multiplexer count, 7% average reduction in Basic Logic Element (BLE) area, and 8.2% average reduction in BLE power, with a maximum decrease in register count up to 64%, BLE multiplexer count up to 68%, BLE Area up to 51.6% and BLE power up to 61.6% without loss in performance. We also show a reduction of 21% in the area of a tile. 
    more » « less
  2. The HSC-FPGA offers an intriguing feasible architecture for the next generation of configurable fabrics, which allows embracing the advantages of both CMOS and beyond-CMOS technologies without requiring significant modification to the routing structure, programming paradigms, and synthesis tool-chain of the commercial FPGAs. In the HSC-FPGA, the intrinsic characteristics of magnetic random access memory (MRAM)-look-up table (LUT) circuits are used to implement sequential logic, while combinational logic circuits are implemented by static random access memory (SRAM)-LUTs. Fabric-level simulation results for the developed HSC-FPGA show that it can achieve at least 18%, 70%, and 15% reduction in terms of area, standby power, and read power consumption, respectively, for various ISCAS-89 and ITC-99 benchmark circuits compared to conventional SRAM-based FPGAs. The power consumption values can be further decreased by the power-gating allowed by the non-volatility feature of MRAM-LUTs. Moreover, the benefits of increased heterogeneity for reconfigurable computing is extended along realizing probabilistic computing paradigms within a fabric, which is enabled by probabilistic spin logic devices. The cooperating strengths of technology-heterogeneity and heterogeneity in computing paradigm in the proposed HSC-FPGA are leveraged to develop energy-efficient and reliability-aware training and evaluation circuits for deep belief networks with memristive crossbar arrays and p-bit based probabilistic neurons. 
    more » « less
  3. The use of multi-output look-up tables (LUTs) is a widely adopted approach in contemporary commercial field-programmable gate arrays (FPGAs). Larger LUT configurations (e.g., six-input LUTs) can be partitioned into smaller LUTs (e.g., two five-input LUTs, maintaining a total input count of less than six). This capability of generating a second output from a larger LUT is not only crucial for reducing logic cell count and enhancing the utilization efficiency of logic resources—thus conserving area—but also plays a key role in optimizing system-level delays and energy consumption. In this paper, we propose an efficient multi-output LUT mapping technique, incorporating several highly efficient technology mapping algorithms, which focus on optimizing the mapping from an interconnection perspective as alternatives to directly merging smaller LUTs. These algorithms include a side-fanout insertion algorithm, and a runtime multi-output cut generation algorithm. The proposed methods improve mapping efficiency and enhance performance. The benchmarking results demonstrate that the dual-output mapping algorithms achieve LUT area reductions of up to 35% and 6%, compared to the state-of-the-art ABC six-input, single-output LUT mapping technique and previous work focusing on dual-output LUT mapping techniques that optimize cut generation parameters. Moreover, FPGA system-level simulations also show that area, delay, and energy can all be optimized based on this multi-output mapping technique. 
    more » « less
  4. null (Ed.)
    One approach to mitigate side-channel attacks (SCAs) is to use clockless, asynchronous digital logic. To simplify this process, we propose a unique asynchronous FPGA based on a new THx2 programmable threshold cell. At a minimum, FPGAs require a programmable logic cell that can implement a complete set of logic so that it can be connected through the programmable interconnect network to form any digital system. To meet that criteria, we take advantage of CMOS transistors to implement a programmable THx2 threshold cell capable of performing both TH12 and TH22 asynchronous operations. Our complete sixteen transistor FPGA cell includes eight transistors to implement the base THx2 threshold operation, three transistors to switch between the TH12 and TH22 modes, and five memory cell transistors for mode storage. Our unique minimal transistor, programmable THx2 implementation enables formation of a complete set of asynchronous threshold gates and a complete set of standard combinational logic functions. The symmetric nature of the FPGA cell, in regard to the number of transistors (eight NMOS and eight PMOS), makes it ideal for a four row by four column transistor grid with a nearly square, easily array-able layout. It should be noted our THx2 cell is highly compact and suitable for implementing a clockless, asynchronous FPGA. 
    more » « less
  5. Convolutional neural networks (CNNs) have demonstrated remarkable performance in many areas but require significant computation and storage resources. Quantization is an effective method to reduce CNN complexity and implementation. The main research objective is to develop a scalable quantization algorithm for CNN hardware design and model the performance metrics for the purpose of CNN implementation in resource-constrained devices (RCDs) and optimizing layers in deep neural networks (DNNs). The algorithm novelty is based on blending two quantization techniques to perform full model quantization with optimum accuracy, and without additional neurons. The algorithm is applied to a selected CNN model and implemented on an FPGA. Implementing CNN using broad data is not possible due to capacity issues. With the proposed quantization algorithm, we succeeded in implementing the model on the FPGA using 16-, 12-, and 8-bit quantization. Compared to the 16-bit design, the 8-bit design offers a 44% decrease in resource utilization, and achieves power and energy reductions of 41% and 42%, respectively. Models show that trading off one quantization bit yields savings of approximately 5.4K LUTs, 4% logic utilization, 46.9 mW power, and 147 μJ energy. The models were also used to estimate performance metrics for a sample DNN design. 
    more » « less