skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: FPGAs with Reconfigurable Threshold Logic Gates for Improved Performance, Power and Area
This paper proposes an alternative FPGA tile struc- ture that consists of three traditional LUTs combined with a new reconfigurable threshold logic cell (TLC). The TLC requires only 7 SRAM cells and can be configured to implement one of several threshold functions. The proposed architecture is implemented in a 28nm FDSOI process, and is evaluated on standard benchmark circuits and several large complex function blocks. The results demonstrate an average reduction of 8.9% in register count, 15.4% in multiplexer count, 7% average reduction in Basic Logic Element (BLE) area, and 8.2% average reduction in BLE power, with a maximum decrease in register count up to 64%, BLE multiplexer count up to 68%, BLE Area up to 51.6% and BLE power up to 61.6% without loss in performance. We also show a reduction of 21% in the area of a tile.  more » « less
Award ID(s):
1701241
PAR ID:
10099615
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Field Programmable Logic
Page Range / eLocation ID:
256 to 2563
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For the flexibility of implementing any given Boolean function(s), the FPGA uses re-configurable building blocks called LUTs. The price for this reconfigurability is a large number of registers and multiplexers required to construct the FPGA. While researchers have been working on complex LUT structures to reduce the area and power for several years, most of these implementations come at the cost of performance penalty. This paper demonstrates simultaneous improvement in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs) (also known as binary perceptrons). The TLCs are capable of implementing a complex threshold function, which if implemented using conventional gates would require several levels of logic gates. The TLCs only require 7 SRAM cells and are significantly faster than the conventional LUTs. The implementation of the proposed FPGA architecture has been done using 28nm FDSOI standard cells and has been evaluated using ISCAS-85, ISCAS-89, and a few large industrial designs. Experiments demonstrate that the proposed architecture can be used to get an average reduction of 18.1% in configuration registers, 18.1% reduction in multiplexer count, 12.3% in Basic Logic Element (BLE) area, 16.3% in BLE power, 5.9% improvement in operating frequency, with a slight reduction in track count, routing area and routing power. The improvements are also demonstrated on the physically designed version of the architecture. 
    more » « less
  2. null (Ed.)
    A Muller C-Element is a digital circuit component used in most asynchronous circuits and systems. In Null Convention Logic, the Muller C-Elements make up the subset of THmn threshold gates where the threshold, m, and the input bit- width, n, are equal. This paper presents a new Efficient Muller C- Element implementation, EMC, that is especially suitable for Null Convention Logic applications with high input bit-widths, and it is much faster and smaller than standard implementations. It has a two-transistor switching delay that is independent of the input bit- width, n, and exhibits low noise and static power consumption. It is suitable for all Muller C-Element applications, especially those like Null Convention Logic register feedback circuits that can have large input bit-widths. To reduce static power consumption, it uses active resistors that are only turned “ON” when necessary. Two output stages are presented to implement the required Muller C- Element digital hysteresis: standard, semi-static cross-coupled inverter version, and differential sense-amplifier option. For large values of n, our circuit requires approximately one-half fewer transistors than combining smaller Null Convention Logic THmn semi-static threshold gates. We have successfully simulated up to n = 1024 at a 65 nm node. 
    more » « less
  3. In this paper, we describe a design of a mixed-signal circuit for an binary neuron (a.k.a perceptron, threshold logic gate) and a methodology for automatically embedding such cells in ASICs. The binary neuron, referred to as an FTL (flash threshold logic) uses floating gate or flash transistors whose threshold voltages serve as a proxy for the weights of the neuron. Algorithms for mapping the weights to the flash transistor threshold voltages are presented. The threshold voltages are determined to maximize both the robustness of the cell and its speed. The performance, power, and area of a single FTL cell are shown to be significantly smaller (79.4%), consume less power (61.6%), and operate faster (40.3%) compared to conventional CMOS logic equivalents. Also included are the architecture and the algorithms to program the flash devices of an FTL. The FTL cells are implemented as standard cells, and are designed to allow commercial synthesis and P&R tools to automatically use them in synthesis of ASICs. Substantial reductions in area and power without sacrificing performance are demonstrated on several ASIC benchmarks by the automatic embedding of FTL cells. The paper also demonstrates how FTL cells can be used for fixing timing errors after fabrication. 
    more » « less
  4. null (Ed.)
    Lightweight cryptography offers viable security solutions for resource constrained Internet of Things (IoT) devices. However, IoT devices have implementation vulnerabilities such as side channel attacks (SCA), where observation of physical phenomena associated with device operations can reveal sensitive internal contents. The U.S. National Institute of Standards and Technology has called for lightweight cryptographic solutions to process authenticated encryption with associated data (AEAD), and is evaluating candidates for suitability in a Lightweight Cryptography (LWC) Standardization Process. Two Round 2 candidate variants, COMET-CHAM and SCHWAEMM, use Addition-Rotation-XOR (ARX) primitives. However, ARX ciphers are known to be costly to protect against certain SCA. In this work we implement side channel protected versions of COMET-CHAM and SCHWAEMM using register transfer level design. Identical protection schemes consisting of a threshold implementation (TI)-protected Kogge-Stone adder are adopted. Resistance to power side channel analysis is verified on an Artix-7 FPGA target device. Implementations comply with the Hardware API for Lightweight Cryptography, and use a custom-designed extension of the Development Package for the Hardware API for Lightweight Cryptography which enables test and evaluation of side channel resistant designs. We compare side channel protection costs of the two candidates against each other, against their unprotected counterparts, and against previous side channel protected AEAD implementations. COMET-CHAM is shown to consume less area and power, while SCHWAEMM has higher throughput and throughput to area ratio, and is more energy efficient. On average, the costs of protecting these ciphers against SCA are 32% more in area and 38% more in power, compared to the average protection costs for a large selection of previously-evaluated ciphers of similar implementation. Our results highlight the costs involved in implementing side channel protected ARX-ciphers, and help to inform NIST LWC late round and final portfolio selections. 
    more » « less
  5. The use of multi-output look-up tables (LUTs) is a widely adopted approach in contemporary commercial field-programmable gate arrays (FPGAs). Larger LUT configurations (e.g., six-input LUTs) can be partitioned into smaller LUTs (e.g., two five-input LUTs, maintaining a total input count of less than six). This capability of generating a second output from a larger LUT is not only crucial for reducing logic cell count and enhancing the utilization efficiency of logic resources—thus conserving area—but also plays a key role in optimizing system-level delays and energy consumption. In this paper, we propose an efficient multi-output LUT mapping technique, incorporating several highly efficient technology mapping algorithms, which focus on optimizing the mapping from an interconnection perspective as alternatives to directly merging smaller LUTs. These algorithms include a side-fanout insertion algorithm, and a runtime multi-output cut generation algorithm. The proposed methods improve mapping efficiency and enhance performance. The benchmarking results demonstrate that the dual-output mapping algorithms achieve LUT area reductions of up to 35% and 6%, compared to the state-of-the-art ABC six-input, single-output LUT mapping technique and previous work focusing on dual-output LUT mapping techniques that optimize cut generation parameters. Moreover, FPGA system-level simulations also show that area, delay, and energy can all be optimized based on this multi-output mapping technique. 
    more » « less