

Title: FPGAs with Reconfigurable Threshold Logic Gates for Improved Performance, Power and Area
This paper proposes an alternative FPGA tile structure that consists of three traditional LUTs combined with a new reconfigurable threshold logic cell (TLC). The TLC requires only 7 SRAM cells and can be configured to implement one of several threshold functions. The proposed architecture is implemented in a 28nm FDSOI process and is evaluated on standard benchmark circuits and several large complex function blocks. The results demonstrate average reductions of 8.9% in register count, 15.4% in multiplexer count, 7% in Basic Logic Element (BLE) area, and 8.2% in BLE power, with maximum reductions of up to 64% in register count, 68% in BLE multiplexer count, 51.6% in BLE area, and 61.6% in BLE power, without loss in performance. We also show a 21% reduction in tile area.
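A threshold function outputs 1 exactly when a weighted sum of its binary inputs reaches a threshold T, i.e. f(x) = [Σ w_i x_i ≥ T]. The abstract does not say how the TLC's 7 SRAM cells encode weights and threshold, so the Python sketch below models only the logical behaviour of such a cell. The function names and the example weight/threshold choices are hypothetical, picked to show one configurable cell covering several familiar functions.

```python
from __future__ import annotations

from itertools import product
from typing import Sequence


def threshold_gate(inputs: Sequence[int], weights: Sequence[int], threshold: int) -> int:
    """Return 1 if sum(w_i * x_i) >= threshold, else 0 (a binary perceptron)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def truth_table(weights: Sequence[int], threshold: int) -> list[int]:
    """Enumerate the gate output for every input combination (first input is MSB)."""
    return [threshold_gate(bits, weights, threshold)
            for bits in product((0, 1), repeat=len(weights))]


# Hypothetical configurations of the same cell:
maj3 = truth_table([1, 1, 1], 2)   # 3-input majority
and3 = truth_table([1, 1, 1], 3)   # 3-input AND
# f = ab + ac + ad + bcd, realised as the single threshold test 2a + b + c + d >= 3
f4 = truth_table([2, 1, 1, 1], 3)
```

The last configuration illustrates the abstract's point: a sum-of-products needing two levels of gates collapses into one weighted comparison.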
Award ID(s):
1701241
NSF-PAR ID:
10099615
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Field Programmable Logic
Page Range / eLocation ID:
256 to 2563
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To provide the flexibility to implement any given Boolean function(s), FPGAs use reconfigurable building blocks called LUTs. The price of this reconfigurability is the large number of registers and multiplexers required to construct the FPGA. Although researchers have worked on complex LUT structures to reduce area and power for several years, most of these implementations come at the cost of a performance penalty. This paper demonstrates simultaneous improvement in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs), also known as binary perceptrons. A TLC can implement a complex threshold function that, if implemented with conventional gates, would require several levels of logic. TLCs require only 7 SRAM cells and are significantly faster than conventional LUTs. The proposed FPGA architecture has been implemented using 28nm FDSOI standard cells and evaluated on the ISCAS-85 and ISCAS-89 benchmarks and a few large industrial designs. Experiments demonstrate that the proposed architecture achieves average reductions of 18.1% in configuration registers, 18.1% in multiplexer count, 12.3% in Basic Logic Element (BLE) area, and 16.3% in BLE power, along with a 5.9% improvement in operating frequency and a slight reduction in track count, routing area, and routing power. The improvements are also demonstrated on the physically designed version of the architecture.
  2. Lightweight cryptography offers viable security solutions for resource-constrained Internet of Things (IoT) devices. However, IoT devices have implementation vulnerabilities such as side channel attacks (SCA), in which observation of physical phenomena associated with device operations can reveal sensitive internal contents. The U.S. National Institute of Standards and Technology (NIST) has called for lightweight cryptographic solutions for authenticated encryption with associated data (AEAD), and is evaluating candidates for suitability in a Lightweight Cryptography (LWC) Standardization Process. Two Round 2 candidate variants, COMET-CHAM and SCHWAEMM, use Addition-Rotation-XOR (ARX) primitives; however, ARX ciphers are known to be costly to protect against certain SCA. In this work we implement side channel protected versions of COMET-CHAM and SCHWAEMM at the register transfer level. Both adopt an identical protection scheme built around a threshold implementation (TI)-protected Kogge-Stone adder. Resistance to power side channel analysis is verified on an Artix-7 FPGA target device. The implementations comply with the Hardware API for Lightweight Cryptography and use a custom-designed extension of the Development Package for the Hardware API for Lightweight Cryptography, which enables test and evaluation of side channel resistant designs. We compare the side channel protection costs of the two candidates against each other, against their unprotected counterparts, and against previous side channel protected AEAD implementations. COMET-CHAM is shown to consume less area and power, while SCHWAEMM has higher throughput and throughput-to-area ratio and is more energy efficient. On average, the costs of protecting these ciphers against SCA are 32% more in area and 38% more in power, compared to the average protection costs for a large selection of previously evaluated ciphers of similar implementation.
Our results highlight the costs involved in implementing side channel protected ARX-ciphers, and help to inform NIST LWC late round and final portfolio selections. 
  3. A Muller C-Element is a digital circuit component used in most asynchronous circuits and systems. In Null Convention Logic (NCL), Muller C-Elements make up the subset of THmn threshold gates in which the threshold, m, equals the input bit-width, n. This paper presents a new Efficient Muller C-Element implementation, EMC, that is especially suitable for NCL applications with high input bit-widths and is much faster and smaller than standard implementations. It has a two-transistor switching delay that is independent of the input bit-width, n, and exhibits low noise and low static power consumption. It is suitable for all Muller C-Element applications, especially those, such as NCL register feedback circuits, that can have large input bit-widths. To reduce static power consumption, it uses active resistors that are turned “ON” only when necessary. Two output stages are presented to implement the required Muller C-Element digital hysteresis: a standard semi-static cross-coupled inverter version and a differential sense-amplifier option. For large values of n, our circuit requires approximately half as many transistors as an equivalent combination of smaller NCL THmn semi-static threshold gates. We have successfully simulated up to n = 1024 at a 65 nm node.
  4. Processors are typically designed in Register Transfer Level (RTL) languages, which give designers low-level control over circuit structure and timing. To achieve good performance, processors are pipelined, with multiple instructions executing concurrently in different parts of the circuit. Thus, even though processors implement a fundamentally sequential specification (the instruction set architecture), the implementation is highly concurrent. Interactions among multiple, potentially speculative, instructions can cause incorrect behavior. We present PDL, a novel hardware description language targeted at the construction of pipelined processors. PDL provides one-instruction-at-a-time semantics and is the first language to enforce that the generated pipelined circuit has the same behavior as a sequential specification. This enforcement facilitates design-space exploration. Adding or removing pipeline stages, moving operations across stages, or otherwise changing pipeline structure normally requires careful analysis of bypass paths and stall logic; with PDL, this analysis is handled by the PDL compiler. At the same time, PDL still offers designers fine-grained control over performance-critical microarchitectural choices such as timing of operations, data forwarding, and speculation. We demonstrate PDL's expressive power and ease of design exploration by implementing several RISC-V cores with differing microarchitectures. Our results show that PDL does not impose significant performance or area overhead compared to a standard HDL.
  5. Stochastic computing (SC) is a low-cost computational paradigm that has promising applications in digital filter design, image processing, and neural networks. Fundamental to these applications is the weighted addition operation, which is most often implemented by a multiplexer (mux) tree. Mux-based adders have very low area but typically require long bitstreams to reach practical accuracy thresholds when the number of summands is large. In this work, we first identify the main contributors to mux adder error. We then demonstrate with analysis and experiment that two new techniques, precise sampling and full correlation, can target and mitigate these error sources. Implementing these techniques in hardware leads to the design of CeMux (Correlation-enhanced Multiplexer), a stochastic mux adder that is significantly more accurate and uses much less area than traditional weighted adders. We compare CeMux to other SC and hybrid designs for an electrocardiogram filtering case study that employs a large digital filter. One major result is that CeMux is shown to be accurate even for large input sizes. CeMux's higher accuracy leads to a latency reduction of 4× to 16× over other designs. Furthermore, CeMux uses about 35% less area than existing designs, and we demonstrate that a small amount of accuracy can be traded for a further 50% reduction in area. Finally, we compare CeMux to a conventional binary design and we show that CeMux can achieve a 50% to 73% area reduction for similar power and latency as the conventional design but at a slightly higher level of error. 
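The first related abstract notes that a single TLC can absorb a threshold function that would otherwise take several levels of gates. A mapper targeting such cells must first decide whether a given Boolean function is a threshold function at all. The sketch below does this by brute force over small integer weights; `find_threshold_realization` and its `max_weight` bound are hypothetical, it is not the mapping flow used in that work, and it is practical only for a handful of inputs. Because the search is bounded, `None` means no realization exists within `max_weight` (for XOR, none exists at any bound).

```python
from __future__ import annotations

from itertools import product
from typing import Callable, Optional


def find_threshold_realization(
    f: Callable[..., int], n: int, max_weight: int = 3
) -> Optional[tuple[tuple[int, ...], int]]:
    """Search integer weights in [-max_weight, max_weight] and a threshold T
    such that f(x) == [sum(w_i * x_i) >= T] for every x in {0,1}^n.
    Returns (weights, T), or None if no realization exists within the bound."""
    inputs = list(product((0, 1), repeat=n))
    targets = [f(*x) for x in inputs]
    span = range(-max_weight, max_weight + 1)
    for weights in product(span, repeat=n):
        sums = [sum(w * xi for w, xi in zip(weights, x)) for x in inputs]
        on = [s for s, t in zip(sums, targets) if t]
        off = [s for s, t in zip(sums, targets) if not t]
        # T must be greater than every "off" sum and no greater than every "on" sum.
        lo = max(off) + 1 if off else min(sums)
        hi = min(on) if on else max(sums) + 1
        if lo <= hi:
            return weights, lo
    return None
```

For example, 3-input majority returns a valid (weights, T) pair, while 2-input XOR returns `None`, the classic case of a function that is not linearly separable.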
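The second abstract builds its side channel protection around a Kogge-Stone adder, which performs the modular addition in the ARX primitives. The sketch below shows only the unprotected Kogge-Stone parallel-prefix carry computation on Python integers; it applies no threshold-implementation masking. The `arx_step` helper and its rotation amount are hypothetical and are not the COMET-CHAM or SCHWAEMM round function.

```python
def kogge_stone_add(a: int, b: int, width: int = 32) -> int:
    """Return (a + b) mod 2**width using a Kogge-Stone parallel-prefix carry
    network, evaluated bit-parallel on Python integers (no masking/TI)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b          # generate: a carry is created at bit i
    p = a ^ b          # propagate: an incoming carry passes through bit i
    half_sum = p       # bitwise sum without carries, used in the final XOR
    shift = 1
    while shift < width:
        # Prefix step: merge each (G, P) with the group `shift` bits lower.
        # Both right-hand sides use the values from the previous stage.
        g, p = g | (p & (g << shift)), p & (p << shift)
        shift <<= 1
    # Bit i of g is now the group generate of bits [0, i],
    # so the carry into bit i is bit i-1 of g.
    carries = (g << 1) & mask
    return half_sum ^ carries


def rotl(x: int, r: int, width: int = 32) -> int:
    """Rotate a width-bit word left by r positions."""
    mask = (1 << width) - 1
    r %= width
    return ((x << r) | (x >> (width - r))) & mask


def arx_step(x: int, y: int, k: int, r: int = 8) -> int:
    """Hypothetical ARX mixing step (not a CHAM/SCHWAEMM round):
    modular add, rotate left, then XOR with a key word."""
    return rotl(kogge_stone_add(x, y), r) ^ k
```

The log2(width) prefix stages give the adder its logarithmic depth; a TI-protected version would split each operand into shares and apply this structure share-wise.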
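The third abstract describes the Muller C-Element as the THnn member of the NCL THmn threshold-gate family. Under the usual NCL convention, a THmn gate asserts its output once at least m of its n inputs are 1 and deasserts it only when all inputs return to 0, holding its value otherwise. The `THmnGate` class below is a hypothetical behavioural sketch of that hysteresis and says nothing about the transistor-level EMC circuit.

```python
from __future__ import annotations

from typing import Sequence


class THmnGate:
    """Behavioural model of an NCL THmn threshold gate with hysteresis.
    The output rises when at least m of the n inputs are 1 and falls only when
    all n inputs are 0; otherwise it holds its previous value.
    A Muller C-Element is the special case m == n."""

    def __init__(self, m: int, n: int, initial: int = 0) -> None:
        if not 1 <= m <= n:
            raise ValueError("require 1 <= m <= n")
        self.m, self.n, self.out = m, n, initial

    def step(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.n:
            raise ValueError(f"expected {self.n} inputs")
        ones = sum(inputs)
        if ones >= self.m:
            self.out = 1      # set: threshold reached
        elif ones == 0:
            self.out = 0      # reset: all inputs returned to NULL
        # otherwise: hold the previous output (hysteresis)
        return self.out


c = THmnGate(3, 3)  # 3-input Muller C-Element
outputs = [c.step(v) for v in ([1, 0, 1], [1, 1, 1], [0, 1, 0], [0, 0, 0])]
# outputs == [0, 1, 1, 0]
```

The partial input patterns ([1, 0, 1] before the rise and [0, 1, 0] after it) leave the output unchanged, which is the state-holding behaviour the EMC output stages must provide.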
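The fifth abstract starts from the conventional mux-based weighted adder used in stochastic computing. In that baseline, each input value v_i is a unipolar bitstream with P(bit = 1) = v_i, and the mux select picks input i with probability w_i, so each output bit is 1 with probability Σ w_i v_i. The sketch below simulates this baseline with independent pseudo-random streams; it does not implement CeMux's precise sampling or full correlation, and `mux_weighted_add` and its parameters are hypothetical.

```python
import math
import random
from typing import Sequence


def mux_weighted_add(values: Sequence[float], weights: Sequence[float],
                     length: int, seed: int = 0) -> float:
    """Simulate a conventional stochastic mux adder and return the fraction of
    1s in its output bitstream, an estimate of sum(w_i * v_i).

    values:  probabilities in [0, 1], each encoded as a unipolar bitstream.
    weights: non-negative select probabilities that must sum to 1.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    rng = random.Random(seed)
    indices = range(len(values))
    ones = 0
    for _ in range(length):
        # One independent Bernoulli bit per input stream in this clock cycle.
        bits = [1 if rng.random() < v else 0 for v in values]
        # Random mux select: input i is chosen with probability weights[i].
        sel = rng.choices(indices, weights=weights, k=1)[0]
        ones += bits[sel]
    return ones / length


exact = 0.25 * (0.2 + 0.8 + 0.5 + 0.1)   # target weighted sum
estimate = mux_weighted_add([0.2, 0.8, 0.5, 0.1], [0.25] * 4, length=20000)
```

Because each output bit of this baseline is an independent Bernoulli sample, the standard error of the estimate shrinks only as about 1/√L for a stream of length L, which is why such adders need long bitstreams to reach practical accuracy.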