
Title: Embedding Binary Perceptrons in FPGA to improve Area, Power and Performance
For the flexibility of implementing any given Boolean function(s), the FPGA uses reconfigurable building blocks called LUTs. The price for this reconfigurability is the large number of registers and multiplexers required to construct the FPGA. While researchers have been working on complex LUT structures to reduce area and power for several years, most of these implementations come at the cost of a performance penalty. This paper demonstrates simultaneous improvement in area, power, and performance in an FPGA by using special logic cells called Threshold Logic Cells (TLCs) (also known as binary perceptrons). The TLCs are capable of implementing a complex threshold function, which if implemented using conventional gates would require several levels of logic gates. The TLCs only require 7 SRAM cells and are significantly faster than the conventional LUTs. The proposed FPGA architecture has been implemented using 28nm FDSOI standard cells and has been evaluated using ISCAS-85, ISCAS-89, and a few large industrial designs. Experiments demonstrate that the proposed architecture achieves an average reduction of 18.1% in configuration registers, 18.1% in multiplexer count, 12.3% in Basic Logic Element (BLE) area, and 16.3% in BLE power, and a 5.9% improvement in operating frequency, with a slight reduction in track count, routing area and routing power. The improvements are also demonstrated on the physically designed version of the architecture.
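As a rough illustration of what a threshold function (binary perceptron) computes, the sketch below evaluates a weighted sum of binary inputs against a threshold. The weights (2, 1, 1) and threshold 2 are illustrative assumptions, not the TLC configuration from the paper, and the function names are hypothetical. The example shows that this single threshold function equals a OR (b AND c), which takes two levels of conventional gates.

```python
from itertools import product


def threshold(weights, t, inputs):
    """Return 1 if the weighted sum of binary inputs reaches threshold t, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= t)


def truth_table(weights, t):
    """Evaluate the threshold function on every input combination, in binary order."""
    return [threshold(weights, t, bits)
            for bits in product((0, 1), repeat=len(weights))]


# Illustrative (assumed) configuration: weights (2, 1, 1), threshold 2.
table = truth_table((2, 1, 1), 2)

# The same function built from conventional gates: a OR (b AND c).
reference = [a | (b & c) for a, b, c in product((0, 1), repeat=3)]

print(table == reference)  # the two truth tables agree
```

Changing only the weights and threshold yields other functions from the same structure; for example, weights (1, 1, 1) with threshold 2 give the 3-input majority function.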
Award ID(s):
Publication Date:
Journal Name:
International Conference on Computer-Aided Design
Page Range or eLocation-ID:
1 to 8
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes an alternative FPGA tile structure that consists of three traditional LUTs combined with a new reconfigurable threshold logic cell (TLC). The TLC requires only 7 SRAM cells and can be configured to implement one of several threshold functions. The proposed architecture is implemented in a 28nm FDSOI process, and is evaluated on standard benchmark circuits and several large complex function blocks. The results demonstrate an average reduction of 8.9% in register count, 15.4% in multiplexer count, 7% average reduction in Basic Logic Element (BLE) area, and 8.2% average reduction in BLE power, with a maximum decrease in register count up to 64%, BLE multiplexer count up to 68%, BLE area up to 51.6% and BLE power up to 61.6% without loss in performance. We also show a reduction of 21% in the area of a tile.
  2. The HSC-FPGA offers an intriguing feasible architecture for the next generation of configurable fabrics, which allows embracing the advantages of both CMOS and beyond-CMOS technologies without requiring significant modification to the routing structure, programming paradigms, and synthesis tool-chain of the commercial FPGAs. In the HSC-FPGA, the intrinsic characteristics of magnetic random access memory (MRAM)-look-up table (LUT) circuits are used to implement sequential logic, while combinational logic circuits are implemented by static random access memory (SRAM)-LUTs. Fabric-level simulation results for the developed HSC-FPGA show that it can achieve at least 18%, 70%, and 15% reduction in terms of area, standby power, and read power consumption, respectively, for various ISCAS-89 and ITC-99 benchmark circuits compared to conventional SRAM-based FPGAs. The power consumption values can be further decreased by the power-gating allowed by the non-volatility feature of MRAM-LUTs. Moreover, the benefits of increased heterogeneity for reconfigurable computing are extended by realizing probabilistic computing paradigms within a fabric, which is enabled by probabilistic spin logic devices. The cooperating strengths of technology-heterogeneity and heterogeneity in computing paradigm in the proposed HSC-FPGA are leveraged to develop energy-efficient and reliability-aware training and evaluation circuits for deep belief networks with memristive crossbar arrays and p-bit based probabilistic neurons.
  3. We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the-art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
  4. Authenticated ciphers are vulnerable to side-channel attacks, including differential power analysis (DPA). Test Vector Leakage Assessment (TVLA) using Welch's t-test has been used to verify improved resistance of block ciphers to DPA after application of countermeasures. However, extension of this methodology to authenticated ciphers is non-trivial, since this requires additional input and output conditions, complex interfaces, and long test vectors interlaced with protocol necessary to describe authenticated cipher operations. In this research we augment an existing side-channel analysis architecture (FOBOS) with TVLA for authenticated ciphers. We use this capability to show that implementations in the Spartan-6 FPGA of the CAESAR Round 3 candidates ACORN, ASCON, CLOC (AES and TWINE), SILC (AES, PRESENT, and LED), JAMBU (AES and SIMON), and Ketje Jr., as well as AES-GCM, are potentially vulnerable to 1st order DPA. We then implement versions of the above ciphers, protected against 1st order DPA, using threshold implementations. TVLA is used to verify improved resistance to 1st order DPA of the protected cipher implementations. Finally, we benchmark unprotected and protected cipher implementations in the Spartan-6 FPGA, and compare the costs of 1st order DPA protection in terms of area, frequency, throughput, throughput-to-area (TP/A) ratio, power, and energy per bit. Our results show that ACORN is the most energy efficient, has the lowest area (in LUTs), and has the highest TP/A ratio of DPA-resistant implementations. However, Ketje Jr. has the highest throughput.
  5. One approach to mitigate side-channel attacks (SCAs) is to use clockless, asynchronous digital logic. To simplify this process, we propose a unique asynchronous FPGA based on a new THx2 programmable threshold cell. At a minimum, FPGAs require a programmable logic cell that can implement a complete set of logic so that it can be connected through the programmable interconnect network to form any digital system. To meet that criterion, we take advantage of CMOS transistors to implement a programmable THx2 threshold cell capable of performing both TH12 and TH22 asynchronous operations. Our complete sixteen transistor FPGA cell includes eight transistors to implement the base THx2 threshold operation, three transistors to switch between the TH12 and TH22 modes, and five memory cell transistors for mode storage. Our unique minimal transistor, programmable THx2 implementation enables formation of a complete set of asynchronous threshold gates and a complete set of standard combinational logic functions. The symmetric nature of the FPGA cell, in regard to the number of transistors (eight NMOS and eight PMOS), makes it ideal for a four row by four column transistor grid with a nearly square, easily array-able layout. It should be noted that our THx2 cell is highly compact and suitable for implementing a clockless, asynchronous FPGA.
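The TH12 and TH22 operations named in item 5 can be sketched behaviorally, assuming the usual NULL Convention Logic semantics for threshold gates with hysteresis: the output is set once at least m of the inputs are asserted and is cleared only when all inputs are deasserted. That semantics, the class name THx2Cell and the mode strings are assumptions made for illustration; the abstract does not specify them, and the sketch says nothing about the transistor-level design.

```python
class THx2Cell:
    """Behavioral model of a 2-input threshold gate with hysteresis (assumed semantics)."""

    def __init__(self, mode):
        if mode not in ("TH12", "TH22"):
            raise ValueError("mode must be 'TH12' or 'TH22'")
        # TH12 needs 1 of 2 inputs asserted; TH22 needs both.
        self.m = 1 if mode == "TH12" else 2
        self.out = 0

    def step(self, a, b):
        """Apply inputs a and b (each 0 or 1) and return the resulting output."""
        asserted = a + b
        if asserted >= self.m:
            self.out = 1   # threshold reached: set the output
        elif asserted == 0:
            self.out = 0   # all inputs deasserted: clear the output
        # otherwise hold the previous output (hysteresis)
        return self.out


cell = THx2Cell("TH22")
trace = [cell.step(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 1, 1, 0]
```

Under these assumptions TH22 behaves like a Muller C-element, holding its output while the inputs disagree, whereas TH12 reduces to an OR gate because the hold case never occurs.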