Post-quantum cryptography (PQC) has drawn significant attention from the hardware design research community, especially on field-programmable gate array (FPGA) platforms. In line with this trend, we present CHIRP, a novel FPGA-based PQC design: Compact and high-Performance FPGA implementation of unIfied accelerators for Ring-Binary-Learning-with-Errors (RBLWE)-based PQC, a promising lightweight PQC scheme suited to applications such as the Internet of Things. The proposed accelerators offer flexibility across the two available security levels, thus expanding their application potential. In total, we present four distinct hardware accelerators tailored to different performance and resource requirements, ranging from resource-constrained devices to high-throughput applications. Our innovation encompasses three key efforts: (i) we derive four optimized algorithms for RBLWE-ENC’s unified operation (covering the two available security levels), allowing flexible switching of security sizes while speeding up calculations; (ii) we present the four novel accelerators (CHIRP) targeting FPGA platforms, featuring dedicated hardware structures; and (iii) we conduct a comprehensive evaluation to validate the efficiency of the proposed accelerators on various FPGA devices. Compared to the existing unified design, the proposed accelerator demonstrates up to a 91.4% reduction in area-delay product (ADP) on the Stratix-V device. Even compared with state-of-the-art single-security-level designs, the best version of the proposed accelerator obtains much better resource usage and ADP on both AMD-Xilinx and Intel devices when unified operation (flexible switching between the two security levels) is considered. We anticipate that the findings of this research will foster advancements in FPGA implementation techniques for lightweight PQC development.
Ferroelectric FET-based context-switching FPGA enabling dynamic reconfiguration for adaptive deep learning machines
Field-programmable gate arrays (FPGAs) are widely used to accelerate deep learning applications because of their reconfigurability, flexibility, and fast time-to-market. However, conventional FPGAs suffer from a trade-off between chip area and reconfiguration latency, so efficient FPGA acceleration of workloads that switch between multiple configurations remains elusive. Here, we propose a ferroelectric field-effect transistor (FeFET)-based context-switching FPGA that supports dynamic reconfiguration to break this trade-off, enabling an arbitrary configuration to be loaded without interrupting execution of the active configuration. Leveraging the intrinsic structure and nonvolatility of FeFETs, compact FPGA primitives are proposed and experimentally verified. The evaluation results show that our design achieves a 63.0%/74.7% reduction in look-up table (LUT)/connection block (CB) area and an 82.7%/53.6% reduction in CB/switch box power consumption, with a minimal penalty in the critical path delay (9.6%). In addition, our design yields significant average time savings of 78.7% and 20.3% for context-switching and dynamic reconfiguration applications, respectively.
- Award ID(s): 2008365
- PAR ID: 10508375
- Publisher / Repository: Science Advances
- Date Published:
- Journal Name: Science Advances
- Volume: 10
- Issue: 3
- ISSN: 2375-2548
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Amber is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for acceleration of dense linear algebra applications, such as machine learning (ML), image processing, and computer vision. It is designed using an agile accelerator-compiler co-design flow; the compiler updates automatically with hardware changes, enabling continuous application-level evaluation of the hardware-software system. To increase hardware utilization and minimize reconfigurability overhead, Amber features the following: 1) dynamic partial reconfiguration (DPR) of the CGRA for higher resource utilization by allowing fast switching between applications and partitioning resources between simultaneous applications; 2) streaming memory controllers supporting affine access patterns for efficient mapping of dense linear algebra; and 3) low-overhead transcendental and complex arithmetic operations. The physical design of Amber features a unique clock distribution method and timing methodology to efficiently lay out its hierarchical and tile-based design. Amber achieves a peak energy efficiency of 538 INT16 GOPS/W and 483 BFloat16 GFLOPS/W. Compared with a CPU, a GPU, and a field-programmable gate array (FPGA), Amber has up to 3902x, 152x, and 107x better energy-delay product (EDP), respectively.
- Thermoelectric generation (TEG) has increasingly drawn attention for being environmentally friendly. A few studies have focused on improving TEG efficiency at the system level on vehicle radiators. The most recent reconfiguration algorithm improves performance but suffers from major drawbacks in computational time and energy overhead, and does not scale with array size and processing frequency. In this paper, we propose a novel TEG array reconfiguration algorithm that determines a near-optimal configuration within an acceptable computational time. More precisely, with O(N) time complexity, our prediction-based fast TEG reconfiguration algorithm enables all modules to work at or near their maximum power points (MPP). Additionally, we incorporate prediction methods to further reduce the runtime and switching overhead during the reconfiguration process. Experimental results show a 30% performance improvement, an almost 100× reduction in switching overhead, and a 13× improvement in computational speed compared to the baseline and prior work. The scalability of our algorithm makes it applicable to larger-scale systems such as industrial boilers and heat exchangers.
- This paper aims at reducing computation for RetinaNet, an mAP-30-tier network, to facilitate its practical deployment on edge devices for providing IoT-based object detection services. We first validate that RetinaNet has the best FLOP-mAP trade-off among all mAP-30-tier networks. Then, we propose a light-weight RetinaNet structure with an effective computation-accuracy trade-off by reducing FLOPs only in computationally intensive layers. Compared with the most common way of trading off computation with accuracy, input image scaling, the proposed solution shows a consistently better FLOPs-mAP trade-off curve. Light-weight RetinaNet achieves a 0.3% mAP improvement at the 1.8x FLOPs reduction point over the original RetinaNet, and gains 1.8x more energy efficiency on an Intel Arria 10 FPGA accelerator in the context of edge computing. The proposed method can potentially help a wide range of object detection applications move closer to a preferred corner for better runtime and accuracy, while enjoying more energy-efficient inference at the edge.
- We describe a fast, abstract method for reverse engineering (RE) field programmable gate array (FPGA) look-up-tables (LUTs). Our method has direct applications to hardware (HW) metering and FPGA fingerprinting, and our approach allows easy portability and application to most LUT-based FPGAs. Unlike conventional RE methodologies that rely on vendor-specific code (like Xilinx XDL), tools, configuration files, components, etc., our methodology is not dependent on any specific FPGA or FPGA computer aided design (CAD) tool. We use generic hardware description language (HDL) code based on specially connected CASE statements to program the LUTs on a target FPGA. Our specially connected CASE statements allow us to guide placement of LUT functions on successive synthesis runs. This enables us to quickly determine which bits in the FPGA's configuration file correspond to FPGA LUT bits. After we know which bits are LUT bits, we can go further and match specific LUT bits to specific bits in the configuration file, thereby creating a one-to-one mapping between every LUT memory cell and its matching bit in the configuration file. In this paper we present our CASE statement functions for performing one-to-one mapping of all FPGA LUT memory cell bits to specific configuration file bits. We have successfully applied our methods to several 7000 series Xilinx and Intel (Altera) FPGAs.
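The matching step described in the last abstract above, locating the configuration-file bit that corresponds to a given LUT memory cell, can be illustrated with a simple differential comparison. The following is only a minimal sketch under stated assumptions, not the authors' tooling: the function names `differing_bits` and `map_lut_bits` are hypothetical, configuration files are treated as raw equal-length byte strings, bits are numbered least-significant-bit first within each byte, and each variant is assumed to differ from the baseline in exactly one LUT cell. Real configuration files typically also contain checksums or other fields that change between runs, which this sketch does not handle.

```python
def differing_bits(a: bytes, b: bytes) -> list[int]:
    """Return the bit positions at which two equal-length byte strings differ.

    Assumption: bit 0 is the least significant bit of byte 0.
    """
    if len(a) != len(b):
        raise ValueError("configuration files must have the same length")
    positions = []
    for byte_index, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y  # set bits mark positions where the two files disagree
        for bit in range(8):
            if diff & (1 << bit):
                positions.append(byte_index * 8 + bit)
    return positions


def map_lut_bits(baseline: bytes, variants: dict[int, bytes]) -> dict[int, int]:
    """Map each LUT cell index to the single configuration bit it toggles.

    `variants[i]` is a hypothetical configuration file produced by a design
    identical to the baseline except that LUT cell `i` is flipped. Variants
    that change zero bits or more than one bit are skipped as ambiguous.
    """
    mapping = {}
    for lut_bit, bitstream in variants.items():
        changed = differing_bits(baseline, bitstream)
        if len(changed) == 1:
            mapping[lut_bit] = changed[0]
    return mapping
```

Given a baseline and one variant per LUT cell, `map_lut_bits` returns a dictionary from LUT cell index to configuration-file bit position, which is the kind of one-to-one mapping the abstract describes.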

