Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Post-quantum cryptography (PQC) has drawn significant attention from the hardware design research community, especially on field-programmable gate array (FPGA) platforms. In line with this trend, in this paper, we present a novel FPGA-based PQC design work (CHIRP), i.e.,Compact and high-Performance FPGA implementation of unIfied accelerators forRing-Binary-Learning-with-Errors (RBLWE)-basedPQC, a promising lightweight PQC suited for related applications like Internet-of-Things. The proposed accelerators offer flexibility across the available two security levels, thus expanding their application potential. In total, we presented four distinct hardware accelerators tailored to different performance and resource requirements, ranging from resource-constrained devices to high-throughput applications. Our innovation encompasses three key efforts: (i) we derived four optimized algorithms for RBLWE-ENC’s unified operation (covering the available two security levels), allowing flexible switching of security sizes while boosting calculations; (ii) we then presented the four novel accelerators (CHIRP) targeting FPGA platforms, featuring dedicated hardware structures; (iii) we finally conducted a comprehensive evaluation to validate the efficiency of the proposed accelerators on various FPGA devices. Compared to the existing unified design, the proposed accelerator demonstrated up to 91.4% reduction in area-delay product (ADP) on the Straix-V device. Even when compared with the state-of-the-art single security designs, the proposed accelerator (best version) obtains much better resource usage and ADP performance while unified operation (flexibly switching between two security levels) is considered on both AMD-Xilinx and Intel devices. We anticipate the findings of this research will foster advancements in FPGA implementation techniques for lightweight PQC development.more » « lessFree, publicly-accessible full text available January 28, 2026
-
Polynomial multiplication is a key component in many post-quantum cryptography and homomorphic encryption schemes. One recurring variation, ternary polynomial multiplication over ring Zq/(xn+1) where one input polynomial has ternary coefficients {−1,0,1} and the other has large integer coefficients {0, q−1}, has recently drawn significant attention from various communities. Following this trend, this paper presents a novel SCALable and area-Efficient Systolic (SCALES) accelerator for ternary polynomial multiplication. In total, we have carried out three layers of coherent interdependent efforts. First, we have rigorously derived a novel block-processing strategy and algorithm based on the schoolbook method for polynomial multiplication. Then, we have innovatively implemented the proposed algorithm as the SCALES accelerator with the help of a number of field-programmable gate array (FPGA)-oriented optimization techniques. Lastly, we have conducted a thorough implementation analysis to showcase the efficiency of the proposed accelerator. The comparison demonstrated that the SCALES accelerator has at least 19.0% and 23.8% less equivalent area-time product (eATP) than the state-of-the-art designs. We hope this work can stimulate continued research in the field.more » « less
-
The recently announced National Institute of Standards and Technology (NIST) Post-quantum cryptography (PQC) third-round standardization process has released its candidates to be standardized and FALCON is one of them. On the other hand, however, very few hardware implementation works for FALCON have been released due to its very complicated computation procedure and intensive complexity. With this background, in this paper, we propose an efficient hardware structure to implement residue numeral system (RNS) decomposition within NTRUSolve (a key arithmetic component for key generation of FALCON). In total, we have proposed three stages of coherent interdependent efforts to finish the proposed work. First, we have identified the necessary algorithmic operation related to RNS decomposition. Then, we have innovatively designed a hardware structure to realize these algorithms. Finally, field-programmable gate array (FPGA)-based implementation has been carried out to verify the superior performance of the proposed hardware structure. For instance, the proposed hardware design involves at least 3.91x faster operational time than the software implementation. To the authors’ best knowledge, this is the first paper about the hardware acceleration of RNS decomposition for FALCON, and we hope the outcome of this work will facilitate the research in this area.more » « less
-
Post-quantum cryptography (PQC) has drawn significant attention from various communities recently and one of the recent advances is the hardware acceleration of PQC algorithms. While Hamming Quasi-Cyclic (HQC) is one of the recently announced National Institute of Standards and Technology (NIST) fourth-round PQC standardization candidates, very few related hardware implementation works have been reported, particularly lacking solid works on important components such as the sampler. As a fixed-weight sparse vector sampler with constant-time operation is critical to the hardware HQC accelerator, in this paper, we present a novel hardware-implemented LOw-latency and ConStant-timing fixed-weight sampler (LOCS). In total, we have proposed three stages of efforts. First of all, a new algorithm for efficient realization of the fixed-weight sparse vector generation based on Fisher-Yates shuffle algorithm is proposed. Then, we have innovatively designed the algorithm into a new hardware sampler: LOCS. Finally, we have conducted a thorough comparison to showcase the efficiency of the proposed sampler, e.g., the proposed LOCS involves 66.7% less latency time than the state-of-the-art design (n=17,669) while remaining constant-time operation. To the authors' best knowledge, this is the first hardware-implemented pure constant-time (no failure probability) fixed-weight sampler for HQC.more » « less
-
Post-quantum cryptography (PQC) has gained sub-stantial attention from various communities recently. Along with the ongoing National Institute of Standards and Technology (NIST) PQC standardization process that targets the general-purpose PQC algorithms, the research community is also looking for efficient lightweight PQC schemes. Among this direction of efforts, Ring-Binary-Learning-with-Errors (RBLWE)-based encryption scheme (RBLWE-ENC) is regarded as a promising lightweight PQC fitting Internet-of-Things (IoT) and edge computing applications. As hardware implementation for PQC algorithms has become one of the major advances in the field, in this paper, we follow this trend to present an efficient implementation of RBLWE-ENC lightweight accelerator on the field-programmable gate array (FPGA) platform. Overall, we have demonstrated three coherent interdependent stages of efforts: (i) we have presented detailed derivation processes to formulate the proposed algorithmic operation; (ii) we have then implemented the proposed algorithm into a desired hardware accelerator; and (iii) we provided thorough complexity analysis and comparison to showcase the superior performance of the proposed accelerator over the state-of-the-art designs, e.g., the proposed accelerator with v=8 has at least 66.67% less area-time complexities than the existing ones (Virtex-7 FPGA). We hope the outcome of this work can facilitate lightweight PQC development.more » « less
-
The rapid advancement in quantum technology has initiated a new round of exploration of efficient implementation of post-quantum cryptography (PQC) on hardware platforms. Key encapsulation mechanism (KEM) Saber, a module lattice-based PQC, is one of the four encryption scheme finalists in the third-round National Institute of Standards and Technology (NIST) standardization process. In this paper, we propose a novel Toeplitz Matrix-Vector Product (TMVP)-based design strategy to efficiently implement polynomial multiplication (essential arithmetic operation) for KEM Saber. The proposed work consists of three layers of interdependent efforts: (i) first of all, we have formulated the polynomial multiplication of KEM Saber into a desired mathematical form for further developing into the proposed TMVP-based algorithm for high-performance operation; (ii) then, we have followed the proposed TMVP-based algorithm to innovatively transfer the derived algorithm into a unified polynomial multiplication structure (fits all security ranks) with the help of a series of algorithm-to-architecture co-implementation/mapping techniques; (iii) finally, detailed implementation results and complexity analysis have confirmed the efficiency of the proposed TMVP design strategy. Specifically, the field-programmable gate array (FPGA) implementation results show that the proposed design has at least less 30.92% area-delay product (ADP) than the competing ones.more » « less
An official website of the United States government
