skip to main content


Title: Hardware Implementation of High-Performance Polynomial Multiplication for KEM Saber
Recent advances in quantum computing have initiated a new round of cryptosystem innovation as the existing public-key cryptosystems are proven to be vulnerable to quantum attacks. Several types of cryptographic algorithms have been proposed for possible post-quantum cryptography (PQC) candidates and the lattice-based key encapsulation mechanism (KEM) Saber is one of the most promising algorithms. Noticing that the polynomial multiplication over ring is the key arithmetic operation of KEM Saber, in this paper, we propose a novel strategy for efficient implementation of polynomial multiplication on the hardware platform. First of all, we present the proposed mathematical derivation process for polynomial multiplication. Then, the proposed hardware structure is provided. Finally, field-programmable gate array (FPGA) based implementation results are obtained, and it is shown that the proposed design has better performance than the existing ones. The proposed polynomial multiplication can be further deployed to construct efficient hardware cryptoprocessors for KEM Saber.  more » « less
Award ID(s):
2020625
NSF-PAR ID:
10464950
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 IEEE International Symposium on Circuits and Systems (ISCAS)
Page Range / eLocation ID:
1160 to 1164
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The rapid advancement in quantum technology has initiated a new round of post-quantum cryptography (PQC) related exploration. The key encapsulation mechanism (KEM) Saber is an important module lattice-based PQC, which has been selected as one of the PQC finalists in the ongoing National Institute of Standards and Technology (NIST) standardization process. On the other hand, however, efficient hardware implementation of KEM Saber has not been well covered in the literature. In this paper, therefore, we propose a novel cyclic-row oriented processing (CROP) strategy for efficient implementation of the key arithmetic operation of KEM Saber, i.e., the polynomial multiplication. The proposed work consists of three layers of interdependent efforts: (i) first of all, we have formulated the main operation of KEM Saber into desired mathematical forms to be further developed into CROP based algorithms, i.e., the basic version and the advanced higher-speed version; (ii) then, we have followed the proposed CROP strategy to innovatively transfer the derived two algorithms into desired polynomial multiplication structures with the help of a series of algorithm-architecture co-implementation techniques; (iii) finally, detailed complexity analysis and implementation results have shown that the proposed polynomial multiplication structures have better area-time complexities than the state-of-the-art solutions. Specifically, the field-programmable gate array (FPGA) implementation results show that the proposed design, e.g., the basic version has at least less 11.2% area-delay product (ADP) than the best competing one (Cyclone V device). The proposed high-performance polynomial multipliers offer not only efficient operation for output results delivery but also possess low-complexity feature brought by CROP strategy. The outcome of this work is expected to provide useful references for further development and standardization process of KEM Saber. 
    more » « less
  2. The recent research in post-quantum cryptography (PQC) field has gradually switched to efficient implementation of PQC algorithms on hardware platforms. As polynomial multiplication is typically one of the critical operations within lattice-based PQC, its hardware acceleration has drawn significant attention from the research community recently. We propose a high-speed processing strategy to construct a new High-performance Polynomial Multiplication Accelerator (HPMA) for key encapsulation mechanism (KEM) Saber. Firstly, we have given a detailed mathematical derivation to obtain a low-latency processing algorithm for Saber polynomial multiplication. Then, we have innovatively used the derived the proposed algorithm to construct a new structure HPMA for FPGA implementation. Lastly, we have demonstrated the superior performance of the proposed HPMA-Saber by comparing with state-of-the-art works. The proposed design strategy is highly efficient and the obtained results can be useful for the PQC research community. 
    more » « less
  3. The rapid advancement in quantum technology has initiated a new round of exploration of efficient implementation of post-quantum cryptography (PQC) on hardware platforms. Key encapsulation mechanism (KEM) Saber, a module lattice-based PQC, is one of the four encryption scheme finalists in the third-round National Institute of Standards and Technology (NIST) standardization process. In this paper, we propose a novel Toeplitz Matrix-Vector Product (TMVP)-based design strategy to efficiently implement polynomial multiplication (essential arithmetic operation) for KEM Saber. The proposed work consists of three layers of interdependent efforts: (i) first of all, we have formulated the polynomial multiplication of KEM Saber into a desired mathematical form for further developing into the proposed TMVP-based algorithm for high-performance operation; (ii) then, we have followed the proposed TMVP-based algorithm to innovatively transfer the derived algorithm into a unified polynomial multiplication structure (fits all security ranks) with the help of a series of algorithm-to-architecture co-implementation/mapping techniques; (iii) finally, detailed implementation results and complexity analysis have confirmed the efficiency of the proposed TMVP design strategy. Specifically, the field-programmable gate array (FPGA) implementation results show that the proposed design has at least less 30.92% area-delay product (ADP) than the competing ones. 
    more » « less
  4. Following the rapid progress in the post-quantum cryptography (PQC) field that many efforts have been gradually switched to the hardware implementation side, this paper presents a novel systolic accelerator for polynomial multiplication within two lattice-based PQC algorithms, key encapsulation mechanism (KEM) Saber and binary Ring-Learning-with-Errors (BRLWE)-based encryption scheme. Based on the observation that polynomial multiplication over ring is the key arithmetic operation for the two PQC schemes, we have proposed a novel systolic accelerator for the targeted polynomial multiplications (applicable to two PQC schemes). Mathematical formulation is given to illustrate the proposed algorithmic operation for both schemes. Then, the proposed systolic accelerator is presented. Finally, field-programmable gate array (FPGA) implementation results have been provided to confirm the efficiency of the proposed systolic accelerator under two schemes. The proposed accelerator is highly efficient, and the following work may focus on cryptoprocessor design and side-channel attacks. 
    more » « less
  5. CRYSTAL-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations and bottle-necks that require optimization. To address this challenge, we propose an algorithm and hardware co-design approach to systematically optimize matrix-vector multiplication and NTT-based polynomial multiplication by employing a novel sub-structure sharing technique in order to reduce computational complexity, i.e., the number of modular multiplications and modular additions/subtractions consumed. The sub-structure sharing approach is inspired by prior fast parallel approaches based on polyphase decomposition. The proposed efficient feed-forward architecture achieves high speed, low latency, and full utilization of all hardware components, which can significantly enhance the overall efficiency of the Kyber scheme. The FPGA implementation results show that our proposed design, using the fast two-parallel structure, leads to an approximate reduction of 90% in execution time (μs) , along with a 66× improvement in throughput performance. 
    more » « less