skip to main content


Title: A Monolithic Hardware Implementation of Kyber: Comparing Apples to Apples in PQC Candidates
With the advent of large-scale quantum computers, factoring and discrete logarithm problems could be solved using the polynomialtime quantum algorithms. To ensure public-key security, a transition to quantum-resistant cryptographic protocols is required. Performance of hardware accelerators targeting different platforms and diverse application goals plays an important role in PQC candidates’ differentiation. Hardware accelerators based on FPGAs and ASICs also provide higher flexibility to create a very low area or ultra-high performance implementations at the high cost of the other. While the hardware/software codesign development of PQC schemes has already received an increasing research effort, a cost analysis of efficient pure hardware implementation is still lacking. On the other hand, since FPGA has various types of hardware resources, evaluating and making the accurate and fair comparison of hardware-based implementations against each other is very challenging. Without a common foundation, apples are compared to oranges. This paper demonstrates a pure hardware architecture for Kyber as one of the finalists in the third round of the NIST post-quantum cryptography standardization process. To enable real, realistic, and comparable evaluations in PQC schemes over hardware platforms, we compare our architecture over the ASIC platform as a common foundation showing that it outperforms the previous works in the literature.  more » « less
Award ID(s):
1801341
PAR ID:
10337497
Author(s) / Creator(s):
Date Published:
Journal Name:
LatinCrypt
Volume:
12912
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. It has been predicted that within the next tenfifteen years, quantum computers will have computational power sufficient to break current public-key cryptography schemes. When that happens, all traditional methods of dealing with the growing computational capabilities of potential attackers, such as increasing key sizes, will be futile. The only viable solution is to develop new standards based on algorithms that are resistant to quantum computer attacks and capable of being executed on traditional computing platforms, such as microprocessors and FPGAs. Leading candidates for new standards include lattice-based post-quantum cryptography (PQC) algorithms. In this paper, we present the results of implementing and benchmarking three lattice-based key encapsulation mechanisms (KEMs) that have progressed to Round 2 of the NIST standardization process. Our implementations are based on a software/hardware codesign approach, which is particularly applicable to the current stage of the NIST PQC standardization process, where the large number and high complexity of the candidates make traditional hardware benchmarking extremely challenging. We propose and justify the choice of a suitable system-on-chip platform and design methodology. The obtained results indicate the potential for very substantial speed-ups vs. purely software implementations, reaching 28x for encapsulation and 20x for decapsulation. 
    more » « less
  2. The recent advancement in quantum technology has initiated a new round of cryptosystem innovation, i.e., the emergence of Post-Quantum Cryptography (PQC). This new class of cryptographic schemes is intended to be mathematically resistant against any known attacks using quantum computers, but, at the same time, be fully implementable using traditional semiconductor technology. The National Institutes of Standards and Technology (NIST) has already started the PQC standardization process, and the initial pool of 69 submissions has been reduced to 26 Round 2 candidates. Echoing the pace of the PQC "revolution," this paper gives a detailed and thorough introduction to recent advances in the hardware implementation of PQC schemes, including challenges, new implementation methods, and novel hardware architectures. Specifically, we have: (i) described the challenges and rewards of implementing PQC in hardware; (ii) presented the novel methodology for the design-space exploration of PQC implementations using high-level synthesis (HLS); (iii) introduced a new underexplored PQC scheme (binary Ring-Learning-with-Errors), as well as its novel hardware implementation for possible lightweight applications. The overall content delivered by this paper could serve multiple purposes: (i) provide useful references for the potential learners and the interested public; (ii) introduce new areas and directions for potential research to the VTS community; (iii) facilitate the PQC standardization process and the exploration of related new ways of implementing cryptography in existing and emerging applications. 
    more » « less
  3. The recent advance in the post-quantum cryptography (PQC) field has gradually shifted from the theory to the implementation of the cryptosystem, especially on the hardware platforms. Following this trend, in this paper, we aim to present efficient implementations of the finite field arithmetic (key component) for the binary Ring-Learning-with-Errors (Ring-LWE) PQC through a novel lookup-table (LUT)-like method. In total, we have carried out four stages of interdependent efforts: (i) an algorithm-hardware co-design driven derivation of the proposed LUT-like method is provided detailedly for the key arithmetic of the BRLWE scheme; (ii) the proposed hardware architecture is then presented along with the internal structural description; (iii) we have also presented a novel hybrid size structure suitable for flexible operation, which is the first report in the literature; (iv) the final implementation and comparison processes have also been given, demonstrating that our proposed structures deliver significant improved performance over the state-of-the-art solutions. The proposed designs are highly efficient and are expected to be employed in many emerging applications. 
    more » « less
  4. When quantum computers become scalable and reliable, they are likely to break all public-key cryptography standards, such as RSA and Elliptic Curve Cryptography. The projected threat of quantum computers has led the U.S. National Institute of Standards and Technology (NIST) to an effort aimed at replacing existing public-key cryptography standards with new quantum-resistant alternatives. In December 2017, 69 candidates were accepted by NIST to Round 1 of the NIST Post-Quantum Cryptography (PQC) standardization process. NTRUEncrypt is one of the most well-known PQC algorithms that has withstood cryptanalysis. The speed of NTRUEncrypt in software, especially on embedded software platforms, is limited by the long execution time of its primary operation, polynomial multiplication. In this paper, we investigate speeding up NTRUEncrypt using software/hardware codesign on a Xilinx Zynq UltraScale+ multiprocessor system-on-chip (MPSoC). Polynomial multiplication is implemented in the Programmable Logic (PL) of Zynq using two approaches: traditional Register-Transfer Level (RTL) and High-Level Synthesis (HLS). The remaining operations of NTRUEncrypt are executed in software on the Processing System (PS) of Zynq, using the bare-metal mode. The speed-up of our software/hardware codesigns vs. purely software implementations is determined experimentally and analyzed in the paper. The results are reported for the RTL-based and HLS-based hardware accelerators, and compared to the best available software implementation, included in the NIST submission package. The speed-ups for encryption were 2.4 and 3.9, depending on the selected parameter set. For decryption, the corresponding speed-ups were 4.0 and 6.8. In addition, for the polynomial multiplication operation itself, the speed up was in excess of 75. Our code for the NTRUEncrypt polynomial multiplier accelerator is being made open-source for further evaluation on multiple software/hardware platforms. 
    more » « less
  5. When quantum computers become scalable and reliable, they are likely to break all public-key cryptography standards, such as RSA and Elliptic Curve Cryptography. The projected threat of quantum computers has led the U.S. National Institute of Standards and Technology (NIST) to an effort aimed at replacing existing public-key cryptography standards with new quantum-resistant alternatives. In December 2017, 69 candidates were accepted by NIST to Round 1 of the NIST Post-Quantum Cryptography (PQC) standardization process. NTRUEncrypt is one of the most well-known PQC algorithms that has withstood cryptanalysis. The speed of NTRUEncrypt in software, especially on embedded software platforms, is limited by the long execution time of its primary operation, polynomial multiplication. In this paper, we investigate speeding up NTRUEncrypt using software/hardware codesign on a Xilinx Zynq UltraScale+ multiprocessor system-on-chip (MPSoC). Polynomial multiplication is implemented in the Programmable Logic (PL) of Zynq using two approaches: traditional Register-Transfer Level (RTL) and High-Level Synthesis (HLS). The remaining operations of NTRUEncrypt are executed in software on the Processing System (PS) of Zynq, using the bare-metal mode. The speed-up of our software/hardware codesigns vs. purely software implementations is determined experimentally and analyzed in the paper. The results are reported for the RTL-based and HLS-based hardware accelerators, and compared to the best available software implementation, included in the NIST submission package. The speed-ups for encryption were 2.4 and 3.9, depending on the selected parameter set. For decryption, the corresponding speed-ups were 4.0 and 6.8. In addition, for the polynomial multiplication operation itself, the speed up was in excess of 75. Our code for the NTRUEncrypt polynomial multiplier accelerator is being made open-source for further evaluation on multiple software/hardware platforms. 
    more » « less