skip to main content


Title: Implementation and Benchmarking of Round 2 Candidates in the NIST Post-Quantum Cryptography Standardization Process Using Hardware and Software/Hardware Co-design Approaches
Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as speed, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. A large number of candidates at the early stages of the standardization process makes the accurate and fair comparison very challenging. Nevertheless, in all major past cryptographic standardization efforts, future winners were identified quite early in the evaluation process and held their lead until the standard was selected. Additionally, identifying some candidates as either inherently slow or costly in hardware helped to eliminate a subset of candidates, saving countless hours of cryptanalysis. Finally, early implementations provided a baseline for future design space explorations, paving a way to more comprehensive and fairer benchmarking at the later stages of a given cryptographic competition. In this paper, we first summarize, compare, and analyze results reported by other groups until mid-May 2020, i.e., until the end of Round 2 of the NIST PQC process. We then outline our own methodology for implementing and benchmarking PQC candidates using both hardware and software/hardware co-design approaches. We apply our hardware approach to 6 lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing 4 NIST PQC submissions. We then apply a software-hardware co-design approach to 12 lattice-based CCA-secure KEMs, representing 8 Round 2 submissions. We hope that, combined with results reported by other groups, our study will provide NIST with helpful information regarding the relative performance of a significant subset of Round 2 PQC candidates, assuming that at least their major operations, and possibly the entire algorithms, are off-loaded to hardware.  more » « less
Award ID(s):
1801512
NSF-PAR ID:
10281391
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Cryptology ePrint Archive: Report 2020/795
Page Range / eLocation ID:
1 - 86
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Performance in hardware has typically played a major role in differentiating among leading candidates in cryptographic standardization efforts. Winners of two past NIST cryptographic contests (Rijndael in case of AES and Keccak in case of SHA-3) were ranked consistently among the two fastest candidates when implemented using FPGAs and ASICs. Hardware implementations of cryptographic operations may quite easily outperform software implementations for at least a subset of major performance metrics, such as speed, power consumption, and energy usage, as well as in terms of security against physical attacks, including side-channel analysis. Using hardware also permits much higher flexibility in trading one subset of these properties for another. A large number of candidates at the early stages of the standardization process makes the accurate and fair comparison very challenging. Nevertheless, in all major past cryptographic standardization efforts, future winners were identified quite early in the evaluation process and held their lead until the standard was selected. Additionally, identifying some candidates as either inherently slow or costly in hardware helped to eliminate a subset of candidates, saving countless hours of cryptanalysis. Finally, early implementations provided a baseline for future design space explorations, paving a way to more comprehensive and fairer benchmarking at the later stages of a given cryptographic competition. In this paper, we first summarize, compare, and analyze results reported by other groups until mid-May 2020, i.e., until the end of Round 2 of the NIST PQC process. We then outline our own methodology for implementing and benchmarking PQC candidates using both hardware and software/hardware co-design approaches. We apply our hardware approach to 6 lattice-based CCA-secure Key Encapsulation Mechanisms (KEMs), representing 4 NIST PQC submissions. We then apply a software-hardware co-design approach to 12 lattice-based CCA-secure KEMs, representing 8 Round 2 submissions. We hope that, combined with results reported by other groups, our study will provide NIST with helpful information regarding the relative performance of a significant subset of Round 2 PQC candidates, assuming that at least their major operations, and possibly the entire algorithms, are off-loaded to hardware. 
    more » « less
  2. It has been predicted that within the next tenfifteen years, quantum computers will have computational power sufficient to break current public-key cryptography schemes. When that happens, all traditional methods of dealing with the growing computational capabilities of potential attackers, such as increasing key sizes, will be futile. The only viable solution is to develop new standards based on algorithms that are resistant to quantum computer attacks and capable of being executed on traditional computing platforms, such as microprocessors and FPGAs. Leading candidates for new standards include lattice-based post-quantum cryptography (PQC) algorithms. In this paper, we present the results of implementing and benchmarking three lattice-based key encapsulation mechanisms (KEMs) that have progressed to Round 2 of the NIST standardization process. Our implementations are based on a software/hardware codesign approach, which is particularly applicable to the current stage of the NIST PQC standardization process, where the large number and high complexity of the candidates make traditional hardware benchmarking extremely challenging. We propose and justify the choice of a suitable system-on-chip platform and design methodology. The obtained results indicate the potential for very substantial speed-ups vs. purely software implementations, reaching 28x for encapsulation and 20x for decapsulation. 
    more » « less
  3. The speed of NTRU-based Key Encapsulation Mechanisms (KEMs) in software, especially on embedded software platforms, is limited by the long execution time of its primary operation, polynomial multiplication. In this paper, we investigate the potential for speeding up the implementations of four NTRU-based KEMs, using software/hardware codesign, when targeting Xilinx Zynq UltraScale+ multiprocessor system-on-chip (MPSoC). All investigated algorithms compete in Round 1 of the NIST PQC standardization process. They include: ntru-kem from the NTRUEncrypt submission, Streamlined NTRU Prime and NTRU LPRime KEMs of the NTRU Prime candidate, and NTRU- HRSS-KEM from the submission of the same name. The most time-consuming operation, polynomial multiplication, is implemented in the Programmable Logic (PL) of Zynq UltraScale+ (i.e., in hardware) using constant-time hardware architectures most appropriate for a given algorithm. The remaining operations are executed in the Processing System (PS) of Zynq, based on the ARM Cortex-A53 Application Processing Unit. The speed-ups of our software/hardware codesigns vs. purely software implementations, running on the same Zynq platform, are determined experimentally, and analyzed in the paper. Our experiments reveal substantial differences among the investigated candidates in terms of their potential to benefit from hardware accelerators, with the special focus on accelerators aimed at offloading to hardware only the most time-consuming operation of a given cryptosystems. The demonstrated speed-ups vs. functionally equivalent purely software implementations vary between 4.0 and 42.7 for encapsulation, and between 6.4 and 149.7 for decapsulation. 
    more » « less
  4. The recent advancement in quantum technology has initiated a new round of cryptosystem innovation, i.e., the emergence of Post-Quantum Cryptography (PQC). This new class of cryptographic schemes is intended to be mathematically resistant against any known attacks using quantum computers, but, at the same time, be fully implementable using traditional semiconductor technology. The National Institutes of Standards and Technology (NIST) has already started the PQC standardization process, and the initial pool of 69 submissions has been reduced to 26 Round 2 candidates. Echoing the pace of the PQC "revolution," this paper gives a detailed and thorough introduction to recent advances in the hardware implementation of PQC schemes, including challenges, new implementation methods, and novel hardware architectures. Specifically, we have: (i) described the challenges and rewards of implementing PQC in hardware; (ii) presented the novel methodology for the design-space exploration of PQC implementations using high-level synthesis (HLS); (iii) introduced a new underexplored PQC scheme (binary Ring-Learning-with-Errors), as well as its novel hardware implementation for possible lightweight applications. The overall content delivered by this paper could serve multiple purposes: (i) provide useful references for the potential learners and the interested public; (ii) introduce new areas and directions for potential research to the VTS community; (iii) facilitate the PQC standardization process and the exploration of related new ways of implementing cryptography in existing and emerging applications. 
    more » « less
  5. With the advent of large-scale quantum computers, factoring and discrete logarithm problems could be solved using the polynomialtime quantum algorithms. To ensure public-key security, a transition to quantum-resistant cryptographic protocols is required. Performance of hardware accelerators targeting different platforms and diverse application goals plays an important role in PQC candidates’ differentiation. Hardware accelerators based on FPGAs and ASICs also provide higher flexibility to create a very low area or ultra-high performance implementations at the high cost of the other. While the hardware/software codesign development of PQC schemes has already received an increasing research effort, a cost analysis of efficient pure hardware implementation is still lacking. On the other hand, since FPGA has various types of hardware resources, evaluating and making the accurate and fair comparison of hardware-based implementations against each other is very challenging. Without a common foundation, apples are compared to oranges. This paper demonstrates a pure hardware architecture for Kyber as one of the finalists in the third round of the NIST post-quantum cryptography standardization process. To enable real, realistic, and comparable evaluations in PQC schemes over hardware platforms, we compare our architecture over the ASIC platform as a common foundation showing that it outperforms the previous works in the literature. 
    more » « less