skip to main content


Title: Parameterized Hardware Accelerators for Lattice-Based Cryptography and Their Application to the HW/SW Co-Design of qTESLA
This paper presents a set of efficient and parameterized hardware accelerators that target post-quantum lattice-based cryptographic schemes, including a versatile cSHAKE core, a binary-search CDT-based Gaussian sampler, and a pipelined NTT-based polynomial multiplier, among others. Unlike much of prior work, the accelerators are fully open-sourced, are designed to be constant-time, and can be parameterized at compile-time to support different parameters without the need for re-writing the hardware implementation. These flexible, publicly-available accelerators are leveraged to demonstrate the first hardware-software co-design using RISC-V of the post-quantum lattice-based signature scheme qTESLA with provably secure parameters. In particular, this work demonstrates that the NIST’s Round 2 level 1 and level 3 qTESLA variants achieve over a 40-100x speedup for key generation, about a 10x speedup for signing, and about a 16x speedup for verification, compared to the baseline RISC-V software-only implementation. For instance, this corresponds to execution in 7.7, 34.4, and 7.8 milliseconds for key generation, signing, and verification, respectively, for qTESLA’s level 1 parameter set on an Artix-7 FPGA, demonstrating the feasibility of the scheme for embedded applications.  more » « less
Award ID(s):
1716541
NSF-PAR ID:
10167501
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
IACR transactions on cryptographic hardware and embedded systems
Volume:
2020
Issue:
3
ISSN:
2569-2925
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a simple, yet efficient digital signature scheme which offers post-quantum security promise. Our scheme, named TACHYON, is based on a novel approach for extending one-time hash-based signatures to (polynomially bounded) many-time signatures, using the additively homomorphic properties of generalized compact knapsack functions. Our design permits TACHYON~to achieve several key properties. First, its signing and verification algorithms are the fastest among its current counterparts with a higher level of security. This allows TACHYON~to achieve the lowest end-to-end delay among its counterparts, while also making it suitable for resource-limited signers. Second, its private keys can be as small as κ bits, where κ is the desired security level. Third, unlike most of its lattice-based counterparts, TACHYON~does not require any Gaussian sampling during signing, and therefore, is free from side-channel attacks targeting this process. We also explore various speed and storage trade-offs for TACHYON, thanks to its highly tunable parameters. Some of these trade-offs can speed up TACHYON signing in exchange for larger keys, thereby permitting TACHYON~to further improve its end-to-end delay. 
    more » « less
  2. When quantum computers become scalable and reliable, they are likely to break all public-key cryptography standards, such as RSA and Elliptic Curve Cryptography. The projected threat of quantum computers has led the U.S. National Institute of Standards and Technology (NIST) to an effort aimed at replacing existing public-key cryptography standards with new quantum-resistant alternatives. In December 2017, 69 candidates were accepted by NIST to Round 1 of the NIST Post-Quantum Cryptography (PQC) standardization process. NTRUEncrypt is one of the most well-known PQC algorithms that has withstood cryptanalysis. The speed of NTRUEncrypt in software, especially on embedded software platforms, is limited by the long execution time of its primary operation, polynomial multiplication. In this paper, we investigate speeding up NTRUEncrypt using software/hardware codesign on a Xilinx Zynq UltraScale+ multiprocessor system-on-chip (MPSoC). Polynomial multiplication is implemented in the Programmable Logic (PL) of Zynq using two approaches: traditional Register-Transfer Level (RTL) and High-Level Synthesis (HLS). The remaining operations of NTRUEncrypt are executed in software on the Processing System (PS) of Zynq, using the bare-metal mode. The speed-up of our software/hardware codesigns vs. purely software implementations is determined experimentally and analyzed in the paper. The results are reported for the RTL-based and HLS-based hardware accelerators, and compared to the best available software implementation, included in the NIST submission package. The speed-ups for encryption were 2.4 and 3.9, depending on the selected parameter set. For decryption, the corresponding speed-ups were 4.0 and 6.8. In addition, for the polynomial multiplication operation itself, the speed up was in excess of 75. Our code for the NTRUEncrypt polynomial multiplier accelerator is being made open-source for further evaluation on multiple software/hardware platforms. 
    more » « less
  3. When quantum computers become scalable and reliable, they are likely to break all public-key cryptography standards, such as RSA and Elliptic Curve Cryptography. The projected threat of quantum computers has led the U.S. National Institute of Standards and Technology (NIST) to an effort aimed at replacing existing public-key cryptography standards with new quantum-resistant alternatives. In December 2017, 69 candidates were accepted by NIST to Round 1 of the NIST Post-Quantum Cryptography (PQC) standardization process. NTRUEncrypt is one of the most well-known PQC algorithms that has withstood cryptanalysis. The speed of NTRUEncrypt in software, especially on embedded software platforms, is limited by the long execution time of its primary operation, polynomial multiplication. In this paper, we investigate speeding up NTRUEncrypt using software/hardware codesign on a Xilinx Zynq UltraScale+ multiprocessor system-on-chip (MPSoC). Polynomial multiplication is implemented in the Programmable Logic (PL) of Zynq using two approaches: traditional Register-Transfer Level (RTL) and High-Level Synthesis (HLS). The remaining operations of NTRUEncrypt are executed in software on the Processing System (PS) of Zynq, using the bare-metal mode. The speed-up of our software/hardware codesigns vs. purely software implementations is determined experimentally and analyzed in the paper. The results are reported for the RTL-based and HLS-based hardware accelerators, and compared to the best available software implementation, included in the NIST submission package. The speed-ups for encryption were 2.4 and 3.9, depending on the selected parameter set. For decryption, the corresponding speed-ups were 4.0 and 6.8. In addition, for the polynomial multiplication operation itself, the speed up was in excess of 75. Our code for the NTRUEncrypt polynomial multiplier accelerator is being made open-source for further evaluation on multiple software/hardware platforms. 
    more » « less
  4. Key exchange protocols and key encapsulation mechanisms establish secret keys to communicate digital information confidentially over public channels. Lattice-based cryptography variants of these protocols are promising alternatives given their quantum-cryptanalysis resistance and implementation efficiency. Although lattice cryptosystems can be mathematically secure, their implementations have shown side-channel vulnerabilities. But such attacks largely presume collecting multiple measurements under a fixed key, leaving the more dangerous single-trace attacks unexplored. This article demonstrates successful single-trace power side-channel attacks on lattice-based key exchange and encapsulation protocols. Our attack targets both hardware and software implementations of matrix multiplications used in lattice cryptosystems. The crux of our idea is to apply a horizontal attack that makes hypotheses on several intermediate values within a single execution all relating to the same secret, and to combine their correlations for accurately estimating the secret key. We illustrate that the design of protocols combined with the nature of lattice arithmetic enables our attack. Since a straightforward attack suffers from false positives, we demonstrate a novel extend-and-prune procedure to recover the key by following the sequence of intermediate updates during multiplication. We analyzed two protocols, Frodo and FrodoKEM , and reveal that they are vulnerable to our attack. We implement both stand-alone hardware and RISC-V based software realizations and test the effectiveness of the proposed attack by using concrete parameters of these protocols on physical platforms with real measurements. We show that the proposed attack can estimate secret keys from a single power measurement with over 99% success rate. 
    more » « less
  5. Motivated by the rise of quantum computers, existing public-key cryptosystems are expected to be replaced by post-quantum schemes in the next decade in billions of devices. To facilitate the transition, NIST is running a standardization process which is currently in its final Round. Only three digital signature schemes are left in the competition, among which Dilithium and Falcon are the ones based on lattices. Besides security and performance, significant attention has been given to resistance against implementation attacks that target side-channel leakage or fault injection response. Classical fault attacks on signature schemes make use of pairs of faulty and correct signatures to recover the secret key which only works on deterministic schemes. To counter such attacks, Dilithium offers a randomized version which makes each signature unique, even when signing identical messages. In this work, we introduce a novel Signature Correction Attack which not only applies to the deterministic version but also to the randomized version of Dilithium and is effective even on constant-time implementations using AVX2 instructions. The Signature Correction Attack exploits the mathematical structure of Dilithium to recover the secret key bits by using faulty signatures and the public-key. It can work for any fault mechanism which can induce single bit-flips. For demonstration, we are using Rowhammer induced faults. Thus, our attack does not require any physical access or special privileges, and hence could be also implemented on shared cloud servers. Using Rowhammer attack, we inject bit flips into the secret key s1 of Dilithium, which results in incorrect signatures being generated by the signing algorithm. Since we can find the correct signature using our Signature Correction algorithm, we can use the difference between the correct and incorrect signatures to infer the location and value of the flipped bit without needing a correct and faulty pair. To quantify the reduction in the security level, we perform a thorough classical and quantum security analysis of Dilithium and successfully recover 1,851 bits out of 3,072 bits of secret key $s_{1}$ for security level 2. Fully recovered bits are used to reduce the dimension of the lattice whereas partially recovered coefficients are used to to reduce the norm of the secret key coefficients. Further analysis for both primal and dual attacks shows that the lattice strength against quantum attackers is reduced from 2128 to 281 while the strength against classical attackers is reduced from 2141 to 289. Hence, the Signature Correction Attack may be employed to achieve a practical attack on Dilithium (security level 2) as proposed in Round 3 of the NIST post-quantum standardization process. 
    more » « less