skip to main content


Search for: All records

Award ID contains: 1801512

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Bhasin, Shivam ; Chattopadhyay, Anupam ; Güneysu, Tim ; Bhunia, Swarup (Ed.)
    Digital signature algorithms are the foundation of many secure communication protocols, including those used in Internet of Things (IoT) applications. While the current generation of signature schemes is secure against classical attacks, they are potentially vulnerable to attacks using quantum computers. Because of this threat, multiple new schemes have been developed and evaluated in recent years. From among these schemes, the National Institute of Standards and Technology standardized two and selected additional three for near-term standardization. For use in IoT, these schemes must be sufficiently efficient in terms of their public-key and signature sizes and the timing of major operations. In this paper, we analyze the choice between two primary schemes considered for extensive use in IoT, CRYSTALS-Dilithium and FALCON, from the point of view of developing efficient hardware accelerators supporting cryptographic operations performed by IoT clients and servers. 
    more » « less
    Free, publicly-accessible full text available October 1, 2024
  2. Johansson, Thomas ; Smith-Tone, Daniel (Ed.)
    In 2022, NIST selected the first set of four post-quantum cryptography schemes for near-term standardization. Three of them - CRYSTALS-Kyber, CRYSTALS-Dilithium, and FALCON - belong to the lattice-based family and one - SPHINCS+ - to the hash-based family. NIST has also announced an ”on-ramp” for new digital signature candidates to add greater diversity to the suite of new standards. One promising set of schemes - a subfamily of code-based cryptography - is based on the linear code equivalence problem. This well-studied problem can be used to design flexible and efficient digital signature schemes. One of these schemes, LESS, was submitted to the NIST standardization process in June 2023. In this work, we present a high-performance hardware implementation of LESS targeting Xilinx FPGAs. The obtained results are compared with those for the state-of-the-art hardware implementations of CRYSTALS-Dilithium, SPHINCS+, and FALCON. 
    more » « less
    Free, publicly-accessible full text available August 16, 2024
  3. El Mrabet, N. ; De Feo, L. ; Duquesne, S. (Ed.)
    We present our speed records for Falcon signature generation and verification on ARMv8-A architecture. Our implementations are benchmarked on Apple M1 ‘Firestorm’, Raspberry Pi 4 Cortex-A72, and Jetson AGX Xavier. Our optimized signature generation is 2x slower, but signature verification is 3–3.9x faster than the state-of-the-art CRYSTALS-Dilithium implementation on the same platforms. Faster signature verification may be particularly useful for the client side on con-strained devices. Our Falcon implementation outperforms the previous work targeting Jetson AGX Xavier by the factors 1.48x for signing in falcon512 and falcon1024, 1.52x for verifying in falcon512, and 1.70x for verifying in falcon1024. We achieve improvement in Falcon signature generation by supporting a larger subset of possible parameter values for FFT-related functions and applying our compressed twiddle-factor table to reduce memory usage. We also demonstrate that the recently proposed signature scheme Hawk, sharing optimized functionality with Falcon, has 3.3x faster signature generation and 1.6–1.9x slower signature verification when implemented on the same ARMv8 processors as Falcon. 
    more » « less
    Free, publicly-accessible full text available July 13, 2024
  4. In this paper, we investigate the practical performance of rank-code based cryptography on FPGA platforms by presenting a case study on the quantum-safe KEM scheme based on LRPC codes called ROLLO, which was among NIST post-quantum cryptography standardization round-2 candidates. Specifically, we present an FPGA implementation of the encapsulation and decapsulation operations of the ROLLO KEM scheme with some variations to the original specification. The design is fully parameterized, using code-generation scripts to support a wide range of parameter choices for security levels specified in ROLLO. At the core of the ROLLO hardware, we presented a generic approach for hardware-based Gaussian elimination, which can process both non-singular and singular matrices. Previous works on hardware-based Gaussian elimination can only process non-singular ones. However, a plethora of cryptosystems, for instance, quantum-safe key encapsulation mechanisms based on rank-metric codes, ROLLO and RQC, which are among NIST post-quantum cryptography standardization round-2 candidates, require performing Gaussian elimination for random matrices regardless of the singularity. To the best of our knowledge, this work is the first hardware implementation for rank-code-based cryptographic schemes. The experimental results suggest rank-code-based schemes can be highly efficient. 
    more » « less
    Free, publicly-accessible full text available July 1, 2024
  5. Providing error detection constructions for Internet of nano-Things in constrained applications is of prominent importance. The Niederreiter cryptosystem falls into the category of code-based public-key cryptography. It is a relatively well-established scheme, but its key size and performance overheads have traditionally hindered its efficiency to be utilized for traditional computers. However, with the arrival of quantum computers, the Niederreiter cryptosystem is believed to be secure against attacks enabled by such computers, even though it has been previously shown that it is still vulnerable to fault injection and natural hardware defects. In this paper, we present fault detection schemes for the different blocks in the key generation of the Niederreiter cryptosystem using binary Goppa codes. These blocks perform finite field operations such as addition, multiplication, squaring, and inversion. The schemes are derived for different parameter sizes in order to have more flexibility and be able to choose according to the overheads to be tolerated and the required level of security. Moreover, we implement our fault detection schemes on Xilinx field-programmable gate array (FPGA) family Kintex UltraScale+ (device xcku5p-ffvd900-1-i) to benchmark the overhead induced of the proposed approaches. 
    more » « less
  6. Adhikari, Avishek ; Küsters, Ralf ; Preneel, Bart (Ed.)
    The field of post-quantum cryptography aims to develop and analyze algorithms that can withstand classical and quantum cryptanalysis. The NIST PQC standardization process, now in its third round, specifies ease of protection against side-channel analysis as an important selection criterion. In this work, we develop and validate a masked hardware implementation of Saber key encapsulation mechanism, a third-round NIST PQC finalist. We first design a baseline lightweight hardware architecture of Saber and then apply side-channel countermeasures. Our protected hardware implementation is significantly faster than previously reported protected software and software/hardware co-design implementations. Additionally, applying side-channel countermeasures to our baseline design incurs approximately 2.9x and 1.4x penalty in terms of the number of LUTs and latency, respectively, in modern FPGAs. 
    more » « less
  7. Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. One such algorithm is CRYSTALS-Dilithium, a lattice-based digital signature scheme, which is a finalist in the NIST Post Quantum Cryptography (PQC) competition. As a part of this evaluation, high-performance implementations of these algorithms must be investigated. This work presents a high-performance implementation of CRYSTALS-Dilithium targeting FPGAs. In particular, we present a design that achieves the best latency for an FPGA implementation to date. We also compare our results with the most-relevant previous work on hardware implementations of NIST Round 3 post-quantum digital signature candidates. 
    more » « less
  8. Cheon, Jung Hee ; Tillich, Jean-Pierre (Ed.)
    This paper focuses on high-speed NEON-based constant-time implementations of multiplication of large polynomials in the NIST PQC KEM Finalists: NTRU, Saber, and CRYSTALS-Kyber. We use the Number Theoretic Transform (NTT)-based multiplication in Kyber, the Toom-Cook algorithm in NTRU, and both types of multiplication in Saber. Following these algorithms and using Apple M1, we improve the decapsulation performance of the NTRU, Kyber, and Saber-based KEMs at the security level 3 by the factors of 8.4, 3.0, and 1.6, respectively, compared to the reference implementations. On Cortex-A72, we achieve the speed-ups by factors varying between 5.7 and 7.5x for the Forward/Inverse NTT in Kyber, and between 6.0 and 7.8x for Toom-Cook in NTRU, over the best existing implementations in pure C. For Saber, when using NEON instructions on Cortex-A72, the implementation based on NTT outperforms the implementation based on the Toom-Cook algorithm by 14% in the case of the MatrixVectorMul function but is slower by 21% in the case of the InnerProduct function. Taking into account that in Saber, keys are not available in the NTT domain, the overall performance of the NTT-based version is very close to the performance of the Toom-Cook version. The differences for the entire decapsulation at the three major security levels (1, 3, and 5) are −4, −2, and +2%, respectively. Our benchmarking results demonstrate that our NEON-based implementations run on an Apple M1 ARM processor are comparable to those obtained using the best AVX2-based implementations run on an AMD EPYC 7742 processor. Our work is the first NEON-based ARMv8 implementation of each of the three NIST PQC KEM finalists. 
    more » « less
  9. Katzenbeisser, Stefan ; Schaumont, Patrick (Ed.)
    LowMC is a parameterizable block cipher developed for use in Multi-Party Computation (MPC) and Fully Homomorphic Encryption (FHE). In these applications, linear operations are much less expensive in terms of resource utilization compared to the non-linear operations due to their low multiplicative complexity. In this work, we implemented two versions of LowMC -- unrolled and lightweight. Both implementations are realized using RTL VHDL. To the best of our knowledge, we report the first lightweight implementation of LowMC and the first implementation protected against side-channel analysis (SCA). For the SCA protection, we used a hybrid 2/3 shares Threshold Implementation (TI) approach, and for the evaluation, the Test Vector Leakage Assessment (TVLA) method, also known as the T-test. Our unprotected implementations show information leakage at 10K traces, and after protection, they could successfully pass the T-test for 1 million traces. The Xilinx Vivado is used for the synthesis, implementation, functional verification, timing analysis, and programming of the FPGA. The target FPGA family is Artix-7, selected due to its widespread use in multiple applications. Based on our results, the numbers of LUTs are 867 and 3,328 for the lightweight and the unrolled architecture with unrolling factor U = 16, respectively. It takes 14.21 μs for the lightweight architecture and 1.29 μs for the unrolled design with U = 16 to generate one 128-bit block of the ciphertext. The fully unrolled architecture beats the best previous implementation by Kales et al. in terms of the number of LUTs by a factor of 4.5. However, this advantage comes at the cost of having 2.9 higher latency. 
    more » « less