Due to an emerging threat of quantum computing, one of the major challenges facing the cryptographic community is a timely transition from traditional public-key cryptosystems, such as RSA and Elliptic Curve Cryptography, to a new class of algorithms, collectively referred to as Post-Quantum Cryptography (PQC). Several promising candidates for a new PQC standard can have their software and hardware implementations accelerated using the Number Theoretic Transform (NTT). In this paper, we present an improved hardware architecture for NTT, with the hardware-friendly modular reduction, and demonstrate that this architecture can be efficiently implemented in hardware using High-Level Synthesis (HLS). The novel feature of the proposed architecture is an original memory write-back scheme, which assists in preparing coefficients for performing later NTT stages, saving memory storage used for precomputed constants. Our design is the most efficient for the case when log2N is even. The latency of our proposed architecture is approximately equal to (N log2(N) +3N)/4 clock cycles. As a proof of concept, we implemented the NTT operation for several parameter sets used in the PQC algorithms NewHope, FALCON, qTESLA, and CRYSTALS-DILITHIUM.
more »
« less
Full hardware implementation of the Post-Quantum Public-Key Cryptography Scheme Round5
We describe a generic high-speed hardware architecture for the lattice-based post-quantum cryptosystem Round5. This architecture supports both public-key encryption (PKE) and a key encapsulation mechanism (KEM). Due to several hardware-friendly features, Round5 can achieve very high performance when implemented in modern FPGAs.
more »
« less
- Award ID(s):
- 1801512
- PAR ID:
- 10174999
- Date Published:
- Journal Name:
- 2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig), Cancun, Mexico
- Page Range / eLocation ID:
- 1 to 2
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
With the advent of large-scale quantum computers, factoring and discrete logarithm problems could be solved using the polynomialtime quantum algorithms. To ensure public-key security, a transition to quantum-resistant cryptographic protocols is required. Performance of hardware accelerators targeting different platforms and diverse application goals plays an important role in PQC candidates’ differentiation. Hardware accelerators based on FPGAs and ASICs also provide higher flexibility to create a very low area or ultra-high performance implementations at the high cost of the other. While the hardware/software codesign development of PQC schemes has already received an increasing research effort, a cost analysis of efficient pure hardware implementation is still lacking. On the other hand, since FPGA has various types of hardware resources, evaluating and making the accurate and fair comparison of hardware-based implementations against each other is very challenging. Without a common foundation, apples are compared to oranges. This paper demonstrates a pure hardware architecture for Kyber as one of the finalists in the third round of the NIST post-quantum cryptography standardization process. To enable real, realistic, and comparable evaluations in PQC schemes over hardware platforms, we compare our architecture over the ASIC platform as a common foundation showing that it outperforms the previous works in the literature.more » « less
-
Qiu, Meikang (Ed.)Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration. We combine previously reported algorithms with a proper caching strategy and develop hardware architecture. The main advantage of the proposed approach is eliminating the overhead of the data transfer between a CPU and a hardware accelerator. The authors believe that this is the first such architecture reported in the literature to date and predict to achieve up to 8 times higher throughput when compared to a multi-core high-performance CPU. Presented methods can be adapted for other sieving algorithms hard to implement in FPGAs due to the communication and memory bottleneck.more » « less
-
This paper presents a framework for embedding watermarks into DNN hardware accelerators. Unlike previous works that have looked at protecting the algorithmic intellectual properties of deep learning systems, this work proposes a methodology for defending deep learning hardware. Our methodology embeds modifications into the hardware accelerator's functional blocks that can be revealed with the rightful owner's key DNN and corresponding key sample, verifying the legitimate owner. We propose an Lp-box ADMM based algorithm to co-optimize watermark's hardware overhead and impact on the design's algorithmic functionality. We evaluate the performance of the hardware watermarking scheme on popular image classifier models using various accelerator designs. Our results demonstrate that the proposed methodology effectively embeds watermarks while preserving the original functionality of the hardware architecture. Specifically, we can successfully embed watermarks into the deep learning hardware and reliably execute a ResNet ImageNet classifiers with an accuracy degradation of only 0.009%more » « less
-
Fully Homomorphic Encryption (FHE) presents a paradigm-shifting framework for performing computations on encrypted data, offering revolutionary implications for privacy-preserving technologies. This paper introduces a novel hardware implementation of scheme switching between two leading FHE schemes targeting different computational needs, i.e., arithmetic HE scheme CKKS, and Boolean HE scheme FHEW. The proposed architecture facilitates dynamic switching between the schemes with improved throughput and latency compared to the software baseline. The proposed architecture computation modules support scheme switching operations involving coefficient conversion, modular switching, and key switching. We also optimize the hardware designs for the pre-processing and post-processing blocks, involving key generation, encryption, and decryption. The effectiveness of our proposed design is verified on the Xilinx U280 Datacenter Acceleration FPGA. We demonstrate that the proposed scheme switching accelerator yields a 365Ă— performance improvement over the software counterpart.more » « less
An official website of the United States government

