Due to an emerging threat of quantum computing, one of the major challenges facing the cryptographic community is a timely transition from traditional public-key cryptosystems, such as RSA and Elliptic Curve Cryptography, to a new class of algorithms, collectively referred to as Post-Quantum Cryptography (PQC). Several promising candidates for a new PQC standard can have their software and hardware implementations accelerated using the Number Theoretic Transform (NTT). In this paper, we present an improved hardware architecture for NTT, with the hardware-friendly modular reduction, and demonstrate that this architecture can be efficiently implemented in hardware using High-Level Synthesis (HLS). The novel feature of the proposed architecture is an original memory write-back scheme, which assists in preparing coefficients for performing later NTT stages, saving memory storage used for precomputed constants. Our design is the most efficient for the case when log2N is even. The latency of our proposed architecture is approximately equal to (N log2(N) +3N)/4 clock cycles. As a proof of concept, we implemented the NTT operation for several parameter sets used in the PQC algorithms NewHope, FALCON, qTESLA, and CRYSTALS-DILITHIUM.
more »
« less
Full hardware implementation of the Post-Quantum Public-Key Cryptography Scheme Round5
We describe a generic high-speed hardware architecture for the lattice-based post-quantum cryptosystem Round5. This architecture supports both public-key encryption (PKE) and a key encapsulation mechanism (KEM). Due to several hardware-friendly features, Round5 can achieve very high performance when implemented in modern FPGAs.
more »
« less
- Award ID(s):
- 1801512
- PAR ID:
- 10174999
- Date Published:
- Journal Name:
- 2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig), Cancun, Mexico
- Page Range / eLocation ID:
- 1 to 2
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
With the advent of large-scale quantum computers, factoring and discrete logarithm problems could be solved using the polynomialtime quantum algorithms. To ensure public-key security, a transition to quantum-resistant cryptographic protocols is required. Performance of hardware accelerators targeting different platforms and diverse application goals plays an important role in PQC candidates’ differentiation. Hardware accelerators based on FPGAs and ASICs also provide higher flexibility to create a very low area or ultra-high performance implementations at the high cost of the other. While the hardware/software codesign development of PQC schemes has already received an increasing research effort, a cost analysis of efficient pure hardware implementation is still lacking. On the other hand, since FPGA has various types of hardware resources, evaluating and making the accurate and fair comparison of hardware-based implementations against each other is very challenging. Without a common foundation, apples are compared to oranges. This paper demonstrates a pure hardware architecture for Kyber as one of the finalists in the third round of the NIST post-quantum cryptography standardization process. To enable real, realistic, and comparable evaluations in PQC schemes over hardware platforms, we compare our architecture over the ASIC platform as a common foundation showing that it outperforms the previous works in the literature.more » « less
-
Qiu, Meikang (Ed.)Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration. We combine previously reported algorithms with a proper caching strategy and develop hardware architecture. The main advantage of the proposed approach is eliminating the overhead of the data transfer between a CPU and a hardware accelerator. The authors believe that this is the first such architecture reported in the literature to date and predict to achieve up to 8 times higher throughput when compared to a multi-core high-performance CPU. Presented methods can be adapted for other sieving algorithms hard to implement in FPGAs due to the communication and memory bottleneck.more » « less
-
This paper presents a framework for embedding watermarks into DNN hardware accelerators. Unlike previous works that have looked at protecting the algorithmic intellectual properties of deep learning systems, this work proposes a methodology for defending deep learning hardware. Our methodology embeds modifications into the hardware accelerator's functional blocks that can be revealed with the rightful owner's key DNN and corresponding key sample, verifying the legitimate owner. We propose an Lp-box ADMM based algorithm to co-optimize watermark's hardware overhead and impact on the design's algorithmic functionality. We evaluate the performance of the hardware watermarking scheme on popular image classifier models using various accelerator designs. Our results demonstrate that the proposed methodology effectively embeds watermarks while preserving the original functionality of the hardware architecture. Specifically, we can successfully embed watermarks into the deep learning hardware and reliably execute a ResNet ImageNet classifiers with an accuracy degradation of only 0.009%more » « less
-
Wideband beamforming and interference cancellation for phased array antennas requires advances in signal processing algorithms, software, and specialized hardware platforms. A high-throughput array receiver has been developed that enables communication in radio frequency interference-rich environments with field programmable gate array (FPGA)-based frequency channelization and packetization. In this study, a real-time interference mitigation algorithm was implemented on graphics processing units (GPUs) contained in the data pipeline. The key contribution is a hardware and software pipeline for subchannelized wideband array signal processing with 150 MHz instantaneous bandwidth and interference cancellation with a heterogeneous, distributed, and scaleable digital signal processing (DSP) architecture that achieves 30 dB interferer cancellation null depth in real time with a moving interference source.more » « less
An official website of the United States government

