Architectures for Serial and Parallel Pipelined NTT-Based Polynomial Modular Multiplication

Chiu, Sin-Wei; Parhi, Keshab K

doi:10.1109/TVLSI.2025.3576782

This content will become publicly available on June 11, 2026

Quantum computers pose a significant threat to modern cryptographic systems by efficiently solving problems such as integer factorization through Shor’s algorithm. Homomorphic encryption (HE) schemes based on ring learning with errors (Ring-LWE) offer a quantum-resistant framework for secure computations on encrypted data. Many of these schemes rely on polynomial multiplication, which can be efficiently accelerated using the number theoretic transform (NTT) in leveled HE, ensuring practical performance for privacy-preserving applications. This article presents a novel NTT-based serial pipelined multiplier that achieves full-hardware utilization through interleaved folding, and overcomes the 50% under-utilization limitation of the conventional serial R2MDC architecture. In addition, it explores tradeoffs in pipelined parallel designs, including serial, 2-parallel, and 4-parallel architectures. Our designs leverage increased parallelism, efficient folding techniques, and optimizations for a selected constant modulus to achieve superior throughput (TP) compared with state-of-the-art implementations. While the serial fold design minimizes area consumption, the 4-parallel design maximizes TP. Experimental results on the Virtex-7 platform demonstrate that our architectures achieve at least 2.22 times higher TP/area for a polynomial length of 1024 and 1.84 times for a polynomial length of 4096 in the serial fold design, while the 4-parallel design achieves at least 2.78 times and 2.79 times, respectively. The efficiency gain is even more pronounced in TP squared over area, where the serial fold and 4-parallel designs outperform prior works by at least 4.98 times and 26.43 times for a polynomial length of 1024 and 6.7 times and 43.77 times for a polynomial length of 4096, respectively. These results highlight the effectiveness of our architectures in balancing performance, area efficiency, and flexibility, making them well-suited for high-speed cryptographic applications. 
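As background for the abstract's central operation, the following is a minimal software sketch of NTT-based polynomial multiplication modulo x^n + 1 (negacyclic convolution), checked against a schoolbook reference. It is not the authors' hardware architecture and does not model folding, pipelining, or parallelism. The modulus q = 17, the length n = 4, the root psi = 2, and all function names are toy assumptions chosen for illustration; they are not the parameters used in the article. Python 3.8 or later is assumed for modular inverses via `pow(x, -1, q)`.

```python
def ntt(a, omega, q):
    """Recursive radix-2 decimation-in-time NTT.

    len(a) must be a power of two and omega a primitive len(a)-th
    root of unity modulo the prime q.
    """
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q                   # butterfly: twiddle times odd half
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q  # uses omega^(n/2) = -1
        w = w * omega % q
    return out


def negacyclic_mul_ntt(a, b, psi, q):
    """Multiply a and b modulo (x^n + 1, q); psi is a primitive 2n-th root of unity."""
    n = len(a)
    omega = psi * psi % q
    psi_inv = pow(psi, -1, q)
    n_inv = pow(n, -1, q)
    # Pre-weight by psi^i so that a cyclic convolution becomes negacyclic.
    a_w = [a[i] * pow(psi, i, q) % q for i in range(n)]
    b_w = [b[i] * pow(psi, i, q) % q for i in range(n)]
    A = ntt(a_w, omega, q)
    B = ntt(b_w, omega, q)
    C = [x * y % q for x, y in zip(A, B)]    # pointwise product
    c_w = ntt(C, pow(omega, -1, q), q)       # inverse NTT, scaling applied below
    return [c_w[i] * n_inv * pow(psi_inv, i, q) % q for i in range(n)]


def negacyclic_mul_schoolbook(a, b, q):
    """O(n^2) reference: reduce x^n to -1 while accumulating products."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


# Toy parameters (assumptions): q = 17, n = 4, psi = 2 has order 8 modulo 17.
Q, PSI = 17, 2
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
assert negacyclic_mul_ntt(a, b, PSI, Q) == negacyclic_mul_schoolbook(a, b, Q)
```

The pre- and post-weighting by powers of psi is one common way to obtain the reduction modulo x^n + 1 without zero padding; whether the article merges these weights into the butterflies or handles them separately is not stated in the abstract.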

Award ID(s): 2243053

PAR ID: 10599772

Author(s) / Creator(s): Chiu, Sin-Wei; Parhi, Keshab K

Publisher / Repository: IEEE

Date Published: 2025-06-11

Journal Name: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

ISSN: 1063-8210

Page Range / eLocation ID: 1 to 14

Subject(s) / Keyword(s): Folding, homomorphic encryption (HE), interleaving, number theoretic transform (NTT), parallel processing, pipelining, polynomial modular multiplication

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Journal Article: https://doi.org/10.1109/TVLSI.2025.3576782