skip to main content


Title: Comparing the Difficulty of Factorization and Discrete Logarithm: A 240-Digit Experiment
We report on two new records: the factorization of RSA-240, a 795-bit number, and a discrete logarithm computation over a 795-bit prime field. Previous records were the factorization of RSA-768 in 2009 and a 768-bit discrete logarithm computation in 2016. Our two computations at the 795-bit level were done using the same hardware and software, and show that computing a discrete logarithm is not much harder than a factorization of the same size. Moreover, thanks to algorithmic variants and well-chosen parameters, our computations were significantly less expensive than anticipated based on previous records.  more » « less
Award ID(s):
1651344
NSF-PAR ID:
10273009
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
CRYPTO 2020: Advances in Cryptology – CRYPTO 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the bit complexity of two related fundamental computational problems in linear algebra and control theory. Our results are: (1) An Õ(n^{ω+3}a+n⁴a²+n^ωlog(1/ε)) time algorithm for finding an ε-approximation to the Jordan Normal form of an integer matrix with a-bit entries, where ω is the exponent of matrix multiplication. (2) An Õ(n⁶d⁶a+n⁴d⁴a²+n³d³log(1/ε)) time algorithm for ε-approximately computing the spectral factorization P(x) = Q^*(x)Q(x) of a given monic n× n rational matrix polynomial of degree 2d with rational a-bit coefficients having a-bit common denominators, which satisfies P(x)⪰0 for all real x. The first algorithm is used as a subroutine in the second one. Despite its being of central importance, polynomial complexity bounds were not previously known for spectral factorization, and for Jordan form the best previous best running time was an unspecified polynomial in n of degree at least twelve [Cai, 1994]. Our algorithms are simple and judiciously combine techniques from numerical and symbolic computation, yielding significant advantages over either approach by itself. 
    more » « less
  2. Arbitrary-precision integer multiplication is the core kernel of many applications including scientific computing, cryptographic algorithms, etc. Existing acceleration of arbitrary-precision integer multiplication includes CPUs, GPUs, FPGAs, and ASICs. To leverage the hardware intrinsics low-bit function units (32/64-bit), arbitrary-precision integer multiplication can be calculated using Karatsuba decomposition, and Schoolbook decomposition by decomposing the two large operands into several small operands, generating a set of low-bit multiplications that can be processed either in a spatial or sequential manner on the low-bit function units, e.g., CPU vector instructions, GPU CUDA cores, FPGA digital signal processing (DSP) blocks. Among these accelerators, reconfigurable computing, e.g., FPGA accelerators are promised to provide both good energy efficiency and flexibility. We implement the state-of-the-art (SOTA) FPGA accelerator and compare it with the SOTA libraries on CPUs and GPUs. Surprisingly, in terms of energy efficiency, we find that the FPGA has the lowest energy efficiency, i.e., 0.29x of the CPU and 0.17x of the GPU with the same generation fabrication. Therefore, key questions arise: Where do the energy efficiency gains of CPUs and GPUs come from? Can reconfigurable computing do better? If can, how to achieve that? We first identify that the biggest energy efficiency gains of the CPUs and GPUs come from the dedicated vector units, i.e., vector instruction units in CPUs and CUDA cores in GPUs. FPGA uses DSPs and lookup tables (LUTs) to compose the needed computation, which incurs overhead when compared to using vector units directly. New reconfigurable computing, e.g., “FPGA+vector units” is a novel and feasible solution to improve energy efficiency. In this paper, we propose to map arbitrary-precision integer multiplication onto such a “FPGA+vector units” platform, i.e., AMD/Xilinx Versal ACAP architecture, a heterogeneous reconfigurable computing platform that features 400 AI engine tensor cores (AIE) running at 1 GHz, FPGA programmable logic (PL), and a general-purpose CPU in the system fabricated with the TSMC 7nm technology. Designing on Versal ACAP incurs several challenges and we propose AIM: Arbitrary-precision Integer Multiplication on Versal ACAP to automate and optimize the design. AIM accelerator is composed of AIEs, PL, and CPU. AIM framework includes analytical models to guide design space exploration and AIM automatic code generation to facilitate the system design and on-board design verification. We deploy the AIM framework on three different applications, including large integer multiplication (LIM), RSA, and Mandelbrot, on the AMD/Xilinx Versal ACAP VCK190 evaluation board. Our experimental results show that compared to existing accelerators, AIM achieves up to 12.6x, and 2.1x energy efficiency gains over the Intel Xeon Ice Lake 6346 CPU, and NVidia A5000 GPU respectively, which brings reconfigurable computing the most energy-efficient platform among CPUs and GPUs. 
    more » « less
  3. null (Ed.)
    Zero-knowledge (ZK) proofs with an optimal memory footprint have attracted a lot of attention, because such protocols can easily prove very large computation with a small memory requirement. Such ZK protocol only needs O(M) memory for both parties, where M is the memory required to verify the statement in the clear. In this paper, we propose several new ZK protocols in this setting, which improve the concrete efficiency and, at the same time, enable sublinear amortized communication for circuits with some notion of relaxed uniformity. 1. In the circuit-based model, where the computation is represented as a circuit over a field, our ZK protocol achieves a communication complexity of 1 field element per non-linear gate for any field size while keeping the computation very cheap. We implemented our protocol, which shows extremely high efficiency and affordability. Compared to the previous best-known implementation, we achieve 6×–7× improvement in computation and 3×– 7× improvement in communication. When running on intro-level AWS instances, our protocol only needs one US dollar to prove one trillion AND gates (or 2.5 US dollars for one trillion multiplication gates over a 61-bit field). 2. In the setting where part of the computation can be represented as a set of polynomials, we can achieve communication sublinear to the polynomial size: the communication only depends on the input size and the highest degree of all polynomials, independent of the number of polynomials and the number of multiplications in the polynomials. Using the improved ZK protocol, we can prove matrix multiplication with communication proportional to the input size, rather than the number of multiplications. Proving the multiplication of two 1024 × 1024 matrices, our implementation, with one thread and 1 GB of memory, only needs 10 seconds and communicates 25 MB, 35× faster than the state-of-the-art protocol Virgo that would need more than 140 GB of memory for the same task. 
    more » « less
  4. Many currently deployed public-key cryptosystems are based on the difficulty of the discrete logarithm and integer factorization problems. However, given an adequately sized quantum computer, these problems can be solved in polynomial time as a function of the key size. Due to the future threat of quantum computing to current cryptographic standards, alternative algorithms that remain secure under quantum computing are being evaluated for future use. One such algorithm is CRYSTALS-Dilithium, a lattice-based digital signature scheme, which is a finalist in the NIST Post Quantum Cryptography (PQC) competition. As a part of this evaluation, high-performance implementations of these algorithms must be investigated. This work presents a high-performance implementation of CRYSTALS-Dilithium targeting FPGAs. In particular, we present a design that achieves the best latency for an FPGA implementation to date. We also compare our results with the most-relevant previous work on hardware implementations of NIST Round 3 post-quantum digital signature candidates. 
    more » « less
  5. Quantum computing challenges the computational hardness assumptions anchoring the security of public-key ciphers, such as the prime factorization and the discrete logarithm problem. To prepare for the quantum era and withstand the attacks equipped with quantum computing, the security and cryptography communities are designing new quantum-resistant public-key ciphers. National Institute of Standards and Technology (NIST) is collecting and standardizing the post-quantum ciphers, similarly to its past involvements in establishing DES and AES as symmetric cipher standards. The NIST finalist algorithms for public-key signatures are Dilithium, Falcon, and Rainbow. Finding common ground to compare these algorithms can be difficult because of their design, the underlying computational hardness assumptions (lattice based vs. multivariate based), and the different metrics used for security strength analyses in the previous research (qubits vs. quantum gates). We overcome such challenges and compare the security and the performances of the finalist post-quantum ciphers of Dilithium, Falcon, and Rainbow. For security comparison analyses, we advance the prior literature by using the depth-width cost for quantum circuits (DW cost) to measure the security strengths and by analyzing the security in Universal Quantum Gate Model and with Quantum Annealing. For performance analyses, we compare the algorithms’ computational loads in the execution time as well as the communication costs and implementation overheads when integrated with Transport Layer Security (TLS) and Transmission Control Protocol (TCP)/Internet Protocol (IP). Our work presents a security comparison and performance analysis as well as the trade-off analysis to inform the post-quantum cipher design and standardization to protect computing and networking in the post-quantum era. 
    more » « less