skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Multiplatform Parallel Approach for Lattice Sieving Algorithms
Lattice sieving is currently the leading class of algorithms for solving the shortest vector problem over lattices. The computational difficulty of this problem is the basis for constructing secure post-quantum public-key cryptosystems based on lattices. In this paper, we present a novel massively parallel approach for solving the shortest vector problem using lattice sieving and hardware acceleration. We combine previously reported algorithms with a proper caching strategy and develop hardware architecture. The main advantage of the proposed approach is eliminating the overhead of the data transfer between a CPU and a hardware accelerator. The authors believe that this is the first such architecture reported in the literature to date and predict to achieve up to 8 times higher throughput when compared to a multi-core high-performance CPU. Presented methods can be adapted for other sieving algorithms hard to implement in FPGAs due to the communication and memory bottleneck.  more » « less
Award ID(s):
1801512
PAR ID:
10281378
Author(s) / Creator(s):
;
Editor(s):
Qiu, Meikang
Date Published:
Journal Name:
International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2020
Volume:
LNCS 12452
Page Range / eLocation ID:
661 - 680
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Hazay, Carmit; Stam, Martijn (Ed.)
    We study the computational problem of finding a shortest non-zero vector in a rotation of ℤ𝑛 , which we call ℤ SVP. It has been a long-standing open problem to determine if a polynomial-time algorithm for ℤ SVP exists, and there is by now a beautiful line of work showing how to solve it efficiently in certain very special cases. However, despite all of this work, the fastest known algorithm that is proven to solve ℤ SVP is still simply the fastest known algorithm for solving SVP (i.e., the problem of finding shortest non-zero vectors in arbitrary lattices), which runs in 2𝑛+𝑜(𝑛) time. We therefore set aside the (perhaps impossible) goal of finding an efficient algorithm for ℤ SVP and instead ask what else we can say about the problem. E.g., can we find any non-trivial speedup over the best known SVP algorithm? And, if ℤ SVP actually is hard, then what consequences would follow? Our results are as follows. We show that ℤ SVP is in a certain sense strictly easier than SVP on arbitrary lattices. In particular, we show how to reduce ℤ SVP to an approximate version of SVP in the same dimension (in fact, even to approximate unique SVP, for any constant approximation factor). Such a reduction seems very unlikely to work for SVP itself, so we view this as a qualitative separation of ℤ SVP from SVP. As a consequence of this reduction, we obtain a 2𝑛/2+𝑜(𝑛) -time algorithm for ℤ SVP, i.e., the first non-trivial speedup over the best known algorithm for SVP on general lattices. (In fact, this reduction works for a more general class of lattices—semi-stable lattices with not-too-large 𝜆1 .) We show a simple public-key encryption scheme that is secure if (an appropriate variant of) ℤ SVP is actually hard. Specifically, our scheme is secure if it is difficult to distinguish (in the worst case) a rotation of ℤ𝑛 from either a lattice with all non-zero vectors longer than 𝑛/log𝑛‾‾‾‾‾‾‾√ or a lattice with smoothing parameter significantly smaller than the smoothing parameter of ℤ𝑛 . The latter result has an interesting qualitative connection with reverse Minkowski theorems, which in some sense say that “ℤ𝑛 has the largest smoothing parameter.” We show a distribution of bases 𝐁 for rotations of ℤ𝑛 such that, if ℤ SVP is hard for any input basis, then ℤ SVP is hard on input 𝐁 . This gives a satisfying theoretical resolution to the problem of sampling hard bases for ℤ𝑛 , which was studied by Blanks and Miller [9]. This worst-case to average-case reduction is also crucially used in the analysis of our encryption scheme. (In recent independent work that appeared as a preprint before this work, Ducas and van Woerden showed essentially the same thing for general lattices [15], and they also used this to analyze the security of a public-key encryption scheme. Similar ideas also appeared in [5, 11, 20] in different contexts.) We perform experiments to determine how practical basis reduction performs on bases of ℤ𝑛 that are generated in different ways and how heuristic sieving algorithms perform on ℤ𝑛 . Our basis reduction experiments complement and add to those performed by Blanks and Miller, as we work with a larger class of algorithms (i.e., larger block sizes) and study the “provably hard” distribution of bases described above. Our sieving experiments confirm that heuristic sieving algorithms perform as expected on ℤ𝑛 . 
    more » « less
  2. Lattice-based algorithms in cryptanalysis often search for a target vector satisfying integer linear constraints as a shortest or closest vector in some lattice. In this work, we observe that these formulations may discard non-linear information from the underlying application that can be used to distinguish the target vector even when it is far from being uniquely close or short. We formalize lattice problems augmented with a predicate distinguishing a target vector and give algorithms for solving instances of these problems. We apply our techniques to lattice-based approaches for solving the Hidden Number Problem, a popular technique for recovering secret DSA or ECDSA keys in side-channel attacks, and demonstrate that our algorithms succeed in recovering the signing key for instances that were previously believed to be unsolvable using lattice approaches. We carried out extensive experiments using our estimation and solving framework, which we also make available with this work. 
    more » « less
  3. Stochastic Gradient Descent (SGD) is a valuable algorithm for large-scale machine learning, but has proven difficult to parallelize on conventional architectures because of communication and memory access issues. The HogWild series of mixed logically distributed and physically multi-threaded algorithms overcomes these issues for problems with sparse characteristics by using multiple local model vectors with asynchronous atomic updates. While this approach has proven effective for several reported examples, there are others, especially very sparse cases, that do not scale as well. This paper discusses an SGD Support Vector Machine (SVM) on a cacheless migrating thread architecture using the Hogwild algorithms as a framework. Our implementations on this novel architecture achieved superior hardware efficiency and scalability over that of a conventional cluster using MPI. Furthermore these improvements were gained using naive data partitioning techniques and hardware with substantially less compute capability than that present in conventional systems. 
    more » « less
  4. We consider a variant of the vehicle routing problem (VRP) where each customer has a unit demand and the goal is to minimize the total cost of routing a fleet of capacitated vehicles from one or multiple depots to visit all customers. We propose two parallel algorithms to efficiently solve the column-generation-based linear-programming relaxation for this VRP. Specifically, we focus on algorithms for the “pricing problem,” which corresponds to the resource-constrained elementary shortest path problem. The first algorithm extends the pulse algorithm for which we derive a new bounding scheme on the maximum load of any route. The second algorithm is based on random coloring from parameterized complexity which can be also combined with other techniques in the literature for improving VRPs, including cutting planes and column enumeration. We conduct numerical studies using VRP benchmarks (with 50–957 nodes) and instances of a medical home care delivery problem using census data in Wayne County, Michigan. Using parallel computing, both pulse and random coloring can significantly improve column generation for solving the linear programming relaxations and we can obtain heuristic integer solutions with small optimality gaps. Combining random coloring with column enumeration, we can obtain improved integer solutions having less than 2% optimality gaps for most VRP benchmark instances and less than 1% optimality gaps for the medical home care delivery instances, both under a 30-minute computational time limit. The use of cutting planes (e.g., robust cuts) can further reduce optimality gaps on some hard instances, without much increase in the run time. Summary of Contribution: The vehicle routing problem (VRP) is a fundamental combinatorial problem, and its variants have been studied extensively in the literature of operations research and computer science. In this paper, we consider general-purpose algorithms for solving VRPs, including the column-generation approach for the linear programming relaxations of the integer programs of VRPs and the column-enumeration approach for seeking improved integer solutions. We revise the pulse algorithm and also propose a random-coloring algorithm that can be used for solving the elementary shortest path problem that formulates the pricing problem in the column-generation approach. We show that the parallel implementation of both algorithms can significantly improve the performance of column generation and the random coloring algorithm can improve the solution time and quality of the VRP integer solutions produced by the column-enumeration approach. We focus on algorithmic design for VRPs and conduct extensive computational tests to demonstrate the performance of various approaches. 
    more » « less
  5. The multi objective shortest path (MOSP) problem, crucial in various practical domains, seeks paths that optimize multiple objectives. Due to its high computational complexity, numerous parallel heuristics have been developed for static networks. However, real-world networks are often dynamic where the network topology changes with time. Efficiently updating the shortest path in such networks is challenging, and existing algorithms for static graphs are inadequate for these dynamic conditions, necessitating novel approaches. Here, we first develop a parallel algorithm to efficiently update a single objective shortest path (SOSP) in fully dynamic networks, capable of accommodating both edge insertions and deletions. Building on this, we propose DynaMOSP, a parallel heuristic for Dynamic Multi Objective Shortest Path searches in large, fully dynamic networks. We provide a theoretical analysis of the conditions to achieve Pareto optimality. Furthermore, we devise a dedicated shared memory CPU implementation along with a version for heterogeneous computing environments. Empirical analysis on eight real-world graphs demonstrates that our method scales effectively. The shared memory CPU implementation achieves an average speedup of 12.74× and a maximum of 57.22×, while on an Nvidia GPU, it attains an average speedup of 69.19×, reaching up to 105.39× when compared to state-of-the-art techniques. 
    more » « less