We present a set of parallel algorithms for computing exact k-nearest neighbors in low dimensions. Many k-nearest neighbor algorithms use either a kd-tree or the Morton ordering of the point set; our algorithms combine these approaches using a data structure we call the zd-tree. We show that this combination is both theoretically efficient under common assumptions, and fast in practice. For point sets of size n with bounded expansion constant and bounded ratio, the zd-tree can be built in O(n) work with O(n^ε) span for constant ε < 1, and searching for the k-nearest neighbors of a point takes expected O(k log k) time. We benchmark our k-nearest neighbor algorithms against existing parallel k-nearest neighbor algorithms, showing that our implementations are generally faster than the state of the art as well as achieving 75x speedup on 144 hyperthreads. Furthermore, the zd-tree supports parallel batch-dynamic insertions and deletions; to our knowledge, it is the first k-nearest neighbor data structure to support such updates. On point sets with bounded expansion constant and bounded ratio, a batch-dynamic update of size k requires O(k log n/k) work with O(k^ε + polylog(n)) span.
more »
« less
Deterministic and Low-Span Work-Efficient Parallel Batch-Dynamic Trees
Dynamic trees are a well-studied and fundamental building block of dynamic graph algorithms dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114-122]. The problem is to maintain a tree subject to online edge insertions and deletions while answering queries about the tree, such as the heaviest weight on a path, etc. In the parallel batch-dynamic setting, the goal is to process batches of edge updates work efficiently in low (polylog n) span. Two work-efficient algorithms are known: batch-parallel Euler Tour Trees by Tseng et al. [ALENEX'19, (2019), pp. 92--106] and parallel Rake-Compress (RC) Trees by Acar et al. [ESA'20, (2020), pp. 2:1--2:23]. Both however are randomized and work efficient in expectation. Several downstream results that use these data structures (and indeed to the best of our knowledge, all known work-efficient parallel batch-dynamic graph algorithms) are therefore also randomized. In this work, we give the first deterministic work-efficient solution to the problem. Our algorithm maintains a parallel RC-Tree on n vertices subject to batches of k edge updates deterministically in worst-case O(k log(1 + n/k)) work and O(log n loglog k) span on the Common-CRCW PRAM. We also show how to improve the span of the randomized algorithm from O(log n log* n) to O(log n). Lastly, as a result of our new deterministic algorithm, we also derandomize several downstream results that make use of parallel batch-dynamic dynamic trees, previously for which the only efficient solutions were randomized.
more »
« less
- PAR ID:
- 10539484
- Publisher / Repository:
- ACM Symposium on Parallelism in Algorithms and Architectures (SPAA)
- Date Published:
- ISBN:
- 9798400704161
- Page Range / eLocation ID:
- 247 to 258
- Format(s):
- Medium: X
- Location:
- Nantes France
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The kd-tree is one of the most widely used data structures to manage multi-dimensional data. Due to the ever-growing data volume, it is imperative to consider parallelism in kd-trees. However, we observed challenges in existing parallel kd-tree implementations, for both constructions and updates. The goal of this paper is to develop efficient in-memory kd-trees by supporting high parallelism and cache-efficiency. We propose the Pkd-tree (Parallel kd-tree), a parallel kd-tree that is efficient both in theory and in practice. The Pkd-tree supports parallel tree construction, batch update (insertion and deletion), and various queries including k-nearest neighbor search, range query, and range count. We proved that our algorithms have strong theoretical bounds in work (sequential time complexity), span (parallelism), and cache complexity. Our key techniques include 1) an efficient construction algorithm that optimizes work, span, and cache complexity simultaneously, and 2) reconstruction-based update algorithms that guarantee the tree to be weight-balanced. With the new algorithmic insights and careful engineering effort, we achieved a highly optimized implementation of the Pkd-tree. We tested Pkd-tree with various synthetic and real-world datasets, including both uniform and highly skewed data. We compare the Pkd-tree with state-of-the-art parallel kd-tree implementations. In all tests, with better or competitive query performance, Pkd-tree is much faster in construction and updates consistently than all baselines. We released our code.more » « less
-
We present a randomizedO(mlog2n) work,O(polylogn) depth parallel algorithm for minimum cut. This algorithm matches the work bounds of a recent sequential algorithm by Gawrychowski, Mozes, and Weimann [ICALP’20], and improves on the previously best parallel algorithm by Geissmann and Gianinazzi [SPAA’18], which performsO(mlog4n) work inO(polylogn) depth. Our algorithm makes use of three components that might be of independent interest. First, we design a parallel data structure that efficiently supports batched mixed queries and updates on trees. It generalizes and improves the work bounds of a previous data structure of Geissmann and Gianinazzi and is work efficient with respect to the best sequential algorithm. Second, we design a parallel algorithm for approximate minimum cut that improves on previous results by Karger and Motwani. We use this algorithm to give a work-efficient procedure to produce a tree packing, as in Karger’s sequential algorithm for minimum cuts. Last, we design an efficient parallel algorithm for solving the minimum 2-respecting cut problem.more » « less
-
Existing parallel algorithms for wavelet tree construction have a work complexity of $$O(n\log\sigma)$$. This paper presents parallel algorithms for the problem with improved work complexity. Our first algorithm is based on parallel integer sorting and has either $$O(n\log\log n\lceil\log\sigma/\sqrt{\log n\log\log n}\rceil)$$ work and polylogarithmic depth, or $$O(n\lceil\log\sigma/\sqrt{\log n}\rceil)$$ work and sub-linear depth. We also describe another algorithm that has $$O(n\lceil\log\sigma/\sqrt{\log n} \rceil)$$ work and $$O(\sigma+\log n)$$ depth. We then show how to use similar ideas to construct variants of wavelet trees (arbitrary-shaped binary trees and multiary trees) as well as wavelet matrices in parallel with lower work complexity than prior algorithms. Finally, we show that the rank and select structures on binary sequences and multiary sequences, which are stored on wavelet tree nodes, can be constructed in parallel with improved work bounds, matching those of the best existing sequential algorithms for constructing rank and select structures.more » « less
-
We present new algorithms for computing many faces in arrangements of lines and segments. Given a set $$S$$ of $$n$$ lines (resp., segments) and a set $$P$$ of $$m$$ points in the plane, the problem is to compute the faces of the arrangements of $$S$$ that contain at least one point of $$P$$. For the line case, we give a deterministic algorithm of $$O(m^{2/3}n^{2/3}\log^{2/3} (n/\sqrt{m})+(m+n)\log n)$$ time. This improves the previously best deterministic algorithm [Agarwal, 1990] by a factor of $$\log^{2.22}n$$ and improves the previously best randomized algorithm [Agarwal, Matoušek, and Schwarzkopf, 1998] by a factor of $$\log^{1/3}n$$ in certain cases (e.g., when $$m=\Theta(n)$$). For the segment case, we present a deterministic algorithm of $$O(n^{2/3}m^{2/3}\log n+\tau(n\alpha^2(n)+n\log m+m)\log n)$$ time, where $$\tau=\min\{\log m,\log (n/\sqrt{m})\}$$ and $$\alpha(n)$$ is the inverse Ackermann function. This improves the previously best deterministic algorithm [Agarwal, 1990] by a factor of $$\log^{2.11}n$$ and improves the previously best randomized algorithm [Agarwal, Matoušek, and Schwarzkopf, 1998] by a factor of $$\log n$$ in certain cases (e.g., when $$m=\Theta(n)$$). We also give a randomized algorithm of $$O(m^{2/3}K^{1/3}\log n+\tau(n\alpha(n)+n\log m+m)\log n\log K)$$ expected time, where $$K$$ is the number of intersections of all segments of $$S$$. In addition, we consider the query version of the problem, that is, preprocess $$S$$ to compute the face of the arrangement of $$S$$ that contains any given query point. We present new results that improve the previous work for both the line and the segment cases. In particulary, for the line case, we build a data structure of $$O(n\log n)$$ space in $$O(n\log n)$$ randomized time, so that the face containing the query point can be obtained in $$O(\sqrt{n\log n})$$ time with high probability (more specifically, the query returns a binary search tree representing the face so that standard binary-search-based queries on the face can be handled in $$O(\log n)$$ time each and the face itself can be output explicitly in time linear in its size).more » « less