Dynamic trees are a wellstudied and fundamental building block of dynamic graph algorithms dating back to the seminal work of Sleator and Tarjan [STOC'81, (1981), pp. 114122]. The problem is to maintain a tree subject to online edge insertions and deletions while answering queries about the tree, such as the heaviest weight on a path, etc. In the parallel batchdynamic setting, the goal is to process batches of edge updates work efficiently in low (polylog n) span. Two workefficient algorithms are known: batchparallel Euler Tour Trees by Tseng et al. [ALENEX'19, (2019), pp. 92106] and parallel RakeCompress (RC) Trees by Acar et al. [ESA'20, (2020), pp. 2:12:23]. Both however are randomized and work efficient in expectation. Several downstream results that use these data structures (and indeed to the best of our knowledge, all known workefficient parallel batchdynamic graph algorithms) are therefore also randomized.
In this work, we give the first deterministic workefficient solution to the problem. Our algorithm maintains a parallel RCTree on n vertices subject to batches of k edge updates deterministically in worstcase O(k log(1 + n/k)) work and O(log n loglog k) span on the CommonCRCW PRAM. We also show how to improve the span of the randomized algorithm from O(log n log* n) to O(log n).
Lastly, as a result of our new deterministic algorithm, we also derandomize several downstream results that make use of parallel batchdynamic dynamic trees, previously for which the only efficient solutions were randomized.
more »
« less
Parallel Nearest Neighbors in Low Dimensions with Batch Updates
We present a set of parallel algorithms for computing exact knearest neighbors in low dimensions. Many knearest neighbor algorithms use either a kdtree or the Morton ordering of the point set; our algorithms combine these approaches using a data structure we call the zdtree. We show that this combination is both theoretically efficient under common assumptions, and fast in practice. For point sets of size n with bounded expansion constant and bounded ratio, the zdtree can be built in O(n) work with O(n^ε) span for constant ε < 1, and searching for the knearest neighbors of a point takes expected O(k log k) time. We benchmark our knearest neighbor algorithms against existing parallel knearest neighbor algorithms, showing that our implementations are generally faster than the state of the art as well as achieving 75x speedup on 144 hyperthreads. Furthermore, the zdtree supports parallel batchdynamic insertions and deletions; to our knowledge, it is the first knearest neighbor data structure to support such updates. On point sets with bounded expansion constant and bounded ratio, a batchdynamic update of size k requires O(k log n/k) work with O(k^ε + polylog(n)) span.
more »
« less
 NSFPAR ID:
 10416273
 Publisher / Repository:
 SIAM
 Date Published:
 Journal Name:
 2022 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX)
 Page Range / eLocation ID:
 195  208
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We present new results on a number of fundamental problems about dynamic geometric data structures: 1) We describe the first fully dynamic data structures with sublinear amortized update time for maintaining (i) the number of vertices or the volume of the convex hull of a 3D point set, (ii) the largest empty circle for a 2D point set, (iii) the Hausdorff distance between two 2D point sets, (iv) the discrete 1center of a 2D point set, (v) the number of maximal (i.e., skyline) points in a 3D point set. The update times are near n^{11/12} for (i) and (ii), n^{7/8} for (iii) and (iv), and n^{2/3} for (v). Previously, sublinear bounds were known only for restricted "semionline" settings [Chan, SODA 2002]. 2) We slightly improve previous fully dynamic data structures for answering extreme point queries for the convex hull of a 3D point set and nearest neighbor search for a 2D point set. The query time is O(log^2n), and the amortized update time is O(log^4n) instead of O(log^5n) [Chan, SODA 2006; Kaplan et al., SODA 2017]. 3) We also improve previous fully dynamic data structures for maintaining the bichromatic closest pair between two 2D point sets and the diameter of a 2D point set. The amortized update time is O(log^4n) instead of O(log^7n) [Eppstein 1995; Chan, SODA 2006; Kaplan et al., SODA 2017].more » « less

Algorithms often carry out equally many computations for “easy” and “hard” problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In this paper, we consider the approximate knearestneighbor problem, which is the problem of finding a subset of O(k) points in a given set of points that contains the set of k nearest neighbors of a given query point. We pro pose an algorithm based on adaptively estimating the distances, and show that it is essentially optimal out of algorithms that are only allowed to adaptively estimate distances. We then demonstrate both theoretically and experimentally that the algorithm can achieve significant speedups relative to the naive method.more » « less

Over the last two decades, frameworks for distributedmemory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the defacto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max{(√ m, n²/m)} space per machine and assuming T = Ω(√{m/n}). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of k size for constant k ≥ 1. Our second result is an O_δ(log log n)round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting kcliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√{log n}) rounds, O(n^δ) space per machine and O(mα³) total space. In addition to our theoretical results, we simulate our triangle counting algorithms in realworld graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.more » « less

The Knearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n data points with labels and a query point q, we want to assign a label to q based on the labels of the Knearest points to the query. We study this problem in the kmachine model, a model for distributed largescale data. In this model, we assume that the n points are distributed (in a balanced fashion) among the k machines and the goal is to compute an answer given a query point to a machine using a small number of communication rounds. Our main result is a randomized algorithm in the kmachine model that runs in O(log K) communication rounds with high success probability (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is small taking only O(k log K) messages. Our bounds are essentially the best possible for comparisonbased algorithms. We also implemented our algorithm and show that it performs well in practice.more » « less