Given a set $$P$$ of $$n$$ points in the plane, we consider the problem of computing the number of points of $$P$$ in a query unit disk (i.e., all query disks have the same radius). We show that the main techniques for simplex range searching in the plane can be adapted to this problem. For example, by adapting Matoušek's results, we can build a data structure of $O(n)$ space in $$O(n^{1+\delta})$$ time (for any $$\delta>0$$) so that each query can be answered in $$O(\sqrt{n})$$ time; alternatively, we can build a data structure of $$O(n^2/\log^2 n)$$ space with $$O(n^{1+\delta})$$ preprocessing time (for any $$\delta>0$$) and $$O(\log n)$$ query time. Our techniques lead to improvements for several other classical problems in computational geometry. 1. Given a set of $$n$$ unit disks and a set of $$n$$ points in the plane, the batched unit-disk range counting problem is to compute for each disk the number of points in it. Previous work [Katz and Sharir, 1997] solved the problem in $$O(n^{4/3}\log n)$$ time. We give a new algorithm of $$O(n^{4/3})$$ time, which is optimal as it matches an $$\Omega(n^{4/3})$$-time lower bound. For small $$\chi$$, where $$\chi$$ is the number of pairs of unit disks that intersect, we further improve the algorithm to $$O(n^{2/3}\chi^{1/3}+n^{1+\delta})$$ time, for any $$\delta>0$$. 2. The above result immediately leads to an $$O(n^{4/3})$$ time optimal algorithm for counting the intersecting pairs of circles for a set of $$n$$ unit circles in the plane. The previous best algorithms solve the problem in $$O(n^{4/3}\log n)$$ deterministic time [Katz and Sharir, 1997] or in $$O(n^{4/3}\log^{2/3} n)$$ expected time by a randomized algorithm [Agarwal, Pellegrini, and Sharir, 1993]. 3. Given a set $$P$$ of $$n$$ points in the plane and an integer $$k$$, the distance selection problem is to find the $$k$$-th smallest distance among all pairwise distances of $$P$$. The problem can be solved in $$O(n^{4/3}\log^2 n)$$ deterministic time [Katz and Sharir, 1997] or in $$O(n\log n+n^{2/3}k^{1/3}\log^{5/3}n)$$ expected time by a randomized algorithm [Chan, 2001]. Our new randomized algorithm runs in $$O(n\log n +n^{2/3}k^{1/3}\log n)$$ expected time. 4. Given a set $$P$$ of $$n$$ points in the plane, the discrete $$2$$-center problem is to compute two smallest congruent disks whose centers are in $$P$$ and whose union covers $$P$$. An $$O(n^{4/3}\log^5 n)$$-time algorithm was known [Agarwal, Sharir, and Welzl, 1998]. Our techniques yield a deterministic algorithm of $$O(n^{4/3}\log^{10/3} n\cdot (\log\log n)^{O(1)})$$ time and a randomized algorithm of $$O(n^{4/3}\log^3 n\cdot (\log\log n)^{1/3})$$ expected time.
more »
« less
Efficient Distributed Algorithms for the K-Nearest Neighbors Problem
The K-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n data points with labels and a query point q, we want to assign a label to q based on the labels of the K-nearest points to the query. We study this problem in the k-machine model, a model for distributed large-scale data. In this model, we assume that the n points are distributed (in a balanced fashion) among the k machines and the goal is to compute an answer given a query point to a machine using a small number of communication rounds. Our main result is a randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high success probability (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is small taking only O(k log K) messages. Our bounds are essentially the best possible for comparison-based algorithms. We also implemented our algorithm and show that it performs well in practice.
more »
« less
- Award ID(s):
- 1633720
- PAR ID:
- 10197167
- Date Published:
- Journal Name:
- SPAA '20: Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures
- Page Range / eLocation ID:
- 527 to 529
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Pankratov, Denis (Ed.)Given a set $$P$$ of $$n$$ points in the plane, and a parameter $$k$$, we present an algorithm, whose running time is $$O(n^{3/2} \sqrt{k}\log^{3/2} n + kn\log^2 n)$$, with high probability, that computes a subset $$Q* \subseteq P$$ of $$k$$ points, that minimizes the Hausdorff distance between the convex-hulls of $Q*$ and $$P$$. This is the first subquadratic algorithm for this problem if $$k$$ is small.more » « less
-
Alistarh, Dan (Ed.)The SetCover problem has been extensively studied in many different models of computation, including parallel and distributed settings. From an approximation point of view, there are two standard guarantees: an O(log Δ)-approximation (where Δ is the maximum set size) and an O(f)-approximation (where f is the maximum number of sets containing any given element). In this paper, we introduce a new, surprisingly simple, model-independent approach to solving SetCover in unweighted graphs. We obtain multiple improved algorithms in the MPC and CRCW PRAM models. First, in the MPC model with sublinear space per machine, our algorithms can compute an O(f) approximation to SetCover in Ô(√{log Δ} + log f) rounds and a O(log Δ) approximation in O(log^{3/2} n) rounds. Moreover, in the PRAM model, we give a O(f) approximate algorithm using linear work and O(log n) depth. All these bounds improve the existing round complexity/depth bounds by a log^{Ω(1)} n factor. Moreover, our approach leads to many other new algorithms, including improved algorithms for the HypergraphMatching problem in the MPC model, as well as simpler SetCover algorithms that match the existing bounds.more » « less
-
Given a collection of vectors, the approximate K-nearest-neighbor graph (KGraph for short) connects every vector to its approximate K-nearest-neighbors (KNN for short). KGraph plays an important role in high dimensional data visualization, semantic search, manifold learning, and machine learning. The vectors are typically vector representations of real-world objects (e.g., images and documents), which often come with a few structured attributes, such as times-tamps and locations. In this paper, we study the all-range approximate K-nearest-neighbor graph (ARKGraph) problem. Specifically, given a collection of vectors, each associated with a numerical search key (e.g., a timestamp), we aim to build an index that takes a search key range as the query and returns the KGraph of vectors whose search keys are within the query range. ARKGraph can facilitate interactive high dimensional data visualization, data mining, etc. A key challenge of this problem is the huge index size. This is because, given n vectors, a brute-force index stores a KGraph for every search key range, which results in O (K n 3 ) index size as there are O ( n 2 ) search key ranges and each KGraph takes O (K n ) space. We observe that the KNN of a vector in nearby ranges are often the same, which can be grouped together to save space. Based on this observation, we propose a series of novel techniques that reduce the index size significantly to just O (K n log n ) in the average case. Furthermore, we develop an efficient indexing algorithm that constructs the optimized ARKGraph index directly without exhaustively calculating the distance between every pair of vectors. To process a query, for each vector in the query range, we only need O (log log n + K log K) to restore its KNN in the query range from the optimized ARKGraph index. We conducted extensive experiments on real-world datasets. Experimental results show that our optimized ARKGraph index achieved a small index size, low query latency, and good scalability. Specifically, our approach was 1000x faster than the baseline method that builds a KGraph for all the vectors in the query range on-the-fly.more » « less
-
null (Ed.)Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration . These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k , improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ὼ(n/k 2 ) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k 2 ) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ὼ(m/k 5/3 ) rounds. This result also implies the first non-trivial lower bound of Ὼ(n 1/3 ) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k 5/3 + n/k 4/3 ) rounds.more » « less
An official website of the United States government

