skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on December 13, 2025

Title: Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits
There has been significant recent interest in \emph{graph-based nearest neighbor search methods}, many of which are centered on the construction of (approximately) {\em navigable}\/ graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to the given distance function. The complete graph is obviously navigable for any point set, but the important question for applications is if sparser graphs can be constructed. While this question is fairly well understood in low-dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $$O(\sqrt{n \log n })$$ for any set of $$n$$ points, in any dimension, for any distance function. We compliment this result with a nearly matching lower bound: even under the Euclidean metric in $$O(\log n)$$ dimensions, a random point set has no navigable graph with average degree $$O(n^{\alpha})$$ for any $$\alpha < 1/2$$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.  more » « less
Award ID(s):
2106888
PAR ID:
10577177
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Advances in Neural Information Processing Systems
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the (1+ϵ)-approximate nearest neighbor search problem: given a set X of n points in a d-dimensional space, build a data structure that, given any query point y, finds a point x∈X whose distance to y is at most (1+ϵ)minx∈X ‖x−y‖ for an accuracy parameter ϵ∈(0,1). Our main result is a data structure that occupies only O(ϵ^−2 n log(n)log(1/ϵ)) bits of space, assuming all point coordinates are integers in the range {−n^O(1)…n^O(1)}, i.e., the coordinates have O(logn) bits of precision. This improves over the best previously known space bound of O(ϵ^−2 n log(n)^2), obtained via the randomized dimensionality reduction method of Johnson and Lindenstrauss (1984). We also consider the more general problem of estimating all distances from a collection of query points to all data points X, and provide almost tight upper and lower bounds for the space complexity of this problem. 
    more » « less
  2. We study local symmetry breaking problems in the Congest model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. The time (round) complexity of MIS (and ruling sets) have attracted much attention in the Local model. Indeed, recent results (Barenboim et al., FOCS 2012, Ghaffari SODA 2016) for the MIS problem have tried to break the long-standing O(log n)-round “barrier” achieved by Luby’s algorithm, but these yield o(log n)-round complexity only when the maximum degree  is somewhat small relative to n. More importantly, these results apply only in the Local model. In fact, the best known time bound in the Congest model is still O(log n) (via Luby’s algorithm) even for moderately small  (i.e., for  = (log n) and  = o(n)). Furthermore, message complexity has been largely ignored in the context of local symmetry breaking. Luby’s algorithm takes O(m) messages on m-edge graphs and this is the best known bound with respect to messages. Our work is motivated by the following central question: can we break the (log n) time complexity barrier and the (m) message complexity barrier in the Congest model for MIS or closelyrelated symmetry breaking problems? This paper presents progress towards this question for the distributed ruling set problem in the Congest model. A -ruling set is an independent set such that every node in the graph is at most hops from a node in the independent set. We present the following results: Time Complexity: We show that we can break the O(log n) “barrier” for 2- and 3-ruling sets. We compute 3-ruling sets in O  log n log log n  rounds with high probability (whp). More generally we show that 2-ruling sets can be computed in O  log · (log n)1/2+" + log n log log n  rounds for any " > 0, which is o(log n) for a wide range of  values (e.g.,  = 2(log n)1/2−" ). These are the first 2- and 3-ruling set algorithms to improve over the O(log n)-round complexity of Luby’s algorithm in the Congest model. Message Complexity: We show an (n2) lower bound on the message complexity of computing an MIS (i.e., 1-ruling set) which holds also for randomized algorithms and present a contrast to this by showing a randomized algorithm for 2-ruling sets that, whp, uses only O(n log2 n) messages and runs in O( log n) rounds. This is the first message-efficient algorithm known for ruling sets, which has message complexity nearly linear in n (which is optimal up to a polylogarithmic factor). 
    more » « less
  3. We study local symmetry breaking problems in the Congest model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. The time (round) complexity of MIS (and ruling sets) have attracted much attention in the Local model. Indeed, recent results (Barenboim et al., FOCS 2012, Ghaffari SODA 2016) for the MIS problem have tried to break the long-standing O(log n)-round "barrier" achieved by Luby's algorithm, but these yield o(log n)-round complexity only when the maximum degree Delta is somewhat small relative to n. More importantly, these results apply only in the Local model. In fact, the best known time bound in the Congest model is still O(log n) (via Luby's algorithm) even for moderately small Delta (i.e., for Delta = Omega(log n) and Delta = o(n)). Furthermore, message complexity has been largely ignored in the context of local symmetry breaking. Luby's algorithm takes O(m) messages on m-edge graphs and this is the best known bound with respect to messages. Our work is motivated by the following central question: can we break the Theta(log n) time complexity barrier and the Theta(m) message complexity barrier in the Congest model for MIS or closely-related symmetry breaking problems? This paper presents progress towards this question for the distributed ruling set problem in the Congest model. A beta-ruling set is an independent set such that every node in the graph is at most beta hops from a node in the independent set. We present the following results: - Time Complexity: We show that we can break the O(log n) "barrier" for 2- and 3-ruling sets. We compute 3-ruling sets in O(log n/log log n) rounds with high probability (whp). More generally we show that 2-ruling sets can be computed in O(log Delta (log n)^(1/2 + epsilon) + log n/log log n) rounds for any epsilon > 0, which is o(log n) for a wide range of Delta values (e.g., Delta = 2^(log n)^(1/2-epsilon)). These are the first 2- and 3-ruling set algorithms to improve over the O(log n)-round complexity of Luby's algorithm in the Congest model. - Message Complexity: We show an Omega(n^2) lower bound on the message complexity of computing an MIS (i.e., 1-ruling set) which holds also for randomized algorithms and present a contrast to this by showing a randomized algorithm for 2-ruling sets that, whp, uses only O(n log^2 n) messages and runs in O(Delta log n) rounds. This is the first message-efficient algorithm known for ruling sets, which has message complexity nearly linear in n (which is optimal up to a polylogarithmic factor). 
    more » « less
  4. We study the following question raised by Erdős and Hajnal in the early 90’s. Over all n n -vertex graphs G G what is the smallest possible value of m m for which any m m vertices of G G contain both a clique and an independent set of size log ⁡ n \log n ? We construct examples showing that m m is at most 2 2 ( log ⁡ log ⁡ n ) 1 / 2 + o ( 1 ) 2^{2^{(\log \log n)^{1/2+o(1)}}} obtaining a twofold sub-polynomial improvement over the upper bound of about n \sqrt {n} coming from the natural guess, the random graph. Our (probabilistic) construction gives rise to new examples of Ramsey graphs, which while having no very large homogenous subsets contain both cliques and independent sets of size log ⁡ n \log n in any small subset of vertices. This is very far from being true in random graphs. Our proofs are based on an interplay between taking lexicographic products and using randomness. 
    more » « less
  5. We consider the ad-hoc networks consisting of n wireless nodes that are located on the plane. Any two given nodes are called neighbors if they are located within a certain distance (communication range) from one another. A given node can be directly connected to any one of its neighbors, and picks its connections according to a unique topology control algorithm that is available at every node. Given that each node knows only the indices (unique identification numbers) of its one and two-hop neighbors, we identify an algorithm that preserves connectivity and can operate without the need of any synchronization among nodes. Moreover, the algorithm results in a sparse graph with at most 5n edges and a maximum node degree of 10. Existing algorithms with the same promises further require neighbor distance and/or direction information at each node. We also evaluate the performance of our algorithm for random networks. In this case, our algorithm provides an asymptotically connected network with n(1+o(1)) edges with a degree less than or equal to 6 for 1-o(1) fraction of the nodes. We also introduce another asynchronous connectivity-preserving algorithm that can provide an upper bound as well as a lower bound on node degrees. 
    more » « less