Title: Improved Reconstruction of Random Geometric Graphs
Embedding graphs in a geographical or latent space, i.e. inferring locations for vertices in Euclidean space or on a smooth manifold or submanifold, is a common task in network analysis, statistical inference, and graph visualization. We consider the classic model of random geometric graphs where n points are scattered uniformly in a square of area n, and two points have an edge between them if and only if their Euclidean distance is less than r. The reconstruction problem then consists of inferring the vertex positions, up to the symmetries of the square, given only the adjacency matrix of the resulting graph. We give an algorithm that, if r = n^α for α > 0, with high probability reconstructs the vertex positions with a maximum error of O(n^β) where β = 1/2 − (4/3)α, until α ≥ 3/8, where β = 0 and the error becomes O(√(log n)). This improves over earlier results, which were unable to reconstruct with error less than r. Our method estimates Euclidean distances using a hybrid of graph distances and short-range estimates based on the number of common neighbors. We extend our results to the surface of the sphere in ℝ³ and to hypercubes in any constant dimension.
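The short-range estimate mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm; it only assumes the standard fact that, at unit point density and away from the boundary of the square, the expected number of common neighbors of two vertices at distance d < 2r equals the area of the lens where their two radius-r discs overlap. The function names (`lens_area`, `distance_from_common_neighbors`, `random_geometric_graph`) are hypothetical and chosen for illustration.

```python
import math
import numpy as np

def lens_area(d, r):
    """Area of the intersection of two radius-r discs whose centers are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)

def distance_from_common_neighbors(count, r, tol=1e-9):
    """Invert lens_area by bisection: find d in [0, 2r] whose lens area matches count."""
    # Clamp the observed count to the range the lens area can actually take.
    target = min(max(float(count), 0.0), math.pi * r * r)
    lo, hi = 0.0, 2 * r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(mid, r) > target:
            lo = mid  # lens too large, so the distance must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def random_geometric_graph(n, r, rng):
    """Sample n uniform points in a square of area n; connect pairs closer than r."""
    side = math.sqrt(n)
    pts = rng.uniform(0.0, side, size=(n, 2))
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    adj = dist < r
    np.fill_diagonal(adj, False)  # no self-loops
    return pts, adj

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts, adj = random_geometric_graph(1000, 6.0, rng)
    u, v = 0, 1
    common = int((adj[u] & adj[v]).sum())
    print("true distance:     ", float(np.linalg.norm(pts[u] - pts[v])))
    print("estimated distance:", distance_from_common_neighbors(common, 6.0))
```

The estimate is noisy for a single pair because the common-neighbor count fluctuates around the lens area, and it is biased for vertices near the boundary of the square, where part of the lens falls outside the sampled region. For pairs farther apart than 2r the count is zero and carries no information, which is where graph distances would have to take over.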
Award ID(s):
1838251
NSF-PAR ID:
10359467
Author(s) / Creator(s):
; ; ;
Editor(s):
Bojanczyk, M. et
Date Published:
Journal Name:
49th International Colloquium on Automata, Languages, and Programming (ICALP 2022)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of designing sublinear-time algorithms for estimating the cost of a minimum metric traveling salesman (TSP) tour. Specifically, given access to an n × n distance matrix D that specifies pairwise distances between n points, the goal is to estimate the TSP cost by performing only sublinear (in the size of D) queries. For the closely related problem of estimating the weight of a metric minimum spanning tree (MST), it is known that for any ε > 0, there exists an Õ(n/ε^O(1))-time algorithm that returns a (1 + ε)-approximate estimate of the MST cost. This result immediately implies an Õ(n/ε^O(1))-time algorithm to estimate the TSP cost to within a (2 + ε) factor for any ε > 0. However, no o(n^2)-time algorithms are known to approximate metric TSP to a factor that is strictly better than 2. On the other hand, there were also no known barriers that rule out the existence of (1 + ε)-approximate estimation algorithms for metric TSP with Õ(n) time for any fixed ε > 0. In this paper, we make progress on both algorithms and lower bounds for estimating metric TSP cost. On the algorithmic side, we first consider the graphic TSP problem where the metric D corresponds to shortest path distances in a connected unweighted undirected graph. We show that there exists an Õ(n)-time algorithm that estimates the cost of graphic TSP to within a factor of (2 − ε_0) for some ε_0 > 0. This is the first sublinear cost estimation algorithm for graphic TSP that achieves an approximation factor less than 2. We also consider another well-studied special case of metric TSP, namely, (1, 2)-TSP where all distances are either 1 or 2, and give an Õ(n^1.5)-time algorithm to estimate the optimal cost to within a factor of 1.625.
Our estimation algorithms for graphic TSP as well as for (1, 2)-TSP naturally lend themselves to Õ(n)-space streaming algorithms that give an 11/6-approximation for graphic TSP and a 1.625-approximation for (1, 2)-TSP. These results motivate the natural question of whether, analogously to metric MST, for any ε > 0, (1 + ε)-approximate estimates can be obtained for graphic TSP and (1, 2)-TSP using Õ(n) queries. We answer this question in the negative: there exists an ε_0 > 0 such that any algorithm that estimates the cost of graphic TSP ((1, 2)-TSP) to within a (1 + ε_0)-factor necessarily requires Ω(n^2) queries. This lower bound result highlights a sharp separation between the metric MST and metric TSP problems. Similarly to many classical approximation algorithms for TSP, our sublinear-time estimation algorithms utilize subroutines for estimating the size of a maximum matching in the underlying graph. We show that this is not merely an artifact of our approach, and that for any ε > 0, any algorithm that estimates the cost of graphic TSP or (1, 2)-TSP to within a (1 + ε)-factor can also be used to estimate the size of a maximum matching in a bipartite graph to within an εn additive error. This connection allows us to translate known lower bounds for matching size estimation in various models to similar lower bounds for metric TSP cost estimation.
  2. A graph spanner is a fundamental graph structure that faithfully preserves the pairwise distances in the input graph up to a small multiplicative stretch. The common objective in the computation of spanners is to achieve the best-known existential size-stretch trade-off efficiently. Classical models and algorithmic analysis of graph spanners essentially assume that the algorithm can read the input graph, construct the desired spanner, and write the answer to the output tape. However, when considering massive graphs containing millions or even billions of nodes, not only the input graph but also the output spanner might be too large for a single processor to store. To tackle this challenge, we initiate the study of local computation algorithms (LCAs) for graph spanners in general graphs, where the algorithm should locally decide whether a given edge (u,v) ∈ E belongs to the output spanner. Such LCAs give the user the 'illusion' that a specific sparse spanner for the graph is maintained, without ever fully computing it. We present the following results: (1) For general n-vertex graphs and r ∈ {2,3}, there exists an LCA for (2r−1)-spanners with Õ(n^{1+1/r}) edges and sublinear probe complexity of Õ(n^{1−1/(2r)}). These size/stretch tradeoffs are best possible (up to polylogarithmic factors). (2) For every k ≥ 1 and n-vertex graph with maximum degree Δ, there exists an LCA for O(k^2)-spanners with Õ(n^{1+1/k}) edges, probe complexity of Õ(Δ^4 n^{2/3}), and random seed of size polylog(n). This improves upon and extends the work of [Lenzen-Levi, 2018]. We also complement our results by providing a polynomial lower bound on the probe complexity of LCAs for graph spanners that holds even for the simpler task of computing a sparse connected subgraph with o(m) edges.
  3. Given a set S of n points in the plane and a parameter ε > 0, a Euclidean (1+ε)-spanner is a geometric graph G = (S,E) that contains a path of weight at most (1+ε)‖pq‖_2 for all p,q ∈ S. We show that the minimum weight of a Euclidean (1+ε)-spanner for n points in the unit square [0,1]^2 is O(ε^{−3/2} √n), and this bound is the best possible. The upper bound is based on a new spanner algorithm that sparsifies Yao-graphs. It improves upon the baseline O(ε^{−2} √n), obtained by combining a tight bound for the weight of an MST and a tight bound for the lightness of Euclidean (1+ε)-spanners, which is the ratio of the spanner weight to the weight of the MST. The result generalizes to d-space for all d ∈ ℕ: the minimum weight of a Euclidean (1+ε)-spanner for n points in the unit cube [0,1]^d is O_d(ε^{(1−d^2)/d} n^{(d−1)/d}), and this bound is the best possible. For the n×n section of the integer lattice, we show that the minimum weight of a Euclidean (1+ε)-spanner is between Ω(ε^{−3/4} n^2) and O(ε^{−1} log(ε^{−1}) n^2). These bounds become Ω(ε^{−3/4} √n) and O(ε^{−1} log(ε^{−1}) √n) when scaled to a grid of n points in [0,1]^2.
  4. This paper studies seeded graph matching for power-law graphs. Assume that two edge-correlated graphs are independently edge-sampled from a common parent graph with a power-law degree distribution. A set of correctly matched vertex-pairs is chosen at random and revealed as initial seeds. Our goal is to use the seeds to recover the remaining latent vertex correspondence between the two graphs. Departing from the existing approaches that focus on the use of high-degree seeds in 1-hop neighborhoods, we develop an efficient algorithm that exploits the low-degree seeds in suitably-defined D-hop neighborhoods. Specifically, we first match a set of vertex-pairs with appropriate degrees (which we refer to as the first slice) based on the number of low-degree seeds in their D-hop neighborhoods. This approach significantly reduces the number of initial seeds needed to trigger a cascading process to match the rest of the graphs. Under the Chung-Lu random graph model with n vertices, max degree Θ(√n), and power-law exponent 2 < β < 3, we show that as soon as D > (4−β)/(3−β), by optimally choosing the first slice, with high probability our algorithm can correctly match a constant fraction of the true pairs without any error, provided with only Ω((log n)^{4−β}) initial seeds. Our result achieves an exponential reduction in the seed size requirement, as the best previously known result requires n^{1/2+ε} seeds (for any small constant ε > 0). Performance evaluation with synthetic and real data further corroborates the improved performance of our algorithm.
  5. Given a set P of n points in the plane, the unit-disk graph G_r(P) with respect to a parameter r is an undirected graph whose vertex set is P such that an edge connects two points p, q in P if the Euclidean distance between p and q is at most r (the weight of the edge is 1 in the unweighted case and is the distance between p and q in the weighted case). Given a value λ > 0 and two points s and t of P, we consider the following reverse shortest path problem: computing the smallest r such that the shortest path length between s and t in G_r(P) is at most λ. In this paper, we present an algorithm of O(⌊λ⌋ · n log n) time and another algorithm of O(n^{5/4} log^{7/4} n) time for the unweighted case, as well as an O(n^{5/4} log^{5/2} n) time algorithm for the weighted case. We also consider the L1 version of the problem where the distance of two points is measured by the L1 metric; we solve the problem in O(n log^3 n) time for both the unweighted and weighted cases.
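The reverse shortest path problem in item 5 can be made concrete with a naive baseline for the unweighted case. This is a sketch under stated assumptions, not the algorithm from that paper: it relies only on the observations that the optimal r must equal some pairwise distance and that the hop distance between s and t never increases as r grows, so a binary search over sorted pairwise distances with a breadth-first search at each step suffices. The function names `hop_distance` and `smallest_radius` are hypothetical, and the running time is roughly O(n^2 log n), far from the bounds quoted above.

```python
import math
from collections import deque

def hop_distance(points, r, s, t):
    """BFS hop count from s to t in the unit-disk graph G_r(points); inf if unreachable."""
    n = len(points)
    dist = [math.inf] * n
    dist[s] = 0
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in range(n):
            # An edge exists when the Euclidean distance is at most r.
            if dist[v] == math.inf and math.dist(points[u], points[v]) <= r:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[t]

def smallest_radius(points, s, t, lam):
    """Smallest r such that the unweighted s-t distance in G_r(points) is at most lam."""
    if s == t:
        return 0.0
    if lam < 1:
        raise ValueError("no radius works: distinct s and t need at least one hop")
    # Candidate radii: every pairwise distance, in increasing order.
    candidates = sorted(
        {math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]}
    )
    lo, hi = 0, len(candidates) - 1  # the largest candidate makes the graph complete
    while lo < hi:
        mid = (lo + hi) // 2
        if hop_distance(points, candidates[mid], s, t) <= lam:
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    print(smallest_radius(pts, 0, 3, 2))
```

For four collinear points spaced one unit apart, allowing two hops between the endpoints forces a radius of 2, since a radius of 1 would need three hops.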