skip to main content


Title: Embeddings and labeling schemes for A*
A* is a classic and popular method for graphs search and path finding. It assumes the existence of a heuristic function h(u,t) that estimates the shortest distance from any input node u to the destination t. Traditionally, heuristics have been handcrafted by domain experts. However, over the last few years, there has been a growing interest in learning heuristic functions. Such learned heuristics estimate the distance between given nodes based on "features" of those nodes. In this paper we formalize and initiate the study of such feature-based heuristics. In particular, we consider heuristics induced by norm embeddings and distance labeling schemes, and provide lower bounds for the tradeoffs between the number of dimensions or bits used to represent each graph node, and the running time of the A* algorithm. We also show that, under natural assumptions, our lower bounds are almost optimal.  more » « less
Award ID(s):
2022448
NSF-PAR ID:
10341844
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Innovations in Theoretical Computer Science (ITCS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gortz, Inge Li ; Farach-Colton, Martin ; Puglisi, Simon J. ; Herman, Grzegorz (Ed.)
    Computing the diameter of a graph, i.e. the largest distance, is a fundamental problem that is central in fine-grained complexity. In undirected graphs, the Strong Exponential Time Hypothesis (SETH) yields a lower bound on the time vs. approximation trade-off that is quite close to the upper bounds. In directed graphs, however, where only some of the upper bounds apply, much larger gaps remain. Since d(u,v) may not be the same as d(v,u), there are multiple ways to define the problem, the two most natural being the (one-way) diameter (max_(u,v) d(u,v)) and the roundtrip diameter (max_{u,v} d(u,v)+d(v,u)). In this paper we make progress on the outstanding open question for each of them. - We design the first algorithm for diameter in sparse directed graphs to achieve n^{1.5-ε} time with an approximation factor better than 2. The new upper bound trade-off makes the directed case appear more similar to the undirected case. Notably, this is the first algorithm for diameter in sparse graphs that benefits from fast matrix multiplication. - We design new hardness reductions separating roundtrip diameter from directed and undirected diameter. In particular, a 1.5-approximation in subquadratic time would refute the All-Nodes k-Cycle hypothesis, and any (2-ε)-approximation would imply a breakthrough algorithm for approximate 𝓁_∞-Closest-Pair. Notably, these are the first conditional lower bounds for diameter that are not based on SETH. 
    more » « less
  2. null (Ed.)
    We study the communication cost (or message complexity) of fundamental distributed symmetry breaking problems, namely, coloring and MIS. While significant progress has been made in understanding and improving the running time of such problems, much less is known about the message complexity of these problems. In fact, all known algorithms need at least Ω(m) communication for these problems, where m is the number of edges in the graph. We addressthe following question in this paper: can we solve problems such as coloring and MIS using sublinear, i.e., o(m) communication, and if sounder what conditions? In a classical result, Awerbuch, Goldreich, Peleg, and Vainish [JACM 1990] showed that fundamental global problems such asbroadcast and spanning tree construction require at least o(m) messages in the KT-1 Congest model (i.e., Congest model in which nodes have initial knowledge of the neighbors' ID's) when algorithms are restricted to be comparison-based (i.e., algorithms inwhich node ID's can only be compared). Thirty five years after this result, King, Kutten, and Thorup [PODC 2015] showed that onecan solve the above problems using Õ(n) messages (n is the number of nodes in the graph) in Õ(n) rounds in the KT-1 Congest model if non-comparison-based algorithms are permitted. An important implication of this result is that one can use the synchronous nature of the KT-1 Congest model, using silence to convey information,and solve any graph problem using non-comparison-based algorithms with Õ(n) messages, but this takes an exponential number of rounds. In the asynchronous model, even this is not possible. In contrast, much less is known about the message complexity of local symmetry breaking problems such as coloring and MIS. Our paper fills this gap by presenting the following results. Lower bounds: In the KT-1 CONGEST model, we show that any comparison-based algorithm, even a randomized Monte Carlo algorithm with constant success probability, requires Ω(n 2) messages in the worst case to solve either (△ + 1)-coloring or MIS, regardless of the number of rounds. We also show that Ω(n) is a lower bound on the number ofmessages for any (△ + 1)-coloring or MIS algorithm, even non-comparison-based, and even with nodes having initial knowledge of up to a constant radius. Upper bounds: In the KT-1 CONGEST model, we present the following randomized non-comparison-based algorithms for coloring that, with high probability, use o(m) messages and run in polynomially many rounds.(a) A (△ + 1)-coloring algorithm that uses Õ(n1.5) messages, while running in Õ(D + √ n) rounds, where D is the graph diameter. Our result also implies an asynchronous algorithm for (△ + 1)-coloring with the same message bound but running in Õ(n) rounds. (b) For any constantε > 0, a (1+ε)△-coloring algorithm that uses Õ(n/ε 2 ) messages, while running in Õ(n) rounds. If we increase our input knowledge slightly to radius 2, i.e.,in the KT-2 CONGEST model, we obtain:(c) A randomized comparison-based MIS algorithm that uses Õ(n 1.5) messages. while running in Õ( √n) rounds. While our lower bound results can be viewed as counterparts to the classical result of Awerbuch, Goldreich, Peleg, and Vainish [JACM 90], but for local problems, our algorithms are the first-known algorithms for coloring and MIS that take o(m) messages and run in polynomially many rounds. 
    more » « less
  3. Selecting appropriate inputs for systems described by complex networks is an important but difficult problem that largely remains open in the field of control of networks. Recent work has proposed two methods for energy efficient input selection; a gradient-based heuristic and a greedy approximation algorithm. We propose here an alternative method for input selection based on the analytic solution of the controllability Gramian of the ‘balloon graph’, a special model graph that captures the role of both distance and redundant paths between a driver node and a target node. The method presented is especially applicable for large networks where one is interested in controlling only a small number of outputs, or target nodes, for which current methods may not be practical because they require computing a typically very ill-conditioned matrix, called the controllability Gramian. Our method produces comparable results to the previous methods while being more computational efficient. 
    more » « less
  4. We propose an interdependent random geometric graph (RGG) model for interdependent networks. Based on this model, we study the robustness of two interdependent spatially embedded networks where interdependence exists between geographically nearby nodes in the two networks. We study the emergence of the giant mutual component in two interdependent RGGs as node densities increase, and define the percolation threshold as a pair of node densities above which the giant mutual component first appears. In contrast to the case for a single RGG, where the percolation threshold is a unique scalar for a given connection distance, for two interdependent RGGs, multiple pairs of percolation thresholds may exist, given that a smaller node density in one RGG may increase the minimum node density in the other RGG in order for a giant mutual component to exist. We derive analytical upper bounds on the percolation thresholds of two interdependent RGGs by discretization, and obtain 99% confidence intervals for the percolation thresholds by simulation. Based on these results, we derive conditions for the interdependent RGGs to be robust under random failures and geographical attacks. 
    more » « less
  5. Machine learning with missing data has been approached in two different ways, including feature imputation where missing feature values are estimated based on observed values and label prediction where downstream labels are learned directly from incomplete data. However, existing imputation models tend to have strong prior assumptions and cannot learn from downstream tasks, while models targeting label prediction often involve heuristics and can encounter scalability issues. Here we propose GRAPE, a graph-based framework for feature imputation as well as label prediction. GRAPE tackles the missing data problem using a graph representation, where the observations and features are viewed as two types of nodes in a bipartite graph, and the observed feature values as edges. Under the GRAPE framework, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task. These tasks are then solved with Graph Neural Networks. Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks, compared with existing state-of-the-art methods. 
    more » « less