skip to main content


Title: Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark
Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input graphs, necessitating parallel approaches. In this work, we propose, implement and thoroughly analyse different strategies for APSP on distributed memory clusters with Apache Spark. Our solvers are designed for large undirected weighted graphs, and differ in complexity and degree of reliance on techniques outside of pure Spark API. We demonstrate that the best performing solver is able to handle APSP problems with over 200,000 vertices on a 1024-core cluster. However, it requires auxiliary shared persistent storage to compensate for missing Spark functionality.  more » « less
Award ID(s):
1910539
NSF-PAR ID:
10145329
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 48th International Conference on Parallel Processing
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms---usually sublogarithmic-time and often poly(łogłog n)-time, or even faster---for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely-adopted theoretical abstraction of MapReduce style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present poly(łog k) ε poly(łogłog n) round MPC algorithms for computing O(k^1+o(1) )-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction. As primary applications of our spanners, we get two important implications, as follows: -For the MPC setting, we get an O(łog^2łog n)-round algorithm for O(łog^1+o(1) n) approximation of all pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximations. -Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sub-logarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model. 
    more » « less
  2. Expander graphs play a central role in graph theory and algorithms. With a number of powerful algorithmic tools developed around them, such as the Cut-Matching game, expander pruning, expander decomposition, and algorithms for decremental All-Pairs Shortest Paths (APSP) in expanders, to name just a few, the use of expanders in the design of graph algorithms has become ubiquitous. Specific applications of interest to us are fast deterministic algorithms for cut problems in static graphs, and algorithms for dynamic distance-based graph problems, such as APSP. Unfortunately, the use of expanders in these settings incurs a number of drawbacks. For example, the best currently known algorithm for decremental APSP in constant-degree expanders can only achieve a (log n) O(1/ 2 ) -approximation with n 1+O( ) total update time for any . All currently known algorithms for the Cut Player in the Cut-Matching game are either randomized, or provide rather weak guarantees: expansion 1/(log n) 1/ with running time n 1+O( ) . This, in turn, leads to somewhat weak algorithmic guarantees for several central cut problems: the best current almost linear time deterministic algorithms for Sparsest Cut, Lowest Conductance Cut, and Balanced Cut can only achieve approximation factor (log n) ω(1). Lastly, when relying on expanders in distancebased problems, such as dynamic APSP, via current methods, it seems inevitable that one has to settle for approximation factors that are at least Ω(log n). In contrast, we do not have any negative results that rule out a factor-5 approximation with near-linear total update time. In this paper we propose the use of well-connected graphs, and introduce a new algorithmic toolkit for such graphs that, in a sense, mirrors the above mentioned algorithmic tools for expanders. One of these new tools is the Distanced Matching game, an analogue of the Cut-Matching game for well-connected graphs. We demonstrate the power of these new tools by obtaining better results for several of the problems mentioned above. First, we design an algorithm for decremental APSP in expanders with significantly better guarantees: in a constant-degree expander, the algorithm achieves (log n) 1+o(1)-approximation, with total update time n 1+o(1). We also obtain a deterministic algorithm for the Cut Player in the Cut-Matching game that achieves expansion 1 (log n) 5+o(1) in time n 1+o(1), deterministic almost linear-time algorithms for Sparsest Cut, Lowest-Conductance Cut, and Minimum Balanced Cut with approximation factors O(poly log n), as well as improved deterministic algorithm for Expander Decomposition. We believe that the use of well-connected graphs instead of expanders in various dynamic distance-based problems (such as APSP in general graphs) has the potential of providing much stronger guarantees, since we are no longer necessarily restricted to superlogarithmic approximation factors. 
    more » « less
  3. We study the fully dynamic All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. Given an n-vertex graph G with non-negative edge lengths, that undergoes an online sequence of edge insertions and deletions, the goal is to support approximate distance queries and shortest-path queries. We provide a deterministic algorithm for this problem, that, for a given precision parameter є, achieves approximation factor (loglogn)2O(1/є3), and has amortized update time O(nєlogL) per operation, where L is the ratio of longest to shortest edge length. Query time for distance-query is O(2O(1/є)· logn· loglogL), and query time for shortest-path query is O(|E(P)|+2O(1/є)· logn· loglogL), where P is the path that the algorithm returns. To the best of our knowledge, even allowing any o(n)-approximation factor, no adaptive-update algorithms with better than Θ(m) amortized update time and better than Θ(n) query time were known prior to this work. We also note that our guarantees are stronger than the best current guarantees for APSP in decremental graphs in the adaptive-adversary setting. In order to obtain these results, we consider an intermediate problem, called Recursive Dynamic Neighborhood Cover (RecDynNC), that was formally introduced in [Chuzhoy, STOC ’21]. At a high level, given an undirected edge-weighted graph G undergoing an online sequence of edge deletions, together with a distance parameter D, the goal is to maintain a sparse D-neighborhood cover of G, with some additional technical requirements. Our main technical contribution is twofolds. First, we provide a black-box reduction from APSP in fully dynamic graphs to the RecDynNC problem. Second, we provide a new deterministic algorithm for the RecDynNC problem, that, for a given precision parameter є, achieves approximation factor (loglogm)2O(1/є2), with total update time O(m1+є), where m is the total number of edges ever present in G. This improves the previous algorithm of [Chuzhoy, STOC ’21], that achieved approximation factor (logm)2O(1/є) with similar total update time. Combining these two results immediately leads to the deterministic algorithm for fully-dynamic APSP with the guarantees stated above. 
    more » « less
  4. For unweighted graphs, finding isometric embeddings of a graph G is closely related to decompositions of G into Cartesian products of smaller graphs. When G is isomorphic to a Cartesian graph product, we call the factors of this product a factorization of G. When G is isomorphic to an isometric subgraph of a Cartesian graph product, we call those factors a pseudofactorization of G. Prior work has shown that an unweighted graph’s pseudofactorization can be used to generate a canonical isometric embedding into a product of the smallest possible pseudofactors. However, for arbitrary weighted graphs, which represent a richer variety of metric spaces, methods for finding isometric embeddings or determining their existence remain elusive, and indeed pseudofactorization and factorization have not previously been extended to this context. In this work, we address the problem of finding the factorization and pseudofactorization of a weighted graph G, where G satisfies the property that every edge constitutes a shortest path between its endpoints. We term such graphs minimal graphs, noting that every graph can be made minimal by removing edges not affecting its path metric. We generalize pseudofactorization and factorization to minimal graphs and develop new proof techniques that extend the previously proposed algorithms due to Graham and Winkler [Graham and Winkler, ’85] and Feder [Feder, ’92] for pseudofactorization and factorization of unweighted graphs. We show that any n-vertex, m-edge graph with positive integer edge weights can be factored in O(m2) time, plus the time to find all pairs shortest paths (APSP) distances in a weighted graph, resulting in an overall running time of O(m2+n2 log log n) time. We also show that a pseudofactorization for such a graph can be computed in O(mn) time, plus the time to solve APSP, resulting in an O(mn + n2 log log n) running time. 
    more » « less
  5. We present a new technique for efficiently removing almost all short cycles in a graph without unintentionally removing its triangles. Consequently, triangle finding problems do not become easy even in almost k-cycle free graphs, for any constant k≥ 4. Triangle finding is at the base of many conditional lower bounds in P, mainly for distance computation problems, and the existence of many 4- or 5-cycles in a worst-case instance had been the obstacle towards resolving major open questions. Hardness of approximation: Are there distance oracles with m1+o(1) preprocessing time and mo(1) query time that achieve a constant approximation? Existing algorithms with such desirable time bounds only achieve super-constant approximation factors, while only 3− factors were conditionally ruled out (Pătraşcu, Roditty, and Thorup; FOCS 2012). We prove that no O(1) approximations are possible, assuming the 3-SUM or APSP conjectures. In particular, we prove that k-approximations require Ω(m1+1/ck) time, which is tight up to the constant c. The lower bound holds even for the offline version where we are given the queries in advance, and extends to other problems such as dynamic shortest paths. The 4-Cycle problem: An infamous open question in fine-grained complexity is to establish any surprising consequences from a subquadratic or even linear-time algorithm for detecting a 4-cycle in a graph. This is arguably one of the simplest problems without a near-linear time algorithm nor a conditional lower bound. We prove that Ω(m1.1194) time is needed for k-cycle detection for all k≥ 4, unless we can detect a triangle in √n-degree graphs in O(n2−δ) time; a breakthrough that is not known to follow even from optimal matrix multiplication algorithms. 
    more » « less