Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms (usually sublogarithmic-time and often poly(log log n)-time, or even faster) for a number of fundamental graph problems in the massively parallel computation (MPC) model. This model is a widely adopted theoretical abstraction of MapReduce-style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on MPC graph algorithms, we present poly(log k) ∈ poly(log log n) round MPC algorithms for computing O(k^{1+o(1)})-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time MPC algorithms for spanner construction.
As primary applications of our spanners, we get two important implications. For the MPC setting, we get an O(log^2 log n)-round algorithm for O(log^{1+o(1)} n)-approximation of all-pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time MPC algorithm for distance approximations. Our result above also extends to the Congested Clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first sublogarithmic algorithm for approximating APSP in weighted graphs in the Congested Clique model.
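To make the notion of a spanner concrete: a t-spanner is a subgraph in which every pairwise distance is at most t times the distance in the original graph. The sketch below is the classic sequential greedy construction, not the MPC algorithm described in the abstract. The function names and the edge-list format `(u, v, weight)` are assumptions chosen for illustration.

```python
import heapq

def spanner_distance(adj, src, dst, limit):
    """Dijkstra in the current spanner; paths longer than `limit` are ignored."""
    best = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= limit and nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def greedy_spanner(n, edges, t):
    """Return a subset of `edges` forming a t-spanner (hypothetical helper)."""
    adj = [[] for _ in range(n)]
    kept = []
    # Scan edges from lightest to heaviest.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the spanner built so far cannot already
        # connect u and v within stretch t of this edge's weight.
        if spanner_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            kept.append((u, v, w))
    return kept
```

On a unit-weight triangle with t = 3, the third edge is dropped because the two kept edges already give a path of length 2, which is within the allowed stretch.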
Solving All-Pairs Shortest-Paths Problem in Large Graphs Using Apache Spark
Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input graphs, necessitating parallel approaches. In this work, we propose, implement and thoroughly analyse different strategies for APSP on distributed memory clusters with Apache Spark. Our solvers are designed for large undirected weighted graphs, and differ in complexity and degree of reliance on techniques outside of pure Spark API. We demonstrate that the best performing solver is able to handle APSP problems with over 200,000 vertices on a 1024-core cluster. However, it requires auxiliary shared persistent storage to compensate for missing Spark functionality.
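For reference, the sequential Floyd-Warshall baseline mentioned above can be sketched in a few lines. This is a minimal illustration for an undirected graph with nonnegative weights, not one of the Spark solvers from the paper. The function name and the edge-list format `(u, v, weight)` are assumptions. Its cubic running time and quadratic memory are what make it impractical at the scale of hundreds of thousands of vertices.

```python
import math

def floyd_warshall(n, edges):
    """Return the n x n matrix of shortest-path distances (undirected graph)."""
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0.0
    for u, v, w in edges:
        # Keep the lightest edge if the input contains parallel edges.
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    # Allow vertices 0..k as intermediate stops, one k at a time.
    for k in range(n):
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue  # i cannot reach k, so k cannot improve row i
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist
```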
 Award ID(s):
 1910539
 NSFPAR ID:
 10145329
 Date Published:
 Journal Name:
 Proceedings of the 48th International Conference on Parallel Processing
 Page Range / eLocation ID:
 1 to 10
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Expander graphs play a central role in graph theory and algorithms. With a number of powerful algorithmic tools developed around them, such as the Cut-Matching game, expander pruning, expander decomposition, and algorithms for decremental All-Pairs Shortest Paths (APSP) in expanders, to name just a few, the use of expanders in the design of graph algorithms has become ubiquitous. Specific applications of interest to us are fast deterministic algorithms for cut problems in static graphs, and algorithms for dynamic distance-based graph problems, such as APSP. Unfortunately, the use of expanders in these settings incurs a number of drawbacks. For example, the best currently known algorithm for decremental APSP in constant-degree expanders can only achieve a (log n)^{O(1/ε^2)}-approximation with n^{1+O(ε)} total update time for any ε. All currently known algorithms for the Cut Player in the Cut-Matching game are either randomized, or provide rather weak guarantees: expansion 1/(log n)^{1/ε} with running time n^{1+O(ε)}. This, in turn, leads to somewhat weak algorithmic guarantees for several central cut problems: the best current almost-linear-time deterministic algorithms for Sparsest Cut, Lowest Conductance Cut, and Balanced Cut can only achieve approximation factor (log n)^{ω(1)}. Lastly, when relying on expanders in distance-based problems, such as dynamic APSP, via current methods, it seems inevitable that one has to settle for approximation factors that are at least Ω(log n). In contrast, we do not have any negative results that rule out a factor-5 approximation with near-linear total update time. In this paper we propose the use of well-connected graphs, and introduce a new algorithmic toolkit for such graphs that, in a sense, mirrors the above-mentioned algorithmic tools for expanders. One of these new tools is the Distanced Matching game, an analogue of the Cut-Matching game for well-connected graphs.
We demonstrate the power of these new tools by obtaining better results for several of the problems mentioned above. First, we design an algorithm for decremental APSP in expanders with significantly better guarantees: in a constant-degree expander, the algorithm achieves (log n)^{1+o(1)}-approximation, with total update time n^{1+o(1)}. We also obtain a deterministic algorithm for the Cut Player in the Cut-Matching game that achieves expansion 1/(log n)^{5+o(1)} in time n^{1+o(1)}, deterministic almost-linear-time algorithms for Sparsest Cut, Lowest-Conductance Cut, and Minimum Balanced Cut with approximation factors O(poly log n), as well as an improved deterministic algorithm for Expander Decomposition. We believe that the use of well-connected graphs instead of expanders in various dynamic distance-based problems (such as APSP in general graphs) has the potential of providing much stronger guarantees, since we are no longer necessarily restricted to superlogarithmic approximation factors.

We study the fully dynamic All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. Given an n-vertex graph G with non-negative edge lengths that undergoes an online sequence of edge insertions and deletions, the goal is to support approximate distance queries and shortest-path queries. We provide a deterministic algorithm for this problem that, for a given precision parameter ε, achieves approximation factor (log log n)^{2^{O(1/ε^3)}}, and has amortized update time O(n^ε log L) per operation, where L is the ratio of longest to shortest edge length. Query time for a distance query is O(2^{O(1/ε)} · log n · log log L), and query time for a shortest-path query is O(|E(P)| + 2^{O(1/ε)} · log n · log log L), where P is the path that the algorithm returns. To the best of our knowledge, even allowing any o(n)-approximation factor, no adaptive-update algorithms with better than Θ(m) amortized update time and better than Θ(n) query time were known prior to this work. We also note that our guarantees are stronger than the best current guarantees for APSP in decremental graphs in the adaptive-adversary setting. In order to obtain these results, we consider an intermediate problem, called Recursive Dynamic Neighborhood Cover (RecDynNC), that was formally introduced in [Chuzhoy, STOC ’21]. At a high level, given an undirected edge-weighted graph G undergoing an online sequence of edge deletions, together with a distance parameter D, the goal is to maintain a sparse D-neighborhood cover of G, with some additional technical requirements. Our main technical contribution is twofold. First, we provide a black-box reduction from APSP in fully dynamic graphs to the RecDynNC problem. Second, we provide a new deterministic algorithm for the RecDynNC problem that, for a given precision parameter ε, achieves approximation factor (log log m)^{2^{O(1/ε^2)}}, with total update time O(m^{1+ε}), where m is the total number of edges ever present in G.
This improves the previous algorithm of [Chuzhoy, STOC ’21], which achieved approximation factor (log m)^{2^{O(1/ε)}} with similar total update time. Combining these two results immediately leads to the deterministic algorithm for fully dynamic APSP with the guarantees stated above.

For unweighted graphs, finding isometric embeddings of a graph G is closely related to decompositions of G into Cartesian products of smaller graphs. When G is isomorphic to a Cartesian graph product, we call the factors of this product a factorization of G. When G is isomorphic to an isometric subgraph of a Cartesian graph product, we call those factors a pseudofactorization of G. Prior work has shown that an unweighted graph’s pseudofactorization can be used to generate a canonical isometric embedding into a product of the smallest possible pseudofactors. However, for arbitrary weighted graphs, which represent a richer variety of metric spaces, methods for finding isometric embeddings or determining their existence remain elusive, and indeed pseudofactorization and factorization have not previously been extended to this context. In this work, we address the problem of finding the factorization and pseudofactorization of a weighted graph G, where G satisfies the property that every edge constitutes a shortest path between its endpoints. We term such graphs minimal graphs, noting that every graph can be made minimal by removing edges not affecting its path metric. We generalize pseudofactorization and factorization to minimal graphs and develop new proof techniques that extend the previously proposed algorithms due to Graham and Winkler [Graham and Winkler, ’85] and Feder [Feder, ’92] for pseudofactorization and factorization of unweighted graphs. We show that any n-vertex, m-edge graph with positive integer edge weights can be factored in O(m^2) time, plus the time to find all-pairs shortest paths (APSP) distances in a weighted graph, resulting in an overall running time of O(m^2 + n^2 log log n). We also show that a pseudofactorization for such a graph can be computed in O(mn) time, plus the time to solve APSP, resulting in an O(mn + n^2 log log n) running time.
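The step of making a graph minimal can be illustrated directly from the definition: an edge survives only if its weight equals the shortest-path distance between its endpoints. The sketch below is an illustration under assumptions, not the authors' implementation. It uses a simple cubic APSP routine rather than the faster APSP algorithm behind the stated bounds, and the function names and the edge-list format `(u, v, weight)` are hypothetical.

```python
import math

def all_pairs_distances(n, edges):
    """Cubic-time APSP for an undirected graph with positive integer weights."""
    dist = [[math.inf] * n for _ in range(n)]
    for v in range(n):
        dist[v][v] = 0
    for u, v, w in edges:
        if w < dist[u][v]:
            dist[u][v] = dist[v][u] = w
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def make_minimal(n, edges):
    """Drop every edge that is not a shortest path between its endpoints."""
    dist = all_pairs_distances(n, edges)
    # Integer weights make the equality test exact.
    return [(u, v, w) for u, v, w in edges if w == dist[u][v]]
```

Dropping such edges leaves the path metric unchanged, because every edge on a shortest path is itself a shortest path between its endpoints and is therefore kept.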

We present a new technique for efficiently removing almost all short cycles in a graph without unintentionally removing its triangles. Consequently, triangle finding problems do not become easy even in almost k-cycle-free graphs, for any constant k ≥ 4. Triangle finding is at the base of many conditional lower bounds in P, mainly for distance computation problems, and the existence of many 4- or 5-cycles in a worst-case instance had been the obstacle towards resolving major open questions. Hardness of approximation: Are there distance oracles with m^{1+o(1)} preprocessing time and m^{o(1)} query time that achieve a constant approximation? Existing algorithms with such desirable time bounds only achieve super-constant approximation factors, while only 3−ε factors were conditionally ruled out (Pătraşcu, Roditty, and Thorup; FOCS 2012). We prove that no O(1) approximations are possible, assuming the 3-SUM or APSP conjectures. In particular, we prove that k-approximations require Ω(m^{1+1/ck}) time, which is tight up to the constant c. The lower bound holds even for the offline version where we are given the queries in advance, and extends to other problems such as dynamic shortest paths. The 4-Cycle problem: An infamous open question in fine-grained complexity is to establish any surprising consequences from a subquadratic or even linear-time algorithm for detecting a 4-cycle in a graph. This is arguably one of the simplest problems without a near-linear time algorithm nor a conditional lower bound. We prove that Ω(m^{1.1194}) time is needed for k-cycle detection for all k ≥ 4, unless we can detect a triangle in √n-degree graphs in O(n^{2−δ}) time; a breakthrough that is not known to follow even from optimal matrix multiplication algorithms.