- Award ID(s):
- 1651570
- PAR ID:
- 10213414
- Date Published:
- Journal Name:
- SPAA 2020
- Page Range / eLocation ID:
- 329 to 338
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
null (Ed.)Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration . These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k , improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ὼ(n/k 2 ) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k 2 ) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ὼ(m/k 5/3 ) rounds. This result also implies the first non-trivial lower bound of Ὼ(n 1/3 ) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k 5/3 + n/k 4/3 ) rounds.more » « less
-
null (Ed.)Finding locally optimal solutions for MAX-CUT and MAX- k -CUT are well-known PLS-complete problems. An instinctive approach to finding such a locally optimum solution is the FLIP method. Even though FLIP requires exponential time in worst-case instances, it tends to terminate quickly in practical instances. To explain this discrepancy, the run-time of FLIP has been studied in the smoothed complexity framework. Etscheid and Röglin (ACM Transactions on Algorithms, 2017) showed that the smoothed complexity of FLIP for max-cut in arbitrary graphs is quasi-polynomial. Angel, Bubeck, Peres, and Wei (STOC, 2017) showed that the smoothed complexity of FLIP for max-cut in complete graphs is ( O Φ 5 n 15.1 ), where Φ is an upper bound on the random edge-weight density and Φ is the number of vertices in the input graph. While Angel, Bubeck, Peres, and Wei’s result showed the first polynomial smoothed complexity, they also conjectured that their run-time bound is far from optimal. In this work, we make substantial progress toward improving the run-time bound. We prove that the smoothed complexity of FLIP for max-cut in complete graphs is O (Φ n 7.83 ). Our results are based on a carefully chosen matrix whose rank captures the run-time of the method along with improved rank bounds for this matrix and an improved union bound based on this matrix. In addition, our techniques provide a general framework for analyzing FLIP in the smoothed framework. We illustrate this general framework by showing that the smoothed complexity of FLIP for MAX-3-CUT in complete graphs is polynomial and for MAX - k - CUT in arbitrary graphs is quasi-polynomial. We believe that our techniques should also be of interest toward showing smoothed polynomial complexity of FLIP for MAX - k - CUT in complete graphs for larger constants k .more » « less
-
We consider in this paper the similarity search problem that retrieves relevant graphs from a graph database under the well-known graph edit distance (GED) constraint. Formally, given a graph database G = {g1, g2, . . . , gn} and a query graph q, we aim to search the graph gi ∈ g such that the graph edit distance between gi and q, GED(gi, q), is within a user-specified GED threshold, τ. In spite of its theoretical significance and wide applicability, the GED-based similarity search problem is challenging in large graph databases due in particular to a large amount of GED computation incurred, which has proven to be NP-hard. In this paper, we propose a parameterized, partition-based GED lower bound that can be instantiated into a series of tight lower bounds towards synergistically pruning false-positive graphs from before costly GED computation is performed. We design an efficient, selectivity-aware algorithm to partition graphs of into highly selective subgraphs. They are further incorporated in a cost-effective, multi-layered indexing structure, ML-Index (Multi-Layered Index), for GED lower bound cross-checking and false-positive graph filtering with theoretical performance guarantees. Experimental studies in real and synthetic graph databases validate the efficiency and effectiveness of ML-Index, which achieves up to an order of magnitude speedup over the state-of-the-art method for similarity search in graph databases.more » « less
-
In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors.
The first problem is approximating cuts in balanced directed graphs. In this problem, we want to build a data structure that can provide (1 ± ε)-approximation of cut values on a graph with n vertices. For arbitrary directed graphs, such a data structure requires Ω(n2) bits even for constant ε. To circumvent this, recent works study β-balanced graphs, meaning that for every directed cut, the total weight of edges in one direction is at most β times the total weight in the other direction. We consider the for-each model, where the goal is to approximate each cut with constant probability, and the for-all model, where all cuts must be preserved simultaneously. We improve the previous Ømega(n √β/ε) lower bound in the for-each model to ~Ω (n √β /ε) and we improve the previous Ω(n β/ε) lower bound in the for-all model to Ω(n β/ε2). This resolves the main open questions of (Cen et al., ICALP, 2021).
The second problem is approximating the global minimum cut in a local query model, where we can only access the graph via degree, edge, and adjacency queries. We prove an ΩL(min m, m/ε2k R) lower bound for this problem, which improves the previous ΩL(m/k R) lower bound, where m is the number of edges, k is the minimum cut size, and we seek a (1+ε)-approximation. In addition, we show that existing upper bounds with minor modifications match our lower bound up to logarithmic factors.
-
There has been significant recent interest in \emph{graph-based nearest neighbor search methods}, many of which are centered on the construction of (approximately) {\em navigable}\/ graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to the given distance function. The complete graph is obviously navigable for any point set, but the important question for applications is if sparser graphs can be constructed. While this question is fairly well understood in low-dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n })$ for any set of $n$ points, in any dimension, for any distance function. We compliment this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.more » « less