Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de-facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate.
Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)-approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max{(√ m, n²/m)} space per machine and assuming T = Ω(√{m/n}). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of k size for constant k ≥ 1. Our second result is an O_δ(log log n)-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting k-cliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√{log n}) rounds, O(n^δ) space per machine and O(mα³) total space.
In addition to our theoretical results, we simulate our triangle counting algorithms in real-world graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.
more »
« less
Large cliques and independent sets all over the place
We study the following question raised by Erdős and Hajnal in the early 90’s. Over all n n -vertex graphs G G what is the smallest possible value of m m for which any m m vertices of G G contain both a clique and an independent set of size log n \log n ? We construct examples showing that m m is at most 2 2 ( log log n ) 1 / 2 + o ( 1 ) 2^{2^{(\log \log n)^{1/2+o(1)}}} obtaining a twofold sub-polynomial improvement over the upper bound of about n \sqrt {n} coming from the natural guess, the random graph. Our (probabilistic) construction gives rise to new examples of Ramsey graphs, which while having no very large homogenous subsets contain both cliques and independent sets of size log n \log n in any small subset of vertices. This is very far from being true in random graphs. Our proofs are based on an interplay between taking lexicographic products and using randomness.
more »
« less
- Award ID(s):
- 1855464
- PAR ID:
- 10338583
- Date Published:
- Journal Name:
- Proceedings of the American Mathematical Society
- Volume:
- 149
- Issue:
- 746
- ISSN:
- 0002-9939
- Page Range / eLocation ID:
- 3145 to 3157
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)We consider the classical Minimum Balanced Cut problem: given a graph $G$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first {\em deterministic, almost-linear time} approximation algorithm for this problem. Specifically, our algorithm, given an $n$-vertex $m$-edge graph $G$ and any parameter $1\leq r\leq O(\log n)$, computes a $(\log m)^{r^2}$-approximation for Minimum Balanced Cut on $G$, in time $O\left ( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right )$. In particular, we obtain a $(\log m)^{1/\epsilon}$-approximation in time $m^{1+O(1/\sqrt{\epsilon})}$ for any constant $\epsilon$, and a $(\log m)^{f(m)}$-approximation in time $m^{1+o(1)}$, for any slowly growing function $m$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $G$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $n$-vertex graph is $n^{o(1)}$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $n$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.more » « less
-
Given any graph G G , the spread of G G is the maximum difference between any two eigenvalues of the adjacency matrix of G G . In this paper, we resolve a pair of 20-year-old conjectures of Gregory, Hershkowitz, and Kirkland regarding the spread of graphs. The first states that for all positive integers n n , the n n -vertex graph G G that maximizes spread is the join of a clique and an independent set, with ⌊ 2 n / 3 ⌋ \lfloor 2n/3 \rfloor and ⌈ n / 3 ⌉ \lceil n/3 \rceil vertices, respectively. Using techniques from the theory of graph limits and numerical analysis, we prove this claim for all n n sufficiently large. As an intermediate step, we prove an analogous result for a family of operators in the Hilbert space over L 2 [ 0 , 1 ] \mathscr {L}^2[0,1] . The second conjecture claims that for any fixed m ≤ n 2 / 4 m \leq n^2/4 , if G G maximizes spread over all n n -vertex graphs with m m edges, then G G is bipartite. We prove an asymptotic version of this conjecture. Furthermore, we construct an infinite family of counterexamples, which shows that our asymptotic solution is tight up to lower-order error terms.more » « less
-
Abstract We suggest two related conjectures dealing with the existence of spanning irregular subgraphs of graphs. The first asserts that any $d$ -regular graph on $n$ vertices contains a spanning subgraph in which the number of vertices of each degree between $0$ and $d$ deviates from $\frac{n}{d+1}$ by at most $2$ . The second is that every graph on $n$ vertices with minimum degree $\delta$ contains a spanning subgraph in which the number of vertices of each degree does not exceed $\frac{n}{\delta +1}+2$ . Both conjectures remain open, but we prove several asymptotic relaxations for graphs with a large number of vertices $n$ . In particular we show that if $d^3 \log n \leq o(n)$ then every $d$ -regular graph with $n$ vertices contains a spanning subgraph in which the number of vertices of each degree between $0$ and $d$ is $(1+o(1))\frac{n}{d+1}$ . We also prove that any graph with $n$ vertices and minimum degree $\delta$ contains a spanning subgraph in which no degree is repeated more than $(1+o(1))\frac{n}{\delta +1}+2$ times.more » « less
-
Bae, Sang Won ; Park, Heejin (Ed.)We present an O(n³ g² log g + m) + Õ(n^{ω + 1}) time deterministic algorithm to find the minimum cycle basis of a directed graph embedded on an orientable surface of genus g. This result improves upon the previous fastest known running time of O(m³ n + m² n² log n) applicable to general directed graphs. While an O(n^ω + 2^{2g} n² + m) time deterministic algorithm was known for undirected graphs, the use of the underlying field ℚ in the directed case (as opposed to ℤ₂ for the undirected case) presents extra challenges. It turns out that some of our new observations are useful for both variants of the problem, so we present an O(n^ω + n² g² log g + m) time deterministic algorithm for undirected graphs as well.more » « less