Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de-facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)-approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max{(√ m, n²/m)} space per machine and assuming T = Ω(√{m/n}). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of k size for constant k ≥ 1. Our second result is an O_δ(log log n)-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting k-cliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√{log n}) rounds, O(n^δ) space per machine and O(mα³) total space. In addition to our theoretical results, we simulate our triangle counting algorithms in real-world graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.
more »
« less
Counting Simplices in Hypergraph Streams
We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a k-uniform hypergraph H with n vertices and m hyperedges, each hyperedge being a k-sized subset of vertices. A k-simplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of k-simplices in H. We design a suite of algorithms for this problem. As with triangle-counting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{-2} log δ^{-1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1-δ. We also give a simpler 1-pass algorithm that achieves O(ε^{-2} log δ^{-1} log n⋅ (m/T) (Δ_E + Δ_V^{1-1/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of k-simplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{-2}), Ω(m^{1+1/k}/T), Ω(m/T^{1-1/k}) and Ω(mΔ_V^{1/k}/T) for multi-pass algorithms and Ω(mΔ_E/T) for 1-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
more »
« less
- Award ID(s):
- 1907738
- PAR ID:
- 10379938
- Date Published:
- Journal Name:
- Leibniz international proceedings in informatics
- Volume:
- 244
- ISSN:
- 1868-8969
- Page Range / eLocation ID:
- 32:1 - 32:19
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recently, a number of variants of the notion of cut-preserving hypergraph sparsification have been studied in the literature. These variants include directed hypergraph sparsification, submodular hypergraph sparsification, general notions of approximation including spectral approximations, and more general notions like sketching that can answer cut queries using more general data structures than just sparsifiers. In this work, we provide reductions between these different variants of hypergraph sparsification and establish new upper and lower bounds on the space complexity of preserving their cuts. Specifically, we show that: 1) (1 ± ε) directed hypergraph spectral (respectively cut) sparsification on n vertices efficiently reduces to (1 ± ε) undirected hypergraph spectral (respectively cut) sparsification on n² + 1 vertices. Using the work of Lee and Jambulapati, Liu, and Sidford (STOC 2023) this gives us directed hypergraph spectral sparsifiers with O(n² log²(n) / ε²) hyperedges and directed hypergraph cut sparsifiers with O(n² log(n)/ ε²) hyperedges by using the work of Chen, Khanna, and Nagda (FOCS 2020), both of which improve upon the work of Oko, Sakaue, and Tanigawa (ICALP 2023). 2) Any cut sketching scheme which preserves all cuts in any directed hypergraph on n vertices to a (1 ± ε) factor (for ε = 1/(2^{O(√{log(n)})})) must have worst-case bit complexity n^{3 - o(1)}. Because directed hypergraphs are a subclass of submodular hypergraphs, this also shows a worst-case sketching lower bound of n^{3 - o(1)} bits for sketching cuts in general submodular hypergraphs. 3) (1 ± ε) monotone submodular hypergraph cut sparsification on n vertices efficiently reduces to (1 ± ε) symmetric submodular hypergraph sparsification on n+1 vertices. Using the work of Jambulapati et. al. (FOCS 2023) this gives us monotone submodular hypergraph sparsifiers with Õ(n / ε²) hyperedges, improving on the O(n³ / ε²) hyperedge bound of Kenneth and Krauthgamer (arxiv 2023). At a high level, our results use the same general principle, namely, by showing that cuts in one class of hypergraphs can be simulated by cuts in a simpler class of hypergraphs, we can leverage sparsification results for the simpler class of hypergraphs.more » « less
-
Recently, a number of variants of the notion of cut-preserving hypergraph sparsification have been studied in the literature. These variants include directed hypergraph sparsification, submodular hypergraph sparsification, general notions of approximation including spectral approximations, and more general notions like sketching that can answer cut queries using more general data structures than just sparsifiers. In this work, we provide reductions between these different variants of hypergraph sparsification and establish new upper and lower bounds on the space complexity of preserving their cuts. Specifically, we show that: 1) (1 ± ε) directed hypergraph spectral (respectively cut) sparsification on n vertices efficiently reduces to (1 ± ε) undirected hypergraph spectral (respectively cut) sparsification on n² + 1 vertices. Using the work of Lee and Jambulapati, Liu, and Sidford (STOC 2023) this gives us directed hypergraph spectral sparsifiers with O(n² log²(n) / ε²) hyperedges and directed hypergraph cut sparsifiers with O(n² log(n)/ ε²) hyperedges by using the work of Chen, Khanna, and Nagda (FOCS 2020), both of which improve upon the work of Oko, Sakaue, and Tanigawa (ICALP 2023). 2) Any cut sketching scheme which preserves all cuts in any directed hypergraph on n vertices to a (1 ± ε) factor (for ε = 1/(2^{O(√{log(n)})})) must have worst-case bit complexity n^{3 - o(1)}. Because directed hypergraphs are a subclass of submodular hypergraphs, this also shows a worst-case sketching lower bound of n^{3 - o(1)} bits for sketching cuts in general submodular hypergraphs. 3) (1 ± ε) monotone submodular hypergraph cut sparsification on n vertices efficiently reduces to (1 ± ε) symmetric submodular hypergraph sparsification on n+1 vertices. Using the work of Jambulapati et. al. (FOCS 2023) this gives us monotone submodular hypergraph sparsifiers with Õ(n / ε²) hyperedges, improving on the O(n³ / ε²) hyperedge bound of Kenneth and Krauthgamer (arxiv 2023). At a high level, our results use the same general principle, namely, by showing that cuts in one class of hypergraphs can be simulated by cuts in a simpler class of hypergraphs, we can leverage sparsification results for the simpler class of hypergraphs.more » « less
-
Bringmann, Karl; Grohe, Martin; Puppis, Gabriele; Svensson, Ola (Ed.)Recently, a number of variants of the notion of cut-preserving hypergraph sparsification have been studied in the literature. These variants include directed hypergraph sparsification, submodular hypergraph sparsification, general notions of approximation including spectral approximations, and more general notions like sketching that can answer cut queries using more general data structures than just sparsifiers. In this work, we provide reductions between these different variants of hypergraph sparsification and establish new upper and lower bounds on the space complexity of preserving their cuts. Specifically, we show that: 1) (1 ± ε) directed hypergraph spectral (respectively cut) sparsification on n vertices efficiently reduces to (1 ± ε) undirected hypergraph spectral (respectively cut) sparsification on n² + 1 vertices. Using the work of Lee and Jambulapati, Liu, and Sidford (STOC 2023) this gives us directed hypergraph spectral sparsifiers with O(n² log²(n) / ε²) hyperedges and directed hypergraph cut sparsifiers with O(n² log(n)/ ε²) hyperedges by using the work of Chen, Khanna, and Nagda (FOCS 2020), both of which improve upon the work of Oko, Sakaue, and Tanigawa (ICALP 2023). 2) Any cut sketching scheme which preserves all cuts in any directed hypergraph on n vertices to a (1 ± ε) factor (for ε = 1/(2^{O(√{log(n)})})) must have worst-case bit complexity n^{3 - o(1)}. Because directed hypergraphs are a subclass of submodular hypergraphs, this also shows a worst-case sketching lower bound of n^{3 - o(1)} bits for sketching cuts in general submodular hypergraphs. 3) (1 ± ε) monotone submodular hypergraph cut sparsification on n vertices efficiently reduces to (1 ± ε) symmetric submodular hypergraph sparsification on n+1 vertices. Using the work of Jambulapati et. al. (FOCS 2023) this gives us monotone submodular hypergraph sparsifiers with Õ(n / ε²) hyperedges, improving on the O(n³ / ε²) hyperedge bound of Kenneth and Krauthgamer (arxiv 2023). At a high level, our results use the same general principle, namely, by showing that cuts in one class of hypergraphs can be simulated by cuts in a simpler class of hypergraphs, we can leverage sparsification results for the simpler class of hypergraphs.more » « less
-
Barman, Siddharth; Lasota, Sławomir (Ed.)We consider the problem of estimating the size of a maximum matching in low-arboricity graphs in the dynamic graph stream model. In this setting, an algorithm with limited memory makes multiple passes over a stream of edge insertions and deletions, resulting in a low-arboricity graph. Let n be the number of vertices of the input graph, and α be its arboricity. We give the following results. 1) As our main result, we give a three-pass streaming algorithm that produces an (α + 2)(1 + ε)-approximation and uses space O(ε^{-2}⋅α²⋅n^{1/2}⋅log n). This result should be contrasted with the Ω(α^{-5/2}⋅n^{1/2}) space lower bound established by [Assadi et al., SODA'17] for one-pass algorithms, showing that, for graphs of constant arboricity, the one-pass space lower bound can be achieved in three passes (up to poly-logarithmic factors). Furthermore, we obtain a two-pass algorithm that uses space O(ε^{-2}⋅α²⋅n^{3/5}⋅log n). 2) We also give a (1+ε)-approximation multi-pass algorithm, where the space used is parameterized by an upper bound on the size of a largest matching. For example, using O(log log n) passes, the space required is O(ε^{-1}⋅α²⋅k⋅log n), where k denotes an upper bound on the size of a largest matching. Finally, we define a notion of arboricity in the context of matrices. This is a natural measure of the sparsity of a matrix that is more nuanced than simply bounding the total number of nonzero entries, but less restrictive than bounding the number of nonzero entries in each row and column. For such matrices, we exploit our results on estimating matching size to present upper bounds for the problem of rank estimation in the dynamic data stream model.more » « less