Over the last two decades, frameworks for distributedmemory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the defacto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate.
Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)approximation to T, with asymptotically optimal round and total space complexity provided any S ≥ max{(√ m, n²/m)} space per machine and assuming T = Ω(√{m/n}). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of k size for constant k ≥ 1. Our second result is an O_δ(log log n)round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting kcliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√{log n}) rounds, O(n^δ) space per machine and O(mα³) total space.
In addition to our theoretical results, we simulate our triangle counting algorithms in realworld graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.
more »
« less
Counting Simplices in Hypergraph Streams
We consider the problem of spaceefficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highlystudied problem of estimating the number of triangles in a graph stream. Our input is a kuniform hypergraph H with n vertices and m hyperedges, each hyperedge being a ksized subset of vertices. A ksimplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of ksimplices in H.
We design a suite of algorithms for this problem. As with trianglecounting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{2} log δ^{1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1δ. We also give a simpler 1pass algorithm that achieves O(ε^{2} log δ^{1} log n⋅ (m/T) (Δ_E + Δ_V^{11/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of ksimplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{2}), Ω(m^{1+1/k}/T), Ω(m/T^{11/k}) and Ω(mΔ_V^{1/k}/T) for multipass algorithms and Ω(mΔ_E/T) for 1pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.
more »
« less
 Award ID(s):
 1907738
 NSFPAR ID:
 10379938
 Date Published:
 Journal Name:
 Leibniz international proceedings in informatics
 Volume:
 244
 ISSN:
 18688969
 Page Range / eLocation ID:
 32:1  32:19
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


null (Ed.)The problem of sparsifying a graph or a hypergraph while approximately preserving its cut structure has been extensively studied and has many applications. In a seminal work, Benczúr and Karger (1996) showed that given any nvertex undirected weighted graph G and a parameter ε ∈ (0,1), there is a nearlinear time algorithm that outputs a weighted subgraph G' of G of size Õ(n/ε²) such that the weight of every cut in G is preserved to within a (1 ± ε)factor in G'. The graph G' is referred to as a (1 ± ε)approximate cut sparsifier of G. Subsequent recent work has obtained a similar result for the more general problem of hypergraph cut sparsifiers. However, all known sparsification algorithms require Ω(n + m) time where n denotes the number of vertices and m denotes the number of hyperedges in the hypergraph. Since m can be exponentially large in n, a natural question is if it is possible to create a hypergraph cut sparsifier in time polynomial in n, independent of the number of edges. We resolve this question in the affirmative, giving the first sublinear time algorithm for this problem, given appropriate query access to the hypergraph. Specifically, we design an algorithm that constructs a (1 ± ε)approximate cut sparsifier of a hypergraph H(V,E) in polynomial time in n, independent of the number of hyperedges, when given access to the hypergraph using the following two queries: 1) given any cut (S, ̄S), return the size δ_E(S) (cut value queries); and 2) given any cut (S, ̄S), return a uniformly at random edge crossing the cut (cut edge sample queries). Our algorithm outputs a sparsifier with Õ(n/ε²) edges, which is essentially optimal. We then extend our results to show that cut value and cut edge sample queries can also be used to construct hypergraph spectral sparsifiers in poly(n) time, independent of the number of hyperedges. We complement the algorithmic results above by showing that any algorithm that has access to only one of the above two types of queries can not give a hypergraph cut sparsifier in time that is polynomial in n. Finally, we show that our algorithmic results also hold if we replace the cut edge sample queries with a pair neighbor sample query that for any pair of vertices, returns a random edge incident on them. In contrast, we show that having access only to cut value queries and queries that return a random edge incident on a given single vertex, is not sufficient.more » « less

We consider the maximum matching problem in the semistreaming model formalized by Feigenbaum, Kannan, McGregor, Suri, and Zhang that is inspired by giant graphs of today. As our main result, we give a twopass (1/2 + 1/16)approximation algorithm for trianglefree graphs and a twopass (1/2 + 1/32)approximation algorithm for general graphs; these improve the approximation ratios of 1/2 + 1/52 for bipartite graphs and 1/2 + 1/140 for general graphs by Konrad, Magniez, and Mathieu. In three passes, we are able to achieve approximation ratios of 1/2 + 1/10 for trianglefree graphs and 1/2 + 1/19.753 for general graphs. We also give a multipass algorithm where we bound the number of passes precisely—we give a (2/3 − ε) approximation algorithm that uses 2/(3ε) passes for trianglefree graphs and 4/(3ε) passes for general graphs. Our algorithms are simple and combinatorial, use O(n log n) space, and (can be implemented to) have O(1) update time per edge. For general graphs, our multipass algorithm improves the best known deterministic algorithms in terms of the number of passes: * Ahn and Guha give a (2/3−ε)approximation algorithm that uses O(log(1/ε)/ε2) passes, whereas our (2/3 − ε)approximation algorithm uses 4/(3ε) passes; * they also give a (1 − ε)approximation algorithm that uses O(log n · poly(1/ε)) passes, where n is the number of vertices of the input graph; although our algorithm is (2/3−ε)approximation, our number of passes do not depend on n. Earlier multipass algorithms either have a large constant inside bigO notation for the number of passes or the constant cannot be determined due to the involved analysis, so our multipass algorithm should use much fewer passes for approximation ratios bounded slightly below 2/3.more » « less

We revisit the muchstudied problem of spaceefficiently estimating the number of triangles in a graph stream, and extensions of this problem to counting fixedsized cliques and cycles, obtaining a number of new upper and lower bounds. For the important special case of counting triangles, we give a $4$pass, $(1\pm\varepsilon)$approximate, randomized algorithm that needs at most $\widetilde{O}(\varepsilon^{2}\cdot m^{3/2}/T)$ space, where $m$ is the number of edges and $T$ is a promised lower bound on the number of triangles. This matches the space bound of a very recent algorithm (McGregor et al., PODS 2016), with an arguably simpler and more general technique. We give an improved multipass lower bound of $\Omega(\min\{m^{3/2}/T, m/\sqrt{T}\})$, applicable at essentially all densities $\Omega(n) \le m \le O(n^2)$. We also prove other multipass lower bounds in terms of various structural parameters of the input graph. Together, our results resolve a couple of open questions raised in recent work (Braverman et al., ICALP 2013). Our presentation emphasizes more general frameworks, for both upper and lower bounds. We give a sampling algorithm for counting arbitrary subgraphs and then improve it via combinatorial means in the special cases of counting odd cliques and odd cycles. Our results show that these problems are considerably easier in the cashregister streaming model than in the turnstile model, where previous work had focused (Manjunath et al., ESA 2011; Kane et al., ICALP 2012). We use Tur{\'a}n graphs and related gadgets to derive lower bounds for counting cliques and cycles, with trianglecounting lower bounds following as a corollary.more » « less

Given a metric space ℳ = (X,δ), a weighted graph G over X is a metric tspanner of ℳ if for every u,v ∈ X, δ(u,v) ≤ δ_G(u,v) ≤ t⋅ δ(u,v), where δ_G is the shortest path metric in G. In this paper, we construct spanners for finite sets in metric spaces in the online setting. Here, we are given a sequence of points (s₁, …, s_n), where the points are presented one at a time (i.e., after i steps, we have seen S_i = {s₁, … , s_i}). The algorithm is allowed to add edges to the spanner when a new point arrives, however, it is not allowed to remove any edge from the spanner. The goal is to maintain a tspanner G_i for S_i for all i, while minimizing the number of edges, and their total weight. Under the L₂norm in ℝ^d for arbitrary constant d ∈ ℕ, we present an online (1+ε)spanner algorithm with competitive ratio O_d(ε^{d} log n), improving the previous bound of O_d(ε^{(d+1)}log n). Moreover, the spanner maintained by the algorithm has O_d(ε^{1d}log ε^{1})⋅ n edges, almost matching the (offline) optimal bound of O_d(ε^{1d})⋅ n. In the plane, a tighter analysis of the same algorithm provides an almost quadratic improvement of the competitive ratio to O(ε^{3/2}logε^{1}log n), by comparing the online spanner with an instanceoptimal spanner directly, bypassing the comparison to an MST (i.e., lightness). As a counterpart, we design a sequence of points that yields a Ω_d(ε^{d}) lower bound for the competitive ratio for online (1+ε)spanner algorithms in ℝ^d under the L₁norm. Then we turn our attention to online spanners in general metrics. Note that, it is not possible to obtain a spanner with stretch less than 3 with a subquadratic number of edges, even in the offline setting, for general metrics. We analyze an online version of the celebrated greedy spanner algorithm, dubbed ordered greedy. With stretch factor t = (2k1)(1+ε) for k ≥ 2 and ε ∈ (0,1), we show that it maintains a spanner with O(ε^{1}logε^{1})⋅ n^{1+1/k} edges and O(ε^{1}n^{1/k}log² n) lightness for a sequence of n points in a metric space. We show that these bounds cannot be significantly improved, by introducing an instance that achieves an Ω(1/k⋅ n^{1/k}) competitive ratio on both sparsity and lightness. Furthermore, we establish the tradeoff among stretch, number of edges and lightness for points in ultrametrics, showing that one can maintain a (2+ε)spanner for ultrametrics with O(ε^{1}logε^{1})⋅ n edges and O(ε^{2}) lightness.more » « less