Counting Simplices in Hypergraph Streams
We consider the problem of space-efficiently estimating the number of simplices in a hypergraph stream. This is the most natural hypergraph generalization of the highly-studied problem of estimating the number of triangles in a graph stream. Our input is a k-uniform hypergraph H with n vertices and m hyperedges, each hyperedge being a k-sized subset of vertices. A k-simplex in H is a subhypergraph on k+1 vertices X such that all k+1 possible hyperedges among X exist in H. The goal is to process the hyperedges of H, which arrive in an arbitrary order as a data stream, and compute a good estimate of T_k(H), the number of k-simplices in H. We design a suite of algorithms for this problem. As with triangle-counting in graphs (which is the special case k = 2), sublinear space is achievable but only under a promise of the form T_k(H) ≥ T. Under such a promise, our algorithms use at most four passes and together imply a space bound of O(ε^{-2} log δ^{-1} polylog n ⋅ min{(m^{1+1/k})/T, m/(T^{2/(k+1)})}) for each fixed k ≥ 3, in order to guarantee an estimate within (1±ε)T_k(H) with probability ≥ 1-δ. We also give a simpler 1-pass algorithm that achieves O(ε^{-2} log δ^{-1} log n⋅ (m/T) (Δ_E + Δ_V^{1-1/k})) space, where Δ_E (respectively, Δ_V) denotes the maximum number of k-simplices that share a hyperedge (respectively, a vertex), which generalizes a previous result for the k = 2 case. We complement these algorithmic results with space lower bounds of the form Ω(ε^{-2}), Ω(m^{1+1/k}/T), Ω(m/T^{1-1/k}) and Ω(mΔ_V^{1/k}/T) for multi-pass algorithms and Ω(mΔ_E/T) for 1-pass algorithms, which show that some of the dependencies on parameters in our upper bounds are nearly tight. Our techniques extend and generalize several different ideas previously developed for triangle counting in graphs, using appropriate innovations to handle the more complicated combinatorics of hypergraphs.  more » « less
Award ID(s):
NSF-PAR ID:
10379938
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Leibniz international proceedings in informatics
Volume:
244
ISSN:
1868-8969
Page Range / eLocation ID:
32:1 - 32:19
Format(s):
Medium: X
4. We revisit the much-studied problem of space-efficiently estimating the number of triangles in a graph stream, and extensions of this problem to counting fixed-sized cliques and cycles, obtaining a number of new upper and lower bounds. For the important special case of counting triangles, we give a $4$-pass, $(1\pm\varepsilon)$-approximate, randomized algorithm that needs at most $\widetilde{O}(\varepsilon^{-2}\cdot m^{3/2}/T)$ space, where $m$ is the number of edges and $T$ is a promised lower bound on the number of triangles. This matches the space bound of a very recent algorithm (McGregor et al., PODS 2016), with an arguably simpler and more general technique. We give an improved multi-pass lower bound of $\Omega(\min\{m^{3/2}/T, m/\sqrt{T}\})$, applicable at essentially all densities $\Omega(n) \le m \le O(n^2)$. We also prove other multi-pass lower bounds in terms of various structural parameters of the input graph. Together, our results resolve a couple of open questions raised in recent work (Braverman et al., ICALP 2013). Our presentation emphasizes more general frameworks, for both upper and lower bounds. We give a sampling algorithm for counting arbitrary subgraphs and then improve it via combinatorial means in the special cases of counting odd cliques and odd cycles. Our results show that these problems are considerably easier in the cash-register streaming model than in the turnstile model, where previous work had focused (Manjunath et al., ESA 2011; Kane et al., ICALP 2012). We use Tur{\'a}n graphs and related gadgets to derive lower bounds for counting cliques and cycles, with triangle-counting lower bounds following as a corollary.