We revisit the much-studied problem of space-efficiently estimating the number of triangles in a graph stream, and extensions of this problem to counting fixed-sized cliques and cycles, obtaining a number of new upper and lower bounds. For the important special case of counting triangles, we give a $$4$$-pass, $$(1\pm\varepsilon)$$-approximate, randomized algorithm that needs at most $$\widetilde{O}(\varepsilon^{-2}\cdot m^{3/2}/T)$$ space, where $$m$$ is the number of edges and $$T$$ is a promised lower bound on the number of triangles. This matches the space bound of a very recent algorithm (McGregor et al., PODS 2016), with an arguably simpler and more general technique. We give an improved multi-pass lower bound of $$\Omega(\min\{m^{3/2}/T, m/\sqrt{T}\})$$, applicable at essentially all densities $$\Omega(n) \le m \le O(n^2)$$. We also prove other multi-pass lower bounds in terms of various structural parameters of the input graph. Together, our results resolve a couple of open questions raised in recent work (Braverman et al., ICALP 2013). Our presentation emphasizes more general frameworks, for both upper and lower bounds. We give a sampling algorithm for counting arbitrary subgraphs and then improve it via combinatorial means in the special cases of counting odd cliques and odd cycles. Our results show that these problems are considerably easier in the cash-register streaming model than in the turnstile model, where previous work had focused (Manjunath et al., ESA 2011; Kane et al., ICALP 2012). We use Turán graphs and related gadgets to derive lower bounds for counting cliques and cycles, with triangle-counting lower bounds following as a corollary.
                            
                            Triangle and Four Cycle Counting with Predictions in Graph Streams
                        
                    
    
            We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, Hsu et al. (2019a) and Jiang et al. (2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior “classical” algorithms that did not use oracles. In this paper, we explore the power of a “heavy edge” oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on “classical” streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms. 
    
    
- PAR ID: 10338743
- Date Published:
- Journal Name: Tenth International Conference on Learning Representations (ICLR 2022)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Meka, Raghu (Ed.) We consider the problem of finding a minimum cut of a weighted graph presented as a single-pass stream. While graph sparsification in streams has been intensively studied, the specific application of finding minimum cuts in streams is less well-studied. To this end, we show upper and lower bounds on minimum cut problems in insertion-only streams for a variety of settings, including for both randomized and deterministic algorithms, for both arbitrary and random order streams, and for both approximate and exact algorithms. One of our main results is an Õ(n/ε) space algorithm with fast update time for approximating a spectral cut query with high probability on a stream given in an arbitrary order. Our result breaks the Ω(n/ε²) space lower bound required of a sparsifier that approximates all cuts simultaneously. Using this result, we provide streaming algorithms with near optimal space of Õ(n/ε) for minimum cut and approximate all-pairs effective resistances, with matching space lower-bounds. The amortized update time of our algorithms is Õ(1), provided that the number of edges in the input graph is at least (n/ε²)^{1+o(1)}. We also give a generic way of incorporating sketching into a recursive contraction algorithm to improve the post-processing time of our algorithms. In addition to these results, we give a random-order streaming algorithm that computes the exact minimum cut on a simple, unweighted graph using Õ(n) space. Finally, we give an Ω(n/ε²) space lower bound for deterministic minimum cut algorithms which matches the best-known upper bound up to polylogarithmic factors.
- 
            Santhanam, Rahul (Ed.) The following question arises naturally in the study of graph streaming algorithms: Is there any graph problem which is "not too hard", in that it can be solved efficiently with total communication (nearly) linear in the number n of vertices, and for which, nonetheless, any streaming algorithm with Õ(n) space (i.e., a semi-streaming algorithm) needs a polynomial n^Ω(1) number of passes? Assadi, Chen, and Khanna [STOC 2019] were the first to prove that this is indeed the case. However, the lower bounds that they obtained are for rather non-standard graph problems. Our first main contribution is to present the first polynomial-pass lower bounds for natural "not too hard" graph problems studied previously in the streaming model: k-cores and degeneracy. We devise a novel communication protocol for both problems with near-linear communication, thus showing that k-cores and degeneracy are natural examples of "not too hard" problems. Indeed, previous work has developed single-pass semi-streaming algorithms for approximating these problems. In contrast, we prove that any semi-streaming algorithm for exactly solving these problems requires (almost) Ω(n^{1/3}) passes. The lower bound follows by a reduction from a generalization of the hidden pointer chasing (HPC) problem of Assadi, Chen, and Khanna, which is also the basis of their earlier semi-streaming lower bounds. Our second main contribution is improved round-communication lower bounds for the underlying communication problems at the basis of these reductions: (i) we improve the previous lower bound of Assadi, Chen, and Khanna for HPC to achieve optimal bounds for this problem; (ii) we further observe that all current reductions from HPC can also work with a generalized version of this problem that we call MultiHPC, and prove an even stronger and optimal lower bound for this generalization. These two results collectively allow us to improve the resulting pass lower bounds for semi-streaming algorithms by a polynomial factor, namely, from n^{1/5} to n^{1/3} passes.
- 
            We study graph computations in an enhanced data streaming setting, where a space-bounded client reading the edge stream of a massive graph may delegate some of its work to a cloud service. We seek algorithms that allow the client to verify a purported proof sent by the cloud service that the work done in the cloud is correct. A line of work starting with Chakrabarti et al. (ICALP 2009) has provided such algorithms, which we call schemes, for several statistical and graph-theoretic problems, many of which exhibit a tradeoff between the length of the proof and the space used by the streaming verifier. This work designs new schemes for a number of basic graph problems - including triangle counting, maximum matching, topological sorting, and single-source shortest paths - where past work had either failed to obtain smooth tradeoffs between these two key complexity measures or only obtained suboptimal tradeoffs. Our key innovation is having the verifier compute certain nonlinear sketches of the input stream, leading to either new or improved tradeoffs. In many cases, our schemes, in fact, provide optimal tradeoffs up to logarithmic factors. Specifically, for most graph problems that we study, it is known that the product of the verifier’s space cost v and the proof length h must be at least Omega(n^2) for n-vertex graphs. However, matching upper bounds are only known for a handful of settings of h and v on the curve h*v = ~Theta(n^2). For example, for counting triangles and maximum matching, schemes with costs lying on this curve are only known for (h = ~O(n²), v = ~O(1)), (h = ~O(n), v = ~O(n)), and the trivial (h = ~O(1), v = ~O(n²)). A major message of this work is that by exploiting nonlinear sketches, a significant "portion" of costs on the tradeoff curve h*v=n^2 can be achieved.
- 
            The random order graph streaming model has received significant attention recently, with problems such as matching size estimation, component counting, and the evaluation of bounded degree constant query testable properties shown to admit surprisingly space efficient algorithms. The main result of this paper is a space efficient single pass random order streaming algorithm for simulating nearly independent random walks that start at uniformly random vertices. We show that the distribution of k-step walks from b vertices chosen uniformly at random can be approximated up to error ∊ per walk using  words of space with a single pass over a randomly ordered stream of edges, solving an open problem of Peng and Sohler [SODA '18]. Applications of our result include the estimation of the average return probability of the k-step walk (the trace of the kth power of the random walk matrix) as well as the estimation of PageRank. We complement our algorithm with a strong impossibility result for directed graphs.