Large cliques and independent sets all over the place
We study the following question raised by Erdős and Hajnal in the early 1990s. Over all $$n$$-vertex graphs $$G$$, what is the smallest possible value of $$m$$ for which any $$m$$ vertices of $$G$$ contain both a clique and an independent set of size $$\log n$$? We construct examples showing that $$m$$ is at most $$2^{2^{(\log\log n)^{1/2+o(1)}}}$$, obtaining a twofold sub-polynomial improvement over the upper bound of about $$\sqrt{n}$$ coming from the natural guess, the random graph. Our (probabilistic) construction gives rise to new examples of Ramsey graphs which, while having no very large homogeneous subsets, contain both cliques and independent sets of size $$\log n$$ in any small subset of vertices. This is very far from being true in random graphs. Our proofs are based on an interplay between taking lexicographic products and using randomness.
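As background for the last sentence (a standard fact about the operation, not an ingredient taken from this paper's proof): the lexicographic product $$G[H]$$ has vertex set $$V(G)\times V(H)$$, with $$(u,x)$$ adjacent to $$(v,y)$$ whenever $$uv\in E(G)$$, or $$u=v$$ and $$xy\in E(H)$$. Under this operation clique and independence numbers multiply, $$\omega(G[H])=\omega(G)\,\omega(H)$$ and $$\alpha(G[H])=\alpha(G)\,\alpha(H)$$, which is what makes iterated lexicographic products a natural building block for Ramsey-type constructions of this kind.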
- Award ID(s):
- 1855464
- PAR ID:
- 10338583
- Date Published:
- Journal Name:
- Proceedings of the American Mathematical Society
- Volume:
- 149
- Issue:
- 746
- ISSN:
- 0002-9939
- Page Range / eLocation ID:
- 3145 to 3157
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
A $$(1\pm\epsilon)$$-sparsifier of a hypergraph $$G(V, E)$$ is a (weighted) subgraph that preserves the value of every cut to within a $$(1\pm\epsilon)$$ factor. It is known that every hypergraph with $$n$$ vertices admits a $$(1\pm\epsilon)$$-sparsifier with $$\tilde{O}(n/\epsilon^{2})$$ hyperedges. In this work, we explore the task of building such a sparsifier using only linear measurements (a linear sketch) over the hyperedges of $$G$$, and provide nearly-matching upper and lower bounds for this task. Specifically, we show that there is a randomized linear sketch of size $$\tilde{O}(nr\log(m)/\epsilon^{2})$$ bits which, with high probability, contains sufficient information to recover a $$(1\pm\epsilon)$$ cut-sparsifier with $$\tilde{O}(n/\epsilon^{2})$$ hyperedges for any hypergraph with at most $$m$$ hyperedges, each of arity at most $$r$$. This immediately gives a dynamic streaming algorithm for hypergraph cut sparsification with the same space complexity, improving on the previous best known bound of $$\tilde{O}(nr^{2}\log^{4}(m)/\epsilon^{2})$$ bits of space (Guha, McGregor, and Tench, PODS 2015). We complement this algorithmic result with a nearly-matching lower bound: for every $$\epsilon\in(0,1)$$, one needs $$\Omega(nr\log(m/n)/\log(n))$$ bits to construct a $$(1\pm\epsilon)$$-sparsifier via linear sketching, showing that our linear sketch achieves an optimal dependence on both $$r$$ and $$\log(m)$$. The starting point for our improved algorithm is importance sampling of hyperedges based on the new notion of $$k$$-cut strength introduced in the recent work of Quanrud (SODA 2024). The natural algorithm based on this concept leads to $$\log m$$ levels of sampling in which errors can accumulate, and this accounts for the $$\mathrm{polylog}(m)$$ losses in its sketch size. We develop a more intricate analysis of the error accumulation to show that most levels do not contribute to the error and that the actual loss is only $$\mathrm{polylog}(n)$$. Combined with careful preprocessing (and analysis), this enables us to get rid of all extraneous $$\log m$$ factors in the sketch size, but a quadratic dependence on $$r$$ remains. This dependence originates from the use of correlated $$\ell_{0}$$-samplers (a toy sketch of this primitive appears after this list) to recover a large number of low-strength hyperedges simultaneously by looking at the neighborhoods of individual vertices. In graphs, this leads to the discovery of $$\Omega(n)$$ edges in a single shot, whereas in hypergraphs it may reveal only $$O(n/r)$$ new hyperedges, thus requiring $$\Omega(r)$$ rounds of recovery. To remedy this, we introduce a new technique of random fingerprinting of hyperedges which effectively eliminates the correlations created by large-arity hyperedges, leading to a scheme for recovering hyperedges of low strength with an optimal dependence on $$r$$. Putting all these ingredients together yields our linear sketching algorithm. Our lower bound is established by a reduction from the universal relation problem in the one-way communication setting.
-
Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de-facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is a fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph G = (V, E) with n vertices, m edges and T triangles, our first result is an algorithm that outputs a (1+ε)-approximation to T (an illustrative sequential wedge-sampling estimator for T appears after this list), with asymptotically optimal round and total space complexity provided any space per machine S ≥ max{√m, n²/m} and assuming T = Ω(√(m/n)). Our result gives a quadratic improvement on the bound on T over previous works. We also provide a simple extension of our result to counting any subgraph of size k for constant k ≥ 1. Our second result is an O_δ(log log n)-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity α of the input graph. We extend this result to exactly counting k-cliques for any constant k. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most 5 can be implemented in the MPC model in Õ_δ(√(log n)) rounds, O(n^δ) space per machine and O(mα³) total space. In addition to our theoretical results, we simulate our triangle counting algorithms on real-world graphs obtained from the Stanford Network Analysis Project (SNAP) database. Our results show that both our approximate and exact counting algorithms exhibit improvements in terms of round complexity and approximation ratio, respectively, compared to two previous widely used algorithms for these problems.
-
We consider the classical Minimum Balanced Cut problem: given a graph $$G$$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets (common formalizations of this and the related cut objectives are recalled after this list). We present the first deterministic, almost-linear time approximation algorithm for this problem. Specifically, our algorithm, given an $$n$$-vertex $$m$$-edge graph $$G$$ and any parameter $$1\leq r\leq O(\log n)$$, computes a $$(\log m)^{r^2}$$-approximation for Minimum Balanced Cut on $$G$$, in time $$O\left( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right)$$. In particular, we obtain a $$(\log m)^{1/\epsilon}$$-approximation in time $$m^{1+O(1/\sqrt{\epsilon})}$$ for any constant $$\epsilon$$, and a $$(\log m)^{f(m)}$$-approximation in time $$m^{1+o(1)}$$ for any slowly growing function $$f(m)$$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $$G$$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $$n$$-vertex graph is $$n^{o(1)}$$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to factors subpolynomial in $$n$$, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.
-
Given any graph $$G$$, the spread of $$G$$ is the maximum difference between any two eigenvalues of the adjacency matrix of $$G$$. In this paper, we resolve a pair of 20-year-old conjectures of Gregory, Hershkowitz, and Kirkland regarding the spread of graphs. The first states that for all positive integers $$n$$, the $$n$$-vertex graph $$G$$ that maximizes spread is the join of a clique and an independent set, with $$\lfloor 2n/3 \rfloor$$ and $$\lceil n/3 \rceil$$ vertices, respectively (a small numerical check of this construction appears after this list). Using techniques from the theory of graph limits and numerical analysis, we prove this claim for all $$n$$ sufficiently large. As an intermediate step, we prove an analogous result for a family of operators in the Hilbert space $$\mathscr{L}^2[0,1]$$. The second conjecture claims that for any fixed $$m \leq n^2/4$$, if $$G$$ maximizes spread over all $$n$$-vertex graphs with $$m$$ edges, then $$G$$ is bipartite. We prove an asymptotic version of this conjecture. Furthermore, we construct an infinite family of counterexamples, which shows that our asymptotic solution is tight up to lower-order error terms.
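The hypergraph-sparsification entry above is built around $$\ell_0$$-samplers, i.e., linear sketches from which one can recover a coordinate with nonzero net count in a dynamically updated vector. As a point of reference only, here is a minimal Python sketch of that primitive under standard assumptions (strict turnstile updates, an explicit level table standing in for a hash function); it is not the paper's correlated samplers or its random-fingerprinting technique, and all class and variable names are illustrative.

```python
# Toy l0-sampler for an insert/delete (dynamic) stream over a universe [0, N).
# Simplified, standard-style construction shown for illustration only; NOT the
# correlated samplers or fingerprinting scheme of the paper summarized above.
import random

class ToyL0Sampler:
    def __init__(self, universe_size, seed=0):
        self.N = universe_size
        self.num_levels = max(1, universe_size.bit_length())
        rng = random.Random(seed)
        # In a real sketch the level of an index comes from a hash function; a table
        # keeps this toy short. Index i participates in levels 0..lvl[i].
        self.lvl = [min(self._geometric(rng), self.num_levels - 1) for _ in range(self.N)]
        self.p = (1 << 61) - 1                 # prime modulus for the fingerprint
        self.r = rng.randrange(2, self.p)      # random evaluation point
        # Three linear counters per level: sum of x_i, sum of i*x_i, sum of x_i * r^i.
        self.s0 = [0] * self.num_levels
        self.s1 = [0] * self.num_levels
        self.fp = [0] * self.num_levels

    @staticmethod
    def _geometric(rng):
        # Index survives to level j with probability 2^{-j}.
        j = 0
        while rng.random() < 0.5:
            j += 1
        return j

    def update(self, i, delta):
        # Linear update; delta = +1 for an insertion, -1 for a deletion of item i.
        # Assumes net counts stay non-negative (strict turnstile).
        for j in range(self.lvl[i] + 1):
            self.s0[j] += delta
            self.s1[j] += delta * i
            self.fp[j] = (self.fp[j] + delta * pow(self.r, i, self.p)) % self.p

    def sample(self):
        # Look for a level whose surviving sub-vector is (w.h.p.) 1-sparse and
        # return that index; returns None if no level passes the checks.
        for j in range(self.num_levels - 1, -1, -1):
            if self.s0[j] == 0 or self.s1[j] % self.s0[j] != 0:
                continue
            i = self.s1[j] // self.s0[j]
            if 0 <= i < self.N and self.fp[j] == (self.s0[j] * pow(self.r, i, self.p)) % self.p:
                return i
        return None
```

In the hypergraph setting, each hyperedge would first be encoded as a single index (for example, by hashing its vertex set), so that insertions and deletions of hyperedges become linear updates to many such samplers.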
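The subgraph-counting entry above centers on (1+ε)-approximations of the triangle count T. For reference only, the sketch below is the classical sequential wedge-sampling estimator, not the MPC algorithm from that work, and the function names are made up: it estimates T by sampling wedges (paths of length two) and checking how many are closed.

```python
# Sequential wedge-sampling estimator for the number of triangles T (a standard
# technique, shown only to illustrate what a (1+eps)-approximation of T means).
import random

def approx_triangles(adj, samples=100_000, rng=random.Random(0)):
    """adj: dict mapping each vertex to a set of neighbours (simple undirected graph)."""
    verts = [v for v in adj if len(adj[v]) >= 2]
    # Wedges centred at v: C(deg(v), 2). Every triangle contains exactly three wedges,
    # so T = (#closed wedges) / 3; we estimate the closed fraction by sampling.
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in verts]
    total_wedges = sum(weights)
    if total_wedges == 0:
        return 0.0
    closed = 0
    for _ in range(samples):
        v = rng.choices(verts, weights=weights, k=1)[0]   # wedge centre, prob. prop. to C(deg, 2)
        a, b = rng.sample(list(adj[v]), 2)                # two distinct neighbours of v
        if b in adj[a]:                                   # wedge is closed -> triangle
            closed += 1
    return (closed / samples) * total_wedges / 3.0

# Tiny usage example: the complete graph K4 has 4 triangles.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(approx_triangles(k4, samples=20_000))   # ~4.0
```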
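For reference alongside the balanced-cut entry above, here is one common way to formalize the three objectives it names; this is a generic formulation, and the exact definitions and balance constants used in that paper may differ. For a cut $$(S,\bar S)$$ of a graph $$G=(V,E)$$, with $$\mathrm{vol}(S)$$ denoting the total degree of the vertices in $$S$$: Sparsest Cut minimizes $$|E(S,\bar S)|/\min(|S|,|\bar S|)$$; Lowest-Conductance Cut minimizes $$|E(S,\bar S)|/\min(\mathrm{vol}(S),\mathrm{vol}(\bar S))$$; and Minimum Balanced Cut minimizes $$|E(S,\bar S)|$$ over cuts in which both sides have volume at least a constant fraction (say a third) of $$\mathrm{vol}(V)$$.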
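The spread entry above has a concrete conjectured extremizer: the join of a clique on $$\lfloor 2n/3\rfloor$$ vertices with an independent set on $$\lceil n/3\rceil$$ vertices. The short NumPy check below (illustrative only, not from that paper; the function names are made up) computes the spread of this graph for a small $$n$$ and compares it with a few random graphs of the same size.

```python
# Illustrative check: spread of the conjectured extremal graph (join of a clique on
# floor(2n/3) vertices with an independent set on ceil(n/3) vertices) vs. G(n, 1/2).
import numpy as np

def spread(A):
    # Spread = largest minus smallest eigenvalue of the adjacency matrix.
    eig = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
    return eig[-1] - eig[0]

def join_clique_independent_set(n):
    k = (2 * n) // 3                     # size of the clique part
    A = np.ones((n, n)) - np.eye(n)      # start from the complete graph K_n
    A[k:, k:] = 0.0                      # delete all edges inside the last ceil(n/3) vertices
    return A

def random_graph(n, p, rng):
    U = np.triu((rng.random((n, n)) < p).astype(float), 1)   # independent edges above the diagonal
    return U + U.T

n = 30
rng = np.random.default_rng(0)
print("join construction:", round(spread(join_clique_independent_set(n)), 3))
print("best of 20 G(n,1/2):", round(max(spread(random_graph(n, 0.5, rng)) for _ in range(20)), 3))
```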