Title: Simplified and Space-Optimal Semi-Streaming (2+epsilon)-Approximate Matching
In a recent breakthrough, Paz and Schwartzman (SODA'17) presented a single-pass $$(2+\epsilon)$$-approximation algorithm for the maximum weight matching problem in the semi-streaming model. Their algorithm uses $$O(n \log^{2} n)$$ bits of space, for any constant $$\epsilon > 0$$. We present a simplified and more intuitive primal-dual analysis for essentially the same algorithm, which also improves the space complexity to the optimal bound of $$O(n \log n)$$ bits; this is optimal, as the output matching itself requires $$\Omega(n \log n)$$ bits.
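The local-ratio idea behind this style of algorithm can be sketched in a few lines: keep a potential for each vertex, retain a stream edge only if its weight exceeds $$(1+\epsilon)$$ times the sum of its endpoints' potentials, charge the residual gain to both endpoints, and unwind the retained edges greedily at the end. The following Python sketch is only an illustration of that idea (the function name and toy input are ours, and it stores the full stack of retained edges rather than applying the paper's space reduction to $$O(n \log n)$$ bits).

```python
# Minimal sketch of the local-ratio / primal-dual approach behind
# (2+eps)-approximate semi-streaming weighted matching. Illustrative only:
# it keeps the entire stack of retained edges, so it does not reproduce
# the paper's O(n log n)-bit space optimization.

def approx_weighted_matching(stream, eps=0.1):
    """stream: iterable of (u, v, w) edges; returns a matching as a list of edges."""
    phi = {}          # vertex potentials (dual variables), default 0
    stack = []        # edges retained for the unwinding phase

    # Single pass: keep an edge only if its weight clearly exceeds
    # the current potentials of its endpoints.
    for u, v, w in stream:
        pu, pv = phi.get(u, 0.0), phi.get(v, 0.0)
        if w <= (1.0 + eps) * (pu + pv):
            continue                      # edge cannot improve the solution much
        gain = w - (pu + pv)              # local-ratio "residual" weight
        stack.append((u, v))
        phi[u] = pu + gain                # charge the gain to both endpoints
        phi[v] = pv + gain

    # Unwind the stack greedily: later edges have priority.
    matched, matching = set(), []
    while stack:
        u, v = stack.pop()
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


if __name__ == "__main__":
    edges = [("a", "b", 3.0), ("b", "c", 4.0), ("c", "d", 3.0), ("a", "d", 1.0)]
    print(approx_weighted_matching(edges, eps=0.01))   # e.g. [('c', 'd'), ('a', 'b')]
```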
Award ID(s): 1814603, 1750808, 1618280, 1527110
PAR ID: 10121533
Author(s) / Creator(s): ;
Date Published:
Journal Name: Symposium on Simplicity in Algorithms
Volume: 69
Page Range / eLocation ID: 13:1 - 13:8
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
1. A $$(1\pm\epsilon)$$-sparsifier of a hypergraph $$G(V, E)$$ is a (weighted) subgraph that preserves the value of every cut to within a $$(1\pm\epsilon)$$ factor. It is known that every hypergraph with $$n$$ vertices admits a $$(1\pm\epsilon)$$-sparsifier with $$\tilde{O}(n/\epsilon^{2})$$ hyperedges. In this work, we explore the task of building such a sparsifier by using only linear measurements (a linear sketch) over the hyperedges of $$G$$, and provide nearly-matching upper and lower bounds for this task. Specifically, we show that there is a randomized linear sketch of size $$\tilde{O}(nr\log(m)/\epsilon^{2})$$ bits which with high probability contains sufficient information to recover a $$(1\pm\epsilon)$$ cut-sparsifier with $$\tilde{O}(n/\epsilon^{2})$$ hyperedges for any hypergraph with at most $$m$$ edges, each of which has arity bounded by $$r$$. This immediately gives a dynamic streaming algorithm for hypergraph cut sparsification with an identical space complexity, improving on the previous best known bound of $$\tilde{O}(nr^{2}\log^{4}(m)/\epsilon^{2})$$ bits of space (Guha, McGregor, and Tench, PODS 2015). We complement our algorithmic result above with a nearly-matching lower bound. We show that for every $$\epsilon\in(0,1)$$, one needs $$\Omega(nr\log(m/n)/\log(n))$$ bits to construct a $$(1\pm\epsilon)$$-sparsifier via linear sketching, thus showing that our linear sketch achieves an optimal dependence on both $$r$$ and $$\log(m)$$. The starting point for our improved algorithm is importance sampling of hyperedges based on the new notion of $$k$$-cut strength introduced in the recent work of Quanrud (SODA 2024). The natural algorithm based on this concept leads to $$\log m$$ levels of sampling in which errors can potentially accumulate, and this accounts for the $$\mathrm{polylog}(m)$$ losses in the sketch size of the natural algorithm. We develop a more intricate analysis of the accumulation of error to show that most levels do not contribute to the error and that the actual loss is only $$\mathrm{polylog}(n)$$. Combined with careful preprocessing (and analysis), this enables us to eliminate all extraneous $$\log m$$ factors in the sketch size, but the quadratic dependence on $$r$$ remains. This dependence originates from the use of correlated $$\ell_{0}$$-samplers to recover a large number of low-strength edges in a hypergraph simultaneously by looking at the neighborhoods of individual vertices. In graphs, this leads to the discovery of $$\Omega(n)$$ edges in a single shot, whereas in hypergraphs it may potentially reveal only $$O(n/r)$$ new edges, thus requiring $$\Omega(r)$$ rounds of recovery. To remedy this, we introduce a new technique of random fingerprinting of hyperedges, which effectively eliminates the correlations created by large-arity hyperedges and leads to a scheme for recovering hyperedges of low strength with an optimal dependence on $$r$$. Putting all these ingredients together yields our linear sketching algorithm. Our lower bound is established by a reduction from the universal relation problem in the one-way communication setting.
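For readers unfamiliar with hypergraph cuts, the sketch below spells out the quantity a $$(1\pm\epsilon)$$ cut-sparsifier must preserve: a hyperedge is cut by a vertex set $$S$$ if it has at least one endpoint inside $$S$$ and one outside. It is only the definition plus a brute-force checker for tiny instances, not the linear-sketching construction of the abstract, and the helper names are ours.

```python
# Illustration of the object a (1 +/- eps) cut-sparsifier must preserve:
# the value of every cut of a weighted hypergraph. Not the sketching
# algorithm itself; helper names are hypothetical.
from itertools import combinations


def hypergraph_cut_value(hyperedges, S):
    """hyperedges: list of (frozenset_of_vertices, weight); S: set of vertices."""
    total = 0.0
    for verts, w in hyperedges:
        inside = len(verts & S)
        if 0 < inside < len(verts):      # hyperedge straddles the cut
            total += w
    return total


def is_cut_sparsifier(original, candidate, all_vertices, eps):
    """Brute-force check (exponential in |V|) that `candidate` preserves every
    cut of `original` to within a (1 +/- eps) factor; feasible only for tiny inputs."""
    vertices = sorted(all_vertices)
    for r in range(1, len(vertices)):
        for subset in combinations(vertices, r):
            S = set(subset)
            c0 = hypergraph_cut_value(original, S)
            c1 = hypergraph_cut_value(candidate, S)
            if not ((1 - eps) * c0 <= c1 <= (1 + eps) * c0):
                return False
    return True
```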
2. We introduce a new distance-preserving compact representation of multi-dimensional point sets. Given $$n$$ points in a $$d$$-dimensional space where each coordinate is represented using $$B$$ bits (i.e., $$dB$$ bits per point), it produces a representation of size $$O(d \log(dB/\epsilon) + \log n)$$ bits per point from which one can approximate the distances up to a factor of $$1+\epsilon$$. Our algorithm almost matches the recent bound of Indyk et al. (2017) while being much simpler. We compare our algorithm to Product Quantization (PQ) (Jégou et al., 2011), a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT, MNIST, New York City taxi time series, and a synthetic one-dimensional data set embedded in a high-dimensional space. Our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
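As a point of comparison, the naive way to trade bits for distance accuracy is plain grid quantization of coordinates, which only gives an additive error of roughly $$\delta\sqrt{d}$$ per pair; the sketch below shows that baseline (function names are ours), whereas the representation in the abstract achieves a multiplicative $$1+\epsilon$$ guarantee with far fewer bits per point.

```python
# Naive baseline only: round each coordinate to a grid of cell size delta,
# which perturbs any pairwise Euclidean distance by at most delta * sqrt(d).
# This is an additive guarantee, not the multiplicative (1 + eps) guarantee
# of the representation described in the abstract.
import numpy as np


def quantize(points, delta):
    """points: (n, d) array; returns integer grid codes and a decoder."""
    codes = np.round(points / delta).astype(np.int64)
    decode = lambda c: c.astype(np.float64) * delta
    return codes, decode


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(5, 3))
    codes, decode = quantize(pts, delta=0.01)
    approx = decode(codes)
    exact = np.linalg.norm(pts[0] - pts[1])
    est = np.linalg.norm(approx[0] - approx[1])
    print(f"exact={exact:.4f}  approx={est:.4f}")
```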
3. We present a weighted approach to compute a maximum cardinality matching in an arbitrary bipartite graph. Our main result is a new algorithm that takes as input a weighted bipartite graph $$G(A \cup B, E)$$ with edge weights of 0 or 1. Let $$w \le n$$ be an upper bound on the weight of any matching in $$G$$. Consider the subgraph induced by all the edges of $$G$$ with weight 0. Suppose every connected component in this subgraph has $$O(r)$$ vertices and $$O(mr/n)$$ edges. We present an algorithm to compute a maximum cardinality matching in $$G$$ in $$\tilde{O}(m(\sqrt{w} + \sqrt{r} + wr/n))$$ time. When all the edge weights are 1 (symmetrically, when all weights are 0), our algorithm is identical to the well-known Hopcroft-Karp (HK) algorithm, which runs in $$O(m\sqrt{n})$$ time. However, if we can carefully assign weights of 0 and 1 to the edges such that both $$w$$ and $$r$$ are sub-linear in $$n$$ and $$wr = O(n^{\gamma})$$ for $$\gamma < 3/2$$, then we can compute a maximum cardinality matching in $$G$$ in $$o(m\sqrt{n})$$ time. Using our algorithm, we obtain a new $$\tilde{O}(n^{4/3}/\epsilon^{4})$$ time algorithm to compute an $$\epsilon$$-approximate bottleneck matching of $$A, B \subset \mathbb{R}^{2}$$, and an $$O\!\left(\frac{1}{\epsilon^{O(d)}}\, n^{1+(d-1)/(2d-1)}\, \mathrm{poly}\log n\right)$$ time algorithm for computing an $$\epsilon$$-approximate bottleneck matching in $$d$$ dimensions. All previous algorithms take $$\Omega(n^{3/2})$$ time. Given any graph $$G(A \cup B, E)$$ that has an easily computable balanced vertex separator of size $$|V'|^{\delta}$$ for every subgraph $$G'(V', E')$$, for $$\delta \in [1/2, 1)$$, we can apply our algorithm to compute a maximum matching in $$\tilde{O}(m n^{\delta/(1+\delta)})$$ time, improving upon the $$O(m\sqrt{n})$$ time taken by the HK algorithm.
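The classical baseline this abstract builds on is augmenting-path bipartite matching. The sketch below is the simple $$O(|V|\cdot|E|)$$ augmenting-path routine, not Hopcroft-Karp and not the weighted 0/1 decomposition of the paper; the function names and toy input are illustrative.

```python
# Standard augmenting-path maximum-cardinality bipartite matching.
# Included only as the classical baseline; not the abstract's algorithm.

def max_bipartite_matching(adj, n_left, n_right):
    """adj: adj[u] = list of right-side vertices adjacent to left vertex u."""
    match_right = [-1] * n_right          # match_right[v] = left vertex matched to v

    def try_augment(u, visited):
        for v in adj[u]:
            if v in visited:
                continue
            visited.add(v)
            # v is free, or its current partner can be re-matched elsewhere.
            if match_right[v] == -1 or try_augment(match_right[v], visited):
                match_right[v] = u
                return True
        return False

    size = 0
    for u in range(n_left):
        if try_augment(u, set()):
            size += 1
    return size, match_right


if __name__ == "__main__":
    # Left vertices 0..2, right vertices 0..2.
    adj = {0: [0, 1], 1: [0], 2: [1, 2]}
    print(max_bipartite_matching(adj, 3, 3))   # -> (3, [1, 0, 2])
```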
4. We present a new elementary algorithm that takes time $$O_{\epsilon}\left(x^{3/5} (\log x)^{8/5+\epsilon}\right)$$ and space $$O\left(x^{3/10} (\log x)^{13/10}\right)$$ (measured bitwise) for computing $$M(x) = \sum_{n \le x} \mu(n)$$, where $$\mu(n)$$ is the Möbius function. This is the first improvement in the exponent of $$x$$ for an elementary algorithm since 1985. We also show that it is possible to reduce space consumption to $$O(x^{1/5} (\log x)^{5/3})$$ by the use of the method in Helfgott (Math Comput 89:333–350, 2020), at the cost of letting time rise to the order of $$x^{3/5} (\log x)^{2} \log\log x$$.
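For concreteness, a direct sieve of the Möbius function gives a simple reference computation of $$M(x)$$ in roughly $$O(x \log\log x)$$ time and $$O(x)$$ space; the sketch below is only that baseline, not the elementary $$x^{3/5}$$-time algorithm of the abstract, and the function name is ours.

```python
# Baseline only: sieve mu(n) for n <= x and accumulate M(x) = sum_{n<=x} mu(n).
# Roughly O(x log log x) time and O(x) space; not the abstract's algorithm.

def mertens(x):
    mu = [1] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(p, x + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]            # one factor of p flips the sign
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0                 # p^2 | m  =>  mu(m) = 0
    return sum(mu[1:x + 1])


if __name__ == "__main__":
    print(mertens(10))   # M(10) = -1
```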