                    
                            
                            Practical Data-Dependent Metric Compression with Provable Guarantees
                        
                    
    
We introduce a new distance-preserving compact representation of multi-dimensional point sets. Given $$n$$ points in a $$d$$-dimensional space where each coordinate is represented using $$B$$ bits (i.e., $$dB$$ bits per point), it produces a representation of size $$O(d\log(dB/\epsilon)+\log n)$$ bits per point from which one can approximate the distances up to a factor of $$1+\epsilon$$. Our algorithm almost matches the recent bound of Indyk et al. (2017) while being much simpler. We compare our algorithm to Product Quantization (PQ) (Jegou et al., 2011), a state-of-the-art heuristic metric compression method. We evaluate both algorithms on several data sets: SIFT, MNIST, New York City taxi time series, and a synthetic one-dimensional data set embedded in a high-dimensional space. Our algorithm produces representations that are comparable to or better than those produced by PQ, while having provable guarantees on its performance.
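For intuition only, the snippet below sketches the simplest possible coordinate-wise grid quantizer in Python (numpy assumed). It is not the paper's data-dependent construction: it only guarantees an additive error of roughly $$\epsilon$$ times the dataset diameter per distance and spends about $$d\log(\sqrt{d}/\epsilon)$$ bits per point, whereas the representation described above achieves a multiplicative $$1+\epsilon$$ guarantee within the $$O(d\log(dB/\epsilon)+\log n)$$ bound.

```python
import numpy as np

def grid_compress(points, eps):
    """Toy baseline (not the paper's algorithm): snap every coordinate to a
    uniform grid of cell width eps * diameter / sqrt(d).  Each point is then
    stored as d small integers; distances between decoded points are accurate
    up to an additive eps * diameter."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    lo = points.min(axis=0)
    diameter = float(np.linalg.norm(points.max(axis=0) - lo)) or 1.0
    cell = eps * diameter / np.sqrt(d)
    codes = np.round((points - lo) / cell).astype(np.int64)  # d integers per point
    return codes, lo, cell

def grid_decode(codes, lo, cell):
    return codes.astype(float) * cell + lo

# Usage: distances on the decoded points stay close to the true distances.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))
codes, lo, cell = grid_compress(X, eps=0.05)
Y = grid_decode(codes, lo, cell)
i, j = 3, 917
print(np.linalg.norm(X[i] - X[j]), np.linalg.norm(Y[i] - Y[j]))
```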
        
    
    
- PAR ID: 10065217
- Date Published:
- Journal Name: Annual Conference on Neural Information Processing Systems
- Page Range / eLocation ID: 2614-2623
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
A $$(1\pm\epsilon)$$-sparsifier of a hypergraph $$G(V, E)$$ is a (weighted) subgraph that preserves the value of every cut to within a $$(1\pm\epsilon)$$ factor. It is known that every hypergraph with $$n$$ vertices admits a $$(1\pm\epsilon)$$-sparsifier with $$\tilde{O}(n/\epsilon^{2})$$ hyperedges. In this work, we explore the task of building such a sparsifier using only linear measurements (a linear sketch) over the hyperedges of $$G$$, and we provide nearly matching upper and lower bounds for this task. Specifically, we show that there is a randomized linear sketch of size $$\tilde{O}(nr\log(m)/\epsilon^{2})$$ bits which, with high probability, contains sufficient information to recover a $$(1\pm\epsilon)$$ cut-sparsifier with $$\tilde{O}(n/\epsilon^{2})$$ hyperedges for any hypergraph with at most $$m$$ hyperedges, each of arity at most $$r$$. This immediately gives a dynamic streaming algorithm for hypergraph cut sparsification with the same space complexity, improving on the previous best known bound of $$\tilde{O}(nr^{2}\log^{4}(m)/\epsilon^{2})$$ bits of space (Guha, McGregor, and Tench, PODS 2015). We complement our algorithmic result with a nearly matching lower bound: for every $$\epsilon\in(0,1)$$, one needs $$\Omega(nr\log(m/n)/\log(n))$$ bits to construct a $$(1\pm\epsilon)$$-sparsifier via linear sketching, showing that our linear sketch achieves an optimal dependence on both $$r$$ and $$\log(m)$$. The starting point for our improved algorithm is importance sampling of hyperedges based on the new notion of $$k$$-cut strength introduced in the recent work of Quanrud (SODA 2024). The natural algorithm based on this concept leads to $$\log m$$ levels of sampling at which errors can accumulate, and this accounts for the $$\mathrm{polylog}(m)$$ losses in the sketch size of that algorithm. We develop a more intricate analysis of the error accumulation to show that most levels do not contribute to the error and that the actual loss is only $$\mathrm{polylog}(n)$$. Combined with careful preprocessing (and analysis), this lets us remove all extraneous $$\log m$$ factors from the sketch size, but a quadratic dependence on $$r$$ remains. This dependence originates from the use of correlated $$\ell_{0}$$-samplers to recover a large number of low-strength hyperedges simultaneously by looking at the neighborhoods of individual vertices. In graphs, this reveals $$\Omega(n)$$ edges in a single shot, whereas in hypergraphs it may reveal only $$O(n/r)$$ new hyperedges, thus requiring $$\Omega(r)$$ rounds of recovery. To remedy this, we introduce a new technique of random fingerprinting of hyperedges, which effectively eliminates the correlations created by large-arity hyperedges and leads to a scheme for recovering low-strength hyperedges with an optimal dependence on $$r$$. Putting these ingredients together yields our linear sketching algorithm. Our lower bound is established by a reduction from the universal relation problem in the one-way communication setting.
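As a rough illustration of the importance-sampling starting point described in this abstract (and not of the linear sketch, the $$k$$-cut strengths, or the fingerprinting technique themselves), the Python sketch below keeps each hyperedge independently with some probability and reweights the survivors so that every cut value is preserved in expectation; choosing the probabilities inversely proportional to an appropriate strength is what controls the variance. The hyperedges and probabilities in the usage example are placeholders.

```python
import random

def sample_sparsifier(hyperedges, keep_prob, seed=0):
    """Keep hyperedge e (a tuple of vertices) with probability keep_prob[e];
    a kept hyperedge gets weight 1 / keep_prob[e].  For every cut (S, V \\ S)
    the sparsifier's cut value is then an unbiased estimator of the original
    cut value.  In strength-based sparsification, keep_prob[e] would be chosen
    (roughly) inversely proportional to the strength of e, which is what gives
    (1 +/- eps) concentration with few surviving hyperedges."""
    rng = random.Random(seed)
    return {e: 1.0 / keep_prob[e] for e in hyperedges if rng.random() < keep_prob[e]}

def cut_value(weighted_hyperedges, S):
    """A hyperedge is cut by (S, V \\ S) iff it has a vertex on each side."""
    S = set(S)
    return sum(w for e, w in weighted_hyperedges.items()
               if any(v in S for v in e) and not all(v in S for v in e))

# Usage: compare a cut in the original (unit-weight) hypergraph and a sample.
edges = [(1, 2, 3), (2, 4), (3, 4, 5, 6), (1, 5), (2, 3, 6)]
probs = {e: 0.8 for e in edges}                      # placeholder probabilities
sparse = sample_sparsifier(edges, probs)
print(cut_value({e: 1.0 for e in edges}, {1, 2, 3}), cut_value(sparse, {1, 2, 3}))
```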
- 
We study the densest subgraph problem and give algorithms via multiplicative weights update (MWU) and area convexity that converge in $$O\left(\frac{\log m}{\epsilon^{2}}\right)$$ and $$O\left(\frac{\log m}{\epsilon}\right)$$ iterations, respectively, both with nearly linear time per iteration. Compared with the work of Bahmani et al. (2014), our MWU algorithm uses a very different and much simpler procedure for recovering the dense subgraph from the fractional solution and does not employ a binary search. Compared with the work of Boob et al. (2019), our algorithm via area convexity improves the iteration complexity by a factor of $$\Delta$$, the maximum degree in the graph, and matches the fastest theoretical runtime currently known via flows (Chekuri et al., 2022) in total time. Next, we study the dense subgraph decomposition problem and give the first practical iterative algorithm with linear convergence rate, running in time $$O\left(mn\log\frac{1}{\epsilon}\right)$$ via accelerated random coordinate descent. This significantly improves over the $$O\left(\frac{m\sqrt{mn\Delta}}{\epsilon}\right)$$ time of the FISTA-based algorithm of Harb et al. (2022). In the high-precision regime $$\epsilon\ll\frac{1}{n}$$, where we can even recover the exact solution, our algorithm has a total runtime of $$O\left(mn\log n\right)$$, matching the state-of-the-art exact algorithm via parametric flows (Gallo et al., 1989). Empirically, we show that this algorithm is very practical and scales to very large graphs, and that its performance is competitive with widely used methods that have significantly weaker theoretical guarantees.
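As a point of reference for the densest-subgraph objective $$\max_{S}|E(S)|/|S|$$ discussed in this abstract, the sketch below implements the classical greedy peeling baseline (Charikar's 1/2-approximation) in Python. It is deliberately not the MWU, area-convexity, or accelerated coordinate-descent machinery of the paper; it is only a simple way to see the objective in code.

```python
import heapq
from collections import defaultdict

def greedy_peel_densest(edges):
    """Classical greedy peeling (a 1/2-approximation, not the algorithms of
    the abstract above): repeatedly delete a minimum-degree vertex and report
    the densest intermediate subgraph seen along the way."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    vertices = set(adj)
    deg = {v: len(adj[v]) for v in vertices}
    m = sum(deg.values()) // 2                 # edges remaining
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, order = set(), []
    best_density, best_removed = m / max(len(vertices), 1), 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue                           # stale heap entry
        removed.add(v)
        order.append(v)
        m -= deg[v]
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
        remaining = len(vertices) - len(removed)
        if remaining and m / remaining > best_density:
            best_density, best_removed = m / remaining, len(removed)
    best_subgraph = vertices - set(order[:best_removed])
    return best_density, best_subgraph

# Usage: a 4-clique with a pendant path; the clique (density 1.5) is densest.
print(greedy_peel_densest([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]))
```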
- 
In a recent breakthrough, Paz and Schwartzman (SODA'17) presented a single-pass $$(2+\epsilon)$$-approximation algorithm for the maximum weight matching problem in the semi-streaming model. Their algorithm uses $$O(n\log^{2} n)$$ bits of space, for any constant $$\epsilon>0$$. We present a simplified and more intuitive primal-dual analysis of essentially the same algorithm, which also improves the space complexity to the optimal bound of $$O(n\log n)$$ bits; this is optimal, as the output matching already requires $$\Omega(n\log n)$$ bits.
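Below is a minimal Python sketch of the single-pass local-ratio / primal-dual scheme this abstract refers to, as it is commonly presented (Paz-Schwartzman style), without reproducing the paper's improved space accounting: each vertex keeps a potential, an arriving edge is stored only if its weight exceeds the sum of its endpoints' potentials, and the stored edges are unwound in reverse order to build the matching. The simplification of applying no $$(1+\epsilon)$$ slack to the threshold is an assumption made here for brevity.

```python
from collections import defaultdict

def stream_matching(edge_stream):
    """One pass over (u, v, w) triples, local-ratio style: keep an edge only
    if its weight beats the current potentials of its endpoints, then build a
    matching greedily from the kept edges in reverse (stack) order.
    Simplified variant: no (1 + eps) slack on the threshold, so the stack is
    not guaranteed to stay within O(n log n) bits."""
    phi = defaultdict(float)          # vertex potentials
    stack = []
    for u, v, w in edge_stream:
        gain = w - phi[u] - phi[v]
        if gain > 0:
            stack.append((u, v, w))
            phi[u] += gain
            phi[v] += gain
    matched, matching = set(), []
    for u, v, w in reversed(stack):   # unwind the stack
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching

# Usage on a tiny weighted stream.
edges = [(1, 2, 5.0), (2, 3, 6.0), (3, 4, 5.0), (1, 4, 1.0)]
print(stream_matching(edges))   # a matching within roughly a factor 2 of optimal
```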
- 
We give two new quantum algorithms for solving semidefinite programs (SDPs) that provide quantum speed-ups. We consider SDP instances with $$m$$ constraint matrices, each of dimension $$n$$, rank at most $$r$$, and sparsity $$s$$. The first algorithm assumes an input model in which one is given oracle access to the entries of the matrices at unit cost. We show that it has run time $$\tilde{O}(s^{2}(\sqrt{m}\,\epsilon^{-10}+\sqrt{n}\,\epsilon^{-12}))$$, where $$\epsilon$$ is the error of the solution. This gives an optimal dependence in terms of $$m$$, $$n$$ and a quadratic improvement over previous quantum algorithms (when $$m\approx n$$). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is $$\tilde{O}(\sqrt{m}+\mathrm{poly}(r))\cdot\mathrm{poly}(\log m,\log n,B,\epsilon^{-1})$$, where $$B$$ is an upper bound on the trace norm of all input matrices. In particular, the complexity depends only polylogarithmically on $$n$$ and polynomially on $$r$$. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: given $$m$$ measurements and a supply of copies of an unknown state $$\rho$$ of rank at most $$r$$, we show that we can find, in time $$\sqrt{m}\cdot\mathrm{poly}(\log m,\log n,r,\epsilon^{-1})$$, a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as $$\rho$$ on the $$m$$ measurements, up to error $$\epsilon$$. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data, as considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a polylogarithmic dependence on their dimension, which is based on ideas developed in quantum principal component analysis. We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. (2017). We believe both techniques may be of independent interest.
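For readers unfamiliar with the classical starting point this abstract says is being "quantized", here is a minimal, purely classical skeleton of the matrix multiplicative weights feasibility loop for constraints $$\mathrm{tr}(A_j\rho)\le b_j$$ over density matrices, written in Python with numpy/scipy. It illustrates only the update rule, not the quantum Gibbs sampling, the quantum OR lemma, or any of the speed-ups discussed above; the fixed learning rate and round budget are hypothetical simplifications rather than tuned bounds.

```python
import numpy as np
from scipy.linalg import expm

def mmw_feasibility(As, bs, eps, eta=0.1, rounds=200):
    """Classical matrix multiplicative weights skeleton (not the quantum
    algorithm above).  Maintains a density matrix rho = exp(-eta*L)/tr(...)
    where L accumulates the constraint matrices rho has violated so far,
    and stops once all constraints tr(A_j rho) <= b_j + eps hold."""
    n = As[0].shape[0]
    L = np.zeros((n, n))
    for _ in range(rounds):
        W = expm(-eta * L)
        rho = W / np.trace(W)                    # current candidate state
        violated = next((j for j, (A, b) in enumerate(zip(As, bs))
                         if np.trace(A @ rho) > b + eps), None)
        if violated is None:
            return rho                           # eps-feasible point found
        L = L + As[violated]                     # penalize the violated direction
    return None                                  # no eps-feasible point within budget

# Usage: two diagonal constraints on a 2x2 density matrix.
A1 = np.diag([1.0, 0.0]); A2 = np.diag([0.0, 1.0])
rho = mmw_feasibility([A1, A2], bs=[0.6, 0.6], eps=0.05)
print(rho)   # close to the maximally mixed state diag(0.5, 0.5)
```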