skip to main content

Title: Sparse Recovery for Orthogonal Polynomial Transforms
In this paper we consider the following sparse recovery problem. We have query access to a vector ๐ฑ โˆˆ โ„^N such that xฬ‚ = ๐… ๐ฑ is k-sparse (or nearly k-sparse) for some orthogonal transform ๐…. The goal is to output an approximation (in an ๐“โ‚‚ sense) to xฬ‚ in sublinear time. This problem has been well-studied in the special case that ๐… is the Discrete Fourier Transform (DFT), and a long line of work has resulted in sparse Fast Fourier Transforms that run in time O(k โ‹… polylog N). However, for transforms ๐… other than the DFT (or closely related transforms like the Discrete Cosine Transform), the question is much less settled. In this paper we give sublinear-time algorithms - running in time poly(k log(N)) - for solving the sparse recovery problem for orthogonal transforms ๐… that arise from orthogonal polynomials. More precisely, our algorithm works for any ๐… that is an orthogonal polynomial transform derived from Jacobi polynomials. The Jacobi polynomials are a large class of classical orthogonal polynomials (and include Chebyshev and Legendre polynomials as special cases), and show up extensively in applications like numerical analysis and signal processing. One caveat of our work is that we require an assumption on the sparsity structure of the sparse vector, although we note that vectors with random support have this property with high probability. Our approach is to give a very general reduction from the k-sparse sparse recovery problem to the 1-sparse sparse recovery problem that holds for any flat orthogonal polynomial transform; then we solve this one-sparse recovery problem for transforms derived from Jacobi polynomials. Frequently, sparse FFT algorithms are described as implementing such a reduction; however, the technical details of such works are quite specific to the Fourier transform and moreover the actual implementations of these algorithms do not use the 1-sparse algorithm as a black box. In this work we give a reduction that works for a broad class of orthogonal polynomial families, and which uses any 1-sparse recovery algorithm as a black box.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Leibniz international proceedings in informatics
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We give algorithms with lower arithmetic operation counts for both the Walsh-Hadamard Transform (WHT) and the Discrete Fourier Transform (DFT) on inputs of power-of-2 size N. For the WHT, our new algorithm has an operation count of 23/24N logN + O(N). To our knowledge, this gives the first improvement on the N logN operation count of the simple, folklore Fast Walsh-Hadamard Transform algorithm. For the DFT, our new FFT algorithm uses 15/4N logN + O(N) real arithmetic operations. Our leading constant 15/4 = 3.75 improves on the leading constant of 5 from the Cooley-Tukey algorithm from 1965, leading constant 4 from the split-radix algorithm of Yavne from 1968, leading constant 34/9=3.7777 from a modification of the split-radix algorithm by Van Buskirk from 2004, and leading constant 3.76875 from a theoretically optimized version of Van Buskirkโ€™s algorithm by Sergeev from 2017. Our new WHT algorithm takes advantage of a recent line of work on the non-rigidity of the WHT: we decompose the WHT matrix as the sum of a low-rank matrix and a sparse matrix, and then analyze the structures of these matrices to achieve a lower operation count. Our new DFT algorithm comes from a novel reduction, showing that parts of the previous best FFT algorithms can be replaced by calls to an algorithm for the WHT. Replacing the folklore WHT algorithm with our new improved algorithm leads to our improved FFT. 
    more » « less
  2. Hazay, Carmit ; Stam, Martijn (Ed.)
    We study the computational problem of finding a shortest non-zero vector in a rotation of โ„ค๐‘› , which we call โ„ค SVP. It has been a long-standing open problem to determine if a polynomial-time algorithm for โ„ค SVP exists, and there is by now a beautiful line of work showing how to solve it efficiently in certain very special cases. However, despite all of this work, the fastest known algorithm that is proven to solve โ„ค SVP is still simply the fastest known algorithm for solving SVP (i.e., the problem of finding shortest non-zero vectors in arbitrary lattices), which runs in 2๐‘›+๐‘œ(๐‘›) time. We therefore set aside the (perhaps impossible) goal of finding an efficient algorithm for โ„ค SVP and instead ask what else we can say about the problem. E.g., can we find any non-trivial speedup over the best known SVP algorithm? And, if โ„ค SVP actually is hard, then what consequences would follow? Our results are as follows. We show that โ„ค SVP is in a certain sense strictly easier than SVP on arbitrary lattices. In particular, we show how to reduce โ„ค SVP to an approximate version of SVP in the same dimension (in fact, even to approximate unique SVP, for any constant approximation factor). Such a reduction seems very unlikely to work for SVP itself, so we view this as a qualitative separation of โ„ค SVP from SVP. As a consequence of this reduction, we obtain a 2๐‘›/2+๐‘œ(๐‘›) -time algorithm for โ„ค SVP, i.e., the first non-trivial speedup over the best known algorithm for SVP on general lattices. (In fact, this reduction works for a more general class of latticesโ€”semi-stable lattices with not-too-large ๐œ†1 .) We show a simple public-key encryption scheme that is secure if (an appropriate variant of) โ„ค SVP is actually hard. Specifically, our scheme is secure if it is difficult to distinguish (in the worst case) a rotation of โ„ค๐‘› from either a lattice with all non-zero vectors longer than ๐‘›/log๐‘›โ€พโ€พโ€พโ€พโ€พโ€พโ€พโˆš or a lattice with smoothing parameter significantly smaller than the smoothing parameter of โ„ค๐‘› . The latter result has an interesting qualitative connection with reverse Minkowski theorems, which in some sense say that โ€œโ„ค๐‘› has the largest smoothing parameter.โ€ We show a distribution of bases ๐ for rotations of โ„ค๐‘› such that, if โ„ค SVP is hard for any input basis, then โ„ค SVP is hard on input ๐ . This gives a satisfying theoretical resolution to the problem of sampling hard bases for โ„ค๐‘› , which was studied by Blanks and Miller [9]. This worst-case to average-case reduction is also crucially used in the analysis of our encryption scheme. (In recent independent work that appeared as a preprint before this work, Ducas and van Woerden showed essentially the same thing for general lattices [15], and they also used this to analyze the security of a public-key encryption scheme. Similar ideas also appeared in [5, 11, 20] in different contexts.) We perform experiments to determine how practical basis reduction performs on bases of โ„ค๐‘› that are generated in different ways and how heuristic sieving algorithms perform on โ„ค๐‘› . Our basis reduction experiments complement and add to those performed by Blanks and Miller, as we work with a larger class of algorithms (i.e., larger block sizes) and study the โ€œprovably hardโ€ distribution of bases described above. Our sieving experiments confirm that heuristic sieving algorithms perform as expected on โ„ค๐‘› . 
    more » « less
  3. Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural prior they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N logN) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 pointsโ€”the first time a structured approach has done soโ€”with 4X faster inference speed and 40X fewer parameters. 
    more » « less
  4. Given a random n ร— n symmetric matrix ๐‘พ drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form ๐’™^โŠค ๐‘พ ๐’™ over all vectors ๐’™ in a constraint set ๐’ฎ โŠ‚ โ„โฟ. For a certain class of normalized constraint sets we show that, conditional on a certain complexity-theoretic conjecture, no polynomial-time algorithm can certify a better upper bound than the largest eigenvalue of ๐‘พ. A notable special case included in our results is the hypercube ๐’ฎ = {ยฑ1/โˆšn}โฟ, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics. Our results suggest a striking gap between optimization and certification for this problem. Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for predicting computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is conjectured to be correct for a large class of problems. Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over ๐’™ โˆˆ ๐’ฎ is much larger than that of a GOE matrix. 
    more » « less
  5. We study the problem of estimating the value of sums of the form Spโ‰œโˆ‘(xip) when one has the ability to sample xiโ‰ฅ0 with probability proportional to its magnitude. When p=2, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when {xi} is the degree sequence of a graph, which corresponds to counting the number of p-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1ยฑฮต)-multiplicative approximation of Sp has query and time complexities O(mloglognฯต2S1/pp). Here, m=โˆ‘xi/2 is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {xi} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365โ€“1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where Spโ‰คn, and p=2, our upper bound is O~(n/S1/2p), in contrast to their ฮฉ(n/S1/3p) lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is boundedโ€”or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is boundedโ€”we give a sublinear time algorithm via a reduction to the undirected case. 
    more » « less