We give algorithms with lower arithmetic operation counts for both the Walsh-Hadamard Transform (WHT) and the Discrete Fourier Transform (DFT) on inputs of power-of-2 size N.
For the WHT, our new algorithm has an operation count of (23/24)N log N + O(N). To our knowledge, this gives the first improvement on the N log N operation count of the simple, folklore Fast Walsh-Hadamard Transform algorithm.
For the DFT, our new FFT algorithm uses (15/4)N log N + O(N) real arithmetic operations. Our leading constant 15/4 = 3.75 improves on the leading constant of 5 from the Cooley-Tukey algorithm from 1965, leading constant 4 from the split-radix algorithm of Yavne from 1968, leading constant 34/9 ≈ 3.777 from a modification of the split-radix algorithm by Van Buskirk from 2004, and leading constant 3.76875 from a theoretically optimized version of Van Buskirk's algorithm by Sergeev from 2017.
Our new WHT algorithm takes advantage of a recent line of work on the non-rigidity of the WHT: we decompose the WHT matrix as the sum of a low-rank matrix and a sparse matrix, and then analyze the structures of these matrices to achieve a lower operation count. Our new DFT algorithm comes from a novel reduction, showing that parts of the previous best FFT algorithms can be replaced by calls to an algorithm for the WHT. Replacing the folklore WHT algorithm with our new improved algorithm leads to our improved FFT.
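For reference, the folklore fast Walsh-Hadamard transform that these operation counts are measured against can be sketched in a few lines; this is the standard butterfly recursion using N log₂ N additions and subtractions, not the improved (23/24)N log N algorithm from the abstract:

```python
def fwht(x):
    """Folklore fast Walsh-Hadamard transform.

    Performs N * log2(N) additions/subtractions for N a power of 2,
    which is the baseline operation count the abstract improves on.
    Returns the (unnormalized) transform H_N x as a new list.
    """
    x = list(x)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of 2"
    h = 1
    while h < n:
        # One butterfly stage: combine entries at distance h.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x
```

Since H_N² = N·I, applying `fwht` twice scales the input by N, which gives a quick sanity check of the implementation.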
Sparse Recovery for Orthogonal Polynomial Transforms
In this paper we consider the following sparse recovery problem. We have query access to a vector x ∈ ℝ^N such that x̂ = F x is k-sparse (or nearly k-sparse) for some orthogonal transform F. The goal is to output an approximation (in an ℓ∞ sense) to x̂ in sublinear time. This problem has been well-studied in the special case that F is the Discrete Fourier Transform (DFT), and a long line of work has resulted in sparse Fast Fourier Transforms that run in time O(k · polylog N). However, for transforms F other than the DFT (or closely related transforms like the Discrete Cosine Transform), the question is much less settled. In this paper we give sublinear-time algorithms, running in time poly(k log N), for solving the sparse recovery problem for orthogonal transforms F that arise from orthogonal polynomials. More precisely, our algorithm works for any F that is an orthogonal polynomial transform derived from Jacobi polynomials. The Jacobi polynomials are a large class of classical orthogonal polynomials (and include Chebyshev and Legendre polynomials as special cases), and show up extensively in applications like numerical analysis and signal processing. One caveat of our work is that we require an assumption on the sparsity structure of the sparse vector, although we note that vectors with random support have this property with high probability. Our approach is to give a very general reduction from the k-sparse recovery problem to the 1-sparse recovery problem that holds for any flat orthogonal polynomial transform; then we solve this 1-sparse recovery problem for transforms derived from Jacobi polynomials. Frequently, sparse FFT algorithms are described as implementing such a reduction; however, the technical details of such works are quite specific to the Fourier transform, and moreover the actual implementations of these algorithms do not use the 1-sparse algorithm as a black box. In this work we give a reduction that works for a broad class of orthogonal polynomial families, and which uses any 1-sparse recovery algorithm as a black box.
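The 1-sparse black box is easiest to see in the Fourier special case: if x̂ has exactly one nonzero coefficient, then x[t] = c·exp(2πi·k·t/N), and two queries determine (k, c) from a phase ratio. A minimal sketch under that exact-sparsity assumption (the function name and setup are ours; the paper's Jacobi-polynomial version is substantially more involved):

```python
import cmath


def one_sparse_dft_recover(query, N):
    """Recover (k, c) when the spectrum is exactly 1-sparse, i.e.
    x[t] = c * exp(2*pi*i*k*t / N) for a single frequency k.

    Two queries suffice: x[1] / x[0] = exp(2*pi*i*k / N), so the
    phase of the ratio encodes k, and x[0] is the coefficient c.
    """
    x0, x1 = query(0), query(1)
    ratio = x1 / x0
    # cmath.phase returns an angle in (-pi, pi]; reduce mod N.
    k = round(cmath.phase(ratio) * N / (2 * cmath.pi)) % N
    return k, x0
```

With noise, one instead takes a majority vote over several randomly shifted query pairs, which is roughly where the technical work in 1-sparse recovery lies.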
 Award ID(s):
 1763481
 NSF-PAR ID:
 10214644
 Date Published:
 Journal Name:
 Leibniz International Proceedings in Informatics
 Volume:
 168
 ISSN:
 1868-8969
 Page Range / eLocation ID:
 58:1–58:16
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


Hazay, Carmit; Stam, Martijn (Eds.) We study the computational problem of finding a shortest nonzero vector in a rotation of ℤⁿ, which we call ℤSVP. It has been a long-standing open problem to determine if a polynomial-time algorithm for ℤSVP exists, and there is by now a beautiful line of work showing how to solve it efficiently in certain very special cases. However, despite all of this work, the fastest known algorithm that is proven to solve ℤSVP is still simply the fastest known algorithm for solving SVP (i.e., the problem of finding shortest nonzero vectors in arbitrary lattices), which runs in 2^(n+o(n)) time. We therefore set aside the (perhaps impossible) goal of finding an efficient algorithm for ℤSVP and instead ask what else we can say about the problem. E.g., can we find any nontrivial speedup over the best known SVP algorithm? And, if ℤSVP actually is hard, then what consequences would follow? Our results are as follows. We show that ℤSVP is in a certain sense strictly easier than SVP on arbitrary lattices. In particular, we show how to reduce ℤSVP to an approximate version of SVP in the same dimension (in fact, even to approximate unique SVP, for any constant approximation factor). Such a reduction seems very unlikely to work for SVP itself, so we view this as a qualitative separation of ℤSVP from SVP. As a consequence of this reduction, we obtain a 2^(n/2+o(n))-time algorithm for ℤSVP, i.e., the first nontrivial speedup over the best known algorithm for SVP on general lattices. (In fact, this reduction works for a more general class of lattices: semi-stable lattices with not-too-large λ₁.) We show a simple public-key encryption scheme that is secure if (an appropriate variant of) ℤSVP is actually hard.
Specifically, our scheme is secure if it is difficult to distinguish (in the worst case) a rotation of ℤⁿ from either a lattice with all nonzero vectors longer than √(n/log n) or a lattice with smoothing parameter significantly smaller than the smoothing parameter of ℤⁿ. The latter result has an interesting qualitative connection with reverse Minkowski theorems, which in some sense say that “ℤⁿ has the largest smoothing parameter.” We show a distribution of bases D for rotations of ℤⁿ such that, if ℤSVP is hard for any input basis, then ℤSVP is hard on input D. This gives a satisfying theoretical resolution to the problem of sampling hard bases for ℤⁿ, which was studied by Blanks and Miller [9]. This worst-case to average-case reduction is also crucially used in the analysis of our encryption scheme. (In recent independent work that appeared as a preprint before this work, Ducas and van Woerden showed essentially the same thing for general lattices [15], and they also used this to analyze the security of a public-key encryption scheme. Similar ideas also appeared in [5, 11, 20] in different contexts.) We perform experiments to determine how practical basis reduction performs on bases of ℤⁿ that are generated in different ways and how heuristic sieving algorithms perform on ℤⁿ. Our basis reduction experiments complement and add to those performed by Blanks and Miller, as we work with a larger class of algorithms (i.e., larger block sizes) and study the “provably hard” distribution of bases described above. Our sieving experiments confirm that heuristic sieving algorithms perform as expected on ℤⁿ.
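A toy illustration of the object being studied: a rotation of ℤⁿ can be represented, up to isometry, by an integer basis that generates exactly ℤⁿ, so its shortest nonzero vector always has length 1 even when no basis vector looks short. The brute-force enumeration below is a stand-in for the 2^(n+o(n)) SVP solvers mentioned in the abstract, not the paper's reduction or experiments; the example basis and parameter names are ours:

```python
import itertools


def shortest_nonzero_norm_sq(basis, radius=3):
    """Brute-force the squared length of a shortest nonzero lattice
    vector, enumerating integer coefficient vectors with entries in
    [-radius, radius].  Exponential in the dimension: a toy stand-in
    for exact SVP algorithms."""
    n = len(basis)
    best = None
    for coeffs in itertools.product(range(-radius, radius + 1), repeat=n):
        if not any(coeffs):
            continue
        v = [sum(c * basis[i][k] for i, c in enumerate(coeffs))
             for k in range(len(basis[0]))]
        norm = sum(t * t for t in v)
        if best is None or norm < best:
            best = norm
    return best


# A unimodular (det = 1) integer basis: the lattice it generates is
# exactly Z^3, so lambda_1^2 = 1, e.g. 1*row0 - 1*row1 + 1*row2 = e_1.
U = [[1, 1, 0],
     [0, 1, 1],
     [0, 0, 1]]
```

For the lattice 2ℤ × 3ℤ, by contrast, the same search reports squared length 4, showing the function is not hard-coded to return 1.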

Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the O(N log N) Cooley-Tukey FFT algorithm to machine precision, for dimensions N up to 1024. Furthermore, our method can be incorporated as a lightweight replacement for generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points (the first time a structured approach has done so) with 4x faster inference speed and 40x fewer parameters.
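The sparse-product structure referenced above is exactly the radix-2 Cooley-Tukey recursion: each recursion level applies one sparse butterfly factor, so the full DFT is a product of O(log N) sparse matrices. A minimal sketch (function names ours), checked against the direct O(N²) definition:

```python
import cmath


def dft_direct(x):
    """Direct O(N^2) DFT from the definition, used as a reference."""
    N = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / N)
                for t in range(N))
            for k in range(N)]


def fft_butterfly(x):
    """Radix-2 Cooley-Tukey FFT: splits into even/odd halves and
    recombines with twiddle factors.  Each level of this recursion
    is one sparse (butterfly) factor of the DFT matrix."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_butterfly(x[0::2])
    odd = fft_butterfly(x[1::2])
    out = [0j] * N
    for k in range(N // 2):
        tw = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + tw
        out[k + N // 2] = even[k] - tw
    return out
```

Counting work per level (N multiply-adds) times log₂ N levels recovers the O(N log N) cost that the learned parameterization rediscovers.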

Given a random n × n symmetric matrix W drawn from the Gaussian orthogonal ensemble (GOE), we consider the problem of certifying an upper bound on the maximum value of the quadratic form x^T W x over all vectors x in a constraint set 𝒮 ⊆ ℝⁿ. For a certain class of normalized constraint sets we show that, conditional on a certain complexity-theoretic conjecture, no polynomial-time algorithm can certify a better upper bound than the largest eigenvalue of W. A notable special case included in our results is the hypercube 𝒮 = {±1/√n}ⁿ, which corresponds to the problem of certifying bounds on the Hamiltonian of the Sherrington-Kirkpatrick spin glass model from statistical physics. Our results suggest a striking gap between optimization and certification for this problem. Our proof proceeds in two steps. First, we give a reduction from the detection problem in the negatively-spiked Wishart model to the above certification problem. We then give evidence that this Wishart detection problem is computationally hard below the classical spectral threshold, by showing that no low-degree polynomial can (in expectation) distinguish the spiked and unspiked models. This method for predicting computational thresholds was proposed in a sequence of recent works on the sum-of-squares hierarchy, and is conjectured to be correct for a large class of problems. Our proof can be seen as constructing a distribution over symmetric matrices that appears computationally indistinguishable from the GOE, yet is supported on matrices whose maximum quadratic form over x ∈ 𝒮 is much larger than that of a GOE matrix.
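The spectral certificate in question is simple to state: every point of the normalized hypercube is a unit vector, so x^T W x ≤ λ_max(W) always holds, and λ_max is computable in polynomial time. A small sketch comparing the two sides on a toy symmetric matrix (the shifted power iteration and helper names are ours, not the paper's method):

```python
import itertools
import math
import random


def lambda_max(W, iters=500, seed=0):
    """Largest eigenvalue of a symmetric matrix via power iteration.
    A Gershgorin shift (W + c*I) makes all eigenvalues nonnegative,
    so the dominant eigenvalue is the one we want; we return the
    Rayleigh quotient of the converged vector under W itself."""
    n = len(W)
    c = max(sum(abs(v) for v in row) for row in W)
    rng = random.Random(seed)
    x = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        y = [sum((W[i][j] + (c if i == j else 0.0)) * x[j]
                 for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))


def max_quadratic_hypercube(W):
    """Brute-force max of x^T W x over the hypercube {+-1/sqrt(n)}^n,
    the Sherrington-Kirkpatrick constraint set (exponential time)."""
    n = len(W)
    s = 1.0 / math.sqrt(n)
    return max(sum(sgn[i] * W[i][j] * sgn[j]
                   for i in range(n) for j in range(n))
               for sgn in itertools.product([-s, s], repeat=n))
```

The hardness result says that, for GOE inputs, no polynomial-time certifier can beat the λ_max bound even though the true hypercube maximum is typically much smaller.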

We study the problem of estimating the value of sums of the form S_p = Σ_i x_i^p when one has the ability to sample x_i ≥ 0 with probability proportional to its magnitude. When p = 2, this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when {x_i} is the degree sequence of a graph, which corresponds to counting the number of p-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a (1 ± ε)-multiplicative approximation of S_p has query and time complexity O((m log log n) / (ε² S_p^{1/p})). Here, m = (Σ_i x_i)/2 is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, n is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when {x_i} is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where S_p ≤ n and p = 2, our upper bound is Õ(n/S_p^{1/2}), in contrast to their Ω(n/S_p^{1/3}) lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded, or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded, we give a sublinear-time algorithm via a reduction to the undirected case.
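The core of proportional sampling can be seen in a short unbiased estimator: if index i is drawn with probability x_i / S_1, then E[x_i^{p-1}] = S_p / S_1, so S_1 times the sample mean of x_i^{p-1} estimates S_p. This is a simplified sketch of the sampling idea, with the proportional sampler simulated via `random.choices`; it is not the paper's full algorithm and does not achieve its O((m log log n)/(ε² S_p^{1/p})) complexity:

```python
import random


def estimate_Sp(x, p, samples, rng=random):
    """Unbiased estimator of S_p = sum_i x_i^p, given the ability to
    sample an index i with probability proportional to x_i (simulated
    here with weighted choice over the full vector).

    E[x_i^(p-1)] under proportional sampling equals S_p / S_1, so
    multiplying the sample mean by S_1 = sum_i x_i is unbiased.
    """
    S1 = sum(x)
    idxs = rng.choices(range(len(x)), weights=x, k=samples)
    return S1 * sum(x[i] ** (p - 1) for i in idxs) / samples
```

For p = 2 on a graph's degree sequence, sampling a uniformly random edge and taking one of its endpoints is precisely degree-proportional sampling, which is why random edge queries suffice in the graph setting.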