skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00PM ET on Friday, December 15 until 2:00 AM ET on Saturday, December 16 due to maintenance. We apologize for the inconvenience.

Title: The Kikuchi Hierarchy and Tensor PCA
For the tensor PCA (principal component analysis) problem, we propose a new hierarchy of increasingly powerful algorithms with increasing runtime. Our hierarchy is analogous to the sumof-squares (SOS) hierarchy but is instead inspired by statistical physics and related algorithms such as belief propagation and AMP (approximate message passing). Our level-ℓ algorithm can be thought of as a linearized message-passing algorithm that keeps track of ℓ-wise dependencies among the hidden variables. Specifically, our algorithms are spectral methods based on the Kikuchi Hessian, which generalizes the well-studied Bethe Hessian to the higher-order Kikuchi free energies. It is known that AMP, the flagship algorithm of statistical physics, has substantially worse performance than SOS for tensor PCA. In this work we ‘redeem’ the statistical physics approach by showing that our hierarchy gives a polynomial-time algorithm matching the performance of SOS. Our hierarchy also yields a continuum of subexponential-time algorithms, and we prove that these achieve the same (conjecturally optimal) tradeoff between runtime and statistical power as SOS. Our proofs are much simpler than prior work, and also apply to the related problem of refuting random k-XOR formulas. The results we present here apply to tensor PCA for tensors of all orders, and to k-XOR when k is even. Our methods suggest a new avenue for systematically obtaining optimal algorithms for Bayesian inference problems, and our results constitute a step toward unifying the statistical physics and sum-of-squares approaches to algorithm design.  more » « less
Award ID(s):
1712730 1838251
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
Page Range / eLocation ID:
1446 to 1468
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Clustering is a fundamental primitive in unsupervised learning which gives rise to a rich class of computationally-challenging inference tasks. In this work, we focus on the canonical task of clustering d-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance. Recent works (Ghosh et al. ’20; Mao, Wein ’21; Davis, Diaz, Wang ’21) have established lower bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy for recovering certain hidden structures planted in Gaussian clustering instances. Prior work on many similar inference tasks portends that such lower bounds strongly suggest the presence of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where the clustering task is statistically possible but no polynomial-time algorithm succeeds. One special case of the clustering task we consider is equivalent to the problem of finding a planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly, this particular clustering model does not exhibit a statistical-to-computational gap, despite the aforementioned low-degree and SoS lower bounds. To achieve this, we give an algorithm based on Lenstra–Lenstra–Lovász lattice basis reduction which achieves the statistically-optimal sample complexity of d + 1 samples. This result extends the class of problems whose conjectured statistical-to-computational gaps can be “closed” by “brittle” polynomial-time algorithms, highlighting the crucial but subtle role of noise in the onset of statistical-to-computational gaps. 
    more » « less
  2. null (Ed.)
    We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Such problems arise widely in the theory of random graphs, theoretical computer science, and statistical physics. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising p-spin glass model, and (b) finding a large independent set in a sparse Erdos-Renyi graph. Two families of algorithms are considered: (a) low-degree polynomials of the input-a general framework that captures methods such as approximate message passing and local algorithms on sparse graphs, among others; and (b) the Langevin dynamics algorithm, a canonical Monte Carlo analogue of the gradient descent algorithm (applicable only for the spherical p-spin glass Hamiltonian). We show that neither family of algorithms can produce nearly optimal solutions with high probability. Our proof uses the fact that both models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objective values are above a certain threshold are either close or far from each other. The crux of our proof is the stability of both algorithms: a small perturbation of the input induces a small perturbation of the output. By an interpolation argument, such a stable algorithm cannot overcome the OGP barrier. The stability of the Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations. The stability of low-degree polynomials is established using concepts from Gaussian and Boolean Fourier analysis, including noise sensitivity, hypercontractivity, and total influence. 
    more » « less
  3. Low-rank matrix recovery problems involving high-dimensional and heterogeneous data appear in applications throughout statistics and machine learning. The contribution of this paper is to establish the fundamental limits of recovery for a broad class of these problems. In particular, we study the problem of estimating a rank-one matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum mean-squared error in estimating both the matrix and underlying factors. These results are based on a novel reduction from the low-rank matrix tensor product model (with homogeneous noise) to a rank-one model with heteroskedastic noise. As an application of our main result, we show that show recently proposed methods based on applying principal component analysis (PCA) to weighted combinations of the data are optimal in some settings but sub-optimal in others. We also provide numerical results comparing our asymptotic formulas with the performance of methods based weighted PCA, gradient descent, and approximate message passing. 
    more » « less
  4. null (Ed.)
    The Gilbert–Varshamov bound non-constructively establishes the existence of binary codes of distance 1/2−є/2 and rate Ω(є2). In a breakthrough result, Ta-Shma [STOC 2017] constructed the first explicit family of nearly optimal binary codes with distance 1/2−є/2 and rate Ω(є2+α), where α → 0 as є → 0. Moreover, the codes in Ta-Shma’s construction are є-balanced, where the distance between distinct codewords is not only bounded from below by 1/2−є/2, but also from above by 1/2+є/2. Polynomial time decoding algorithms for (a slight modification of) Ta-Shma’s codes appeared in [FOCS 2020], and were based on the Sum-of-Squares (SoS) semidefinite programming hierarchy. The running times for these algorithms were of the form NOα(1) for unique decoding, and NOє,α(1) for the setting of “gentle list decoding”, with large exponents of N even when α is a fixed constant. We derive new algorithms for both these tasks, running in time Õє(N). Our algorithms also apply to the general setting of decoding direct-sum codes. Our algorithms follow from new structural and algorithmic results for collections of k-tuples (ordered hypergraphs) possessing a “structured expansion” property, which we call splittability. This property was previously identified and used in the analysis of SoS-based decoding and constraint satisfaction algorithms, and is also known to be satisfied by Ta-Shma’s code construction. We obtain a new weak regularity decomposition for (possibly sparse) splittable collections W ⊆ [n]k, similar to the regularity decomposition for dense structures by Frieze and Kannan [FOCS 1996]. These decompositions are also computable in near-linear time Õ(|W |), and form a key component of our algorithmic results. 
    more » « less
  5. Abernethy, Jacob ; Agarwal, Shivani (Ed.)
    We study a variant of the sparse PCA (principal component analysis) problem in the “hard” regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (sub-exponential) runtime throughout the hard regime. Following instead a statistical physics inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime. 
    more » « less