Clustering is a fundamental primitive in unsupervised learning that gives rise to a rich class
of computationally challenging inference tasks. In this work, we focus on the canonical task of
clustering d-dimensional Gaussian mixtures with unknown (and possibly degenerate) covariance.
Recent works (Ghosh et al. ’20; Mao, Wein ’21; Davis, Diaz, Wang ’21) have established lower
bounds against the class of low-degree polynomial methods and the sum-of-squares (SoS) hierarchy
for recovering certain hidden structures planted in Gaussian clustering instances. Prior work
on many similar inference tasks portends that such lower bounds strongly suggest the presence
of an inherent statistical-to-computational gap for clustering, that is, a parameter regime where
the clustering task is statistically possible but no polynomial-time algorithm succeeds.
One special case of the clustering task we consider is equivalent to the problem of finding a
planted hypercube vector in an otherwise random subspace. We show that, perhaps surprisingly,
this particular clustering model does not exhibit a statistical-to-computational gap, despite the
aforementioned low-degree and SoS lower bounds. To achieve this, we give an algorithm based
on Lenstra–Lenstra–Lovász lattice basis reduction which achieves the statistically optimal sample
complexity of d + 1 samples. This result extends the class of problems whose conjectured
statistical-to-computational gaps can be “closed” by “brittle” polynomial-time algorithms, highlighting
the crucial but subtle role of noise in the onset of statistical-to-computational gaps.
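The lattice-reduction subroutine at the heart of such algorithms can be illustrated with a compact, textbook LLL implementation in exact rational arithmetic (δ = 3/4). This is a generic sketch of basis reduction only, not the paper's recovery algorithm, which builds a specific lattice from the d + 1 samples.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lll(basis, delta=Fraction(3, 4)):
    """Textbook Lenstra-Lenstra-Lovasz reduction of an integer lattice basis."""
    basis = [[Fraction(x) for x in b] for b in basis]
    n = len(basis)

    def gso():
        # Gram-Schmidt orthogonalization, recomputed after each basis change.
        bstar, mu = [], [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            v = basis[i][:]
            for j in range(i):
                mu[i][j] = dot(basis[i], bstar[j]) / dot(bstar[j], bstar[j])
                v = [vi - mu[i][j] * bj for vi, bj in zip(v, bstar[j])]
            bstar.append(v)
        return bstar, mu

    bstar, mu = gso()
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):          # size-reduce b_k against b_j
            q = round(mu[k][j])
            if q != 0:
                basis[k] = [a - q * b for a, b in zip(basis[k], basis[j])]
                bstar, mu = gso()
        if dot(bstar[k], bstar[k]) >= (delta - mu[k][k - 1] ** 2) * dot(bstar[k - 1], bstar[k - 1]):
            k += 1                              # Lovasz condition holds; advance
        else:
            basis[k - 1], basis[k] = basis[k], basis[k - 1]
            bstar, mu = gso()
            k = max(k - 1, 1)
    return [[int(x) for x in b] for b in basis]
```

The "brittleness" alluded to above comes from this exactness: LLL exploits the integer (here, ±1) structure of the planted vector, which any amount of independent noise destroys.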
The Kikuchi Hierarchy and Tensor PCA
For the tensor PCA (principal component analysis) problem, we propose a new hierarchy of
increasingly powerful algorithms with increasing runtime. Our hierarchy is analogous to the
sum-of-squares (SOS) hierarchy but is instead inspired by statistical physics and related algorithms
such as belief propagation and AMP (approximate message passing). Our level-ℓ algorithm can
be thought of as a linearized message-passing algorithm that keeps track of ℓ-wise dependencies
among the hidden variables. Specifically, our algorithms are spectral methods based on the
Kikuchi Hessian, which generalizes the well-studied Bethe Hessian to the higher-order Kikuchi
free energies.
It is known that AMP, the flagship algorithm of statistical physics, has substantially worse
performance than SOS for tensor PCA. In this work we ‘redeem’ the statistical physics approach
by showing that our hierarchy gives a polynomial-time algorithm matching the performance of
SOS. Our hierarchy also yields a continuum of subexponential-time algorithms, and we prove
that these achieve the same (conjecturally optimal) tradeoff between runtime and statistical
power as SOS. Our proofs are much simpler than prior work, and also apply to the related
problem of refuting random k-XOR formulas. The results we present here apply to tensor PCA
for tensors of all orders, and to k-XOR when k is even.
Our methods suggest a new avenue for systematically obtaining optimal algorithms for
Bayesian inference problems, and our results constitute a step toward unifying the statistical
physics and sum-of-squares approaches to algorithm design.
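The level-ℓ idea can be illustrated at ℓ = 2 with a Kikuchi-style spectral method for a spiked order-4 tensor: index a matrix by 2-subsets {i, j}, fill entries from the tensor on disjoint index pairs, and recover the hidden ±1 vector by two rounds of spectral rounding. The normalizations and the signal strength `lam` below are illustrative choices for a small instance, not the paper's exact construction.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, lam = 12, 10.0                                 # small instance, strong signal

# Planted spiked tensor: T = (lam / n^{3/2}) x^{(x)4} + symmetrized Gaussian noise.
x = rng.choice([-1.0, 1.0], size=n)
T = (lam / n ** 1.5) * np.einsum("i,j,k,l->ijkl", x, x, x, x)
G = rng.standard_normal((n, n, n, n))
for p in itertools.permutations(range(4)):        # symmetrize the noise tensor
    T = T + G.transpose(p) / 24.0

# Level-2 Kikuchi-style matrix: rows/columns indexed by 2-subsets {i, j};
# the entry for disjoint subsets {i, j}, {k, l} is T[i, j, k, l].
pairs = list(itertools.combinations(range(n), 2))
idx = {p: a for a, p in enumerate(pairs)}
M = np.zeros((len(pairs), len(pairs)))
for (i, j) in pairs:
    for (k, l) in pairs:
        if {i, j}.isdisjoint({k, l}):
            M[idx[(i, j)], idx[(k, l)]] = T[i, j, k, l]

# The top eigenvector estimates the pairwise products x_i * x_j;
# a second rank-one rounding step recovers x up to a global sign.
w = np.linalg.eigh(M)[1][:, -1]
A = np.zeros((n, n))
for (i, j), a in idx.items():
    A[i, j] = A[j, i] = w[a]
xhat = np.sign(np.linalg.eigh(A)[1][:, -1])
corr = abs(xhat @ x) / n                          # overlap with the planted spike
```

Moving to larger ℓ enlarges the index set to ℓ-subsets, trading a larger matrix (and runtime) for tolerance of a weaker signal, which is the runtime/statistical-power continuum described above.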
NSF-PAR ID: 10164516
Journal Name: 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)
Page Range / eLocation ID: 1446 to 1468
Sponsoring Org: National Science Foundation
More Like this


We consider the problem of finding nearly optimal solutions of optimization problems with random objective functions. Such problems arise widely in the theory of random graphs, theoretical computer science, and statistical physics. Two concrete problems we consider are (a) optimizing the Hamiltonian of a spherical or Ising p-spin glass model, and (b) finding a large independent set in a sparse Erdős–Rényi graph. Two families of algorithms are considered: (a) low-degree polynomials of the input, a general framework that captures methods such as approximate message passing and local algorithms on sparse graphs, among others; and (b) the Langevin dynamics algorithm, a canonical Monte Carlo analogue of the gradient descent algorithm (applicable only to the spherical p-spin glass Hamiltonian). We show that neither family of algorithms can produce nearly optimal solutions with high probability. Our proof uses the fact that both models are known to exhibit a variant of the overlap gap property (OGP) of near-optimal solutions. Specifically, for both models, every two solutions whose objective values are above a certain threshold are either close to or far from each other. The crux of our proof is the stability of both algorithms: a small perturbation of the input induces a small perturbation of the output. By an interpolation argument, such a stable algorithm cannot overcome the OGP barrier. The stability of the Langevin dynamics is an immediate consequence of the well-posedness of stochastic differential equations. The stability of low-degree polynomials is established using concepts from Gaussian and Boolean Fourier analysis, including noise sensitivity, hypercontractivity, and total influence.
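The Langevin dynamics discussed above can be sketched as an Euler discretization of noisy gradient ascent on the sphere for a 3-spin Hamiltonian. The disorder scaling, step size, and temperature below are illustrative assumptions for a toy instance; the abstract's point is precisely that such dynamics stall below the optimum on large instances due to the OGP.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3                                      # p = 3 spin glass on the sphere
beta, dt, steps = 2.0, 0.01, 2000

# Quenched disorder; the 1/n factor makes H(s) extensive (order n) on the sphere.
J = rng.standard_normal((n, n, n)) / n

def energy(s):
    # Hamiltonian H(s) = sum_{ijk} J_ijk s_i s_j s_k
    return np.einsum("ijk,i,j,k->", J, s, s, s)

def grad_H(s):
    # Gradient of H: three terms, one per tensor slot.
    return (np.einsum("ijk,j,k->i", J, s, s)
            + np.einsum("ijk,i,k->j", J, s, s)
            + np.einsum("ijk,i,j->k", J, s, s))

s = rng.standard_normal(n)
s *= np.sqrt(n) / np.linalg.norm(s)               # start on the sphere of radius sqrt(n)
e0 = energy(s) / n
for _ in range(steps):
    # Euler step of Langevin dynamics: drift up the gradient plus Gaussian noise,
    # then project back onto the sphere.
    s = s + dt * beta * grad_H(s) + np.sqrt(2 * dt) * rng.standard_normal(n)
    s *= np.sqrt(n) / np.linalg.norm(s)
```

The stability property used in the proof is visible here: the trajectory depends continuously on the disorder J, so a small perturbation of J perturbs the output only slightly.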

Low-rank matrix recovery problems involving high-dimensional and heterogeneous data appear in applications throughout statistics and machine learning. The contribution of this paper is to establish the fundamental limits of recovery for a broad class of these problems. In particular, we study the problem of estimating a rank-one matrix from Gaussian observations where different blocks of the matrix are observed under different noise levels. In the setting where the number of blocks is fixed while the number of variables tends to infinity, we prove asymptotically exact formulas for the minimum mean-squared error in estimating both the matrix and underlying factors. These results are based on a novel reduction from the low-rank matrix tensor product model (with homogeneous noise) to a rank-one model with heteroskedastic noise. As an application of our main result, we show that recently proposed methods based on applying principal component analysis (PCA) to weighted combinations of the data are optimal in some settings but suboptimal in others. We also provide numerical results comparing our asymptotic formulas with the performance of methods based on weighted PCA, gradient descent, and approximate message passing.
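A minimal sketch of weighted PCA in a two-block heteroskedastic spiked model: entries are rescaled by inverse noise variances before the eigendecomposition, and the resulting eigenvector is rescaled back to estimate the factor. The block pattern, signal strength `theta`, and noise levels below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(2)
n, theta = 300, 10.0

# Rank-one spike u u^T observed under blockwise noise: entries in the
# second half of rows/columns are three times noisier.
u = rng.standard_normal(n)
u /= np.linalg.norm(u)
sigma = np.ones(n)
sigma[n // 2:] = 3.0

G = rng.standard_normal((n, n))
G = (G + G.T) / np.sqrt(2)                        # symmetric Gaussian noise
Y = theta * np.outer(u, u) + np.outer(sigma, sigma) * G / np.sqrt(n)

# Weighted PCA: divide each entry by its noise variance sigma_i * sigma_j,
# take the top eigenvector, then undo the rescaling.
v = np.linalg.eigh(Y / np.outer(sigma, sigma))[1][:, -1]
uhat = sigma * v
uhat /= np.linalg.norm(uhat)
corr = abs(uhat @ u)                              # overlap with the true factor
```

Comparing `corr` against the overlap of the unweighted top eigenvector of `Y` with `u` shows the mechanism the paper analyzes: reweighting equalizes the noise, but the optimal weighting depends on the regime.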

The Gilbert–Varshamov bound nonconstructively establishes the existence of binary codes of distance 1/2 − ε/2 and rate Ω(ε²). In a breakthrough result, Ta-Shma [STOC 2017] constructed the first explicit family of nearly optimal binary codes with distance 1/2 − ε/2 and rate Ω(ε^{2+α}), where α → 0 as ε → 0. Moreover, the codes in Ta-Shma’s construction are ε-balanced, where the distance between distinct codewords is not only bounded from below by 1/2 − ε/2, but also from above by 1/2 + ε/2. Polynomial-time decoding algorithms for (a slight modification of) Ta-Shma’s codes appeared in [FOCS 2020], and were based on the Sum-of-Squares (SoS) semidefinite programming hierarchy. The running times for these algorithms were of the form N^{O_α(1)} for unique decoding, and N^{O_{ε,α}(1)} for the setting of “gentle list decoding”, with large exponents of N even when α is a fixed constant. We derive new algorithms for both these tasks, running in time Õ_ε(N). Our algorithms also apply to the general setting of decoding direct-sum codes. Our algorithms follow from new structural and algorithmic results for collections of k-tuples (ordered hypergraphs) possessing a “structured expansion” property, which we call splittability. This property was previously identified and used in the analysis of SoS-based decoding and constraint satisfaction algorithms, and is also known to be satisfied by Ta-Shma’s code construction. We obtain a new weak regularity decomposition for (possibly sparse) splittable collections W ⊆ [n]^k, similar to the regularity decomposition for dense structures by Frieze and Kannan [FOCS 1996]. These decompositions are also computable in near-linear time Õ(|W|), and form a key component of our algorithmic results.

Abernethy, Jacob; Agarwal, Shivani (Eds.) We study a variant of the sparse PCA (principal component analysis) problem in the “hard” regime, where the inference task is possible yet no polynomial-time algorithm is known to exist. Prior work, based on the low-degree likelihood ratio, has conjectured a precise expression for the best possible (subexponential) runtime throughout the hard regime. Following instead a statistical-physics-inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the low-degree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worst-case initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparameterization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.
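A natural MCMC method of the kind analyzed above can be sketched as a Metropolis chain over k-element supports for a planted sparse spike: each step proposes swapping one index in the support and accepts with the usual Gibbs probability. The temperature `beta`, the swap proposal, and the signal strength are illustrative choices; per the abstract, such chains provably fail in the genuinely hard regime, and this toy instance is far easier.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, theta, beta, steps = 60, 5, 6.0, 20.0, 4000

# Planted k-sparse spike v in a symmetric Gaussian matrix Y.
supp = rng.choice(n, size=k, replace=False)
v = np.zeros(n)
v[supp] = 1.0 / np.sqrt(k)
G = rng.standard_normal((n, n))
G = (G + G.T) / np.sqrt(2)
Y = theta * np.outer(v, v) + G / np.sqrt(n)

def fit(S):
    # Energy of the uniform unit vector supported on S: (1_S / sqrt(k))^T Y (1_S / sqrt(k)).
    idx = list(S)
    return Y[np.ix_(idx, idx)].sum() / k

# Metropolis chain over k-element supports with single-swap proposals.
S = set(rng.choice(n, size=k, replace=False).tolist())
e = fit(S)
for _ in range(steps):
    out = int(rng.choice(list(S)))
    inn = int(rng.integers(n))
    if inn in S:
        continue                                  # proposal must change the support
    S2 = (S - {out}) | {inn}
    e2 = fit(S2)
    if np.log(rng.random()) < beta * (e2 - e):    # accept w.p. min(1, exp(beta * delta))
        S, e = S2, e2
```

The free-energy-well lower bounds discussed above say that in the hard regime, a chain like this (at a wide range of temperatures `beta` and sparsity levels `k`, from a worst-case start) takes at least the conjectured subexponential time to hit the planted support.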