Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Free, publiclyaccessible full text available August 1, 2024

null (Ed.)Abstract We show a simple reduction which demonstrates the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that any polynomialtime algorithm (not necessarily gradientbased) for learning such functions under small noise implies a polynomialtime quantum algorithm for solving worstcase lattice problems, whose hardness form the foundation of latticebased cryptography. Our core hard family of functions, which are wellapproximated by onelayer neural networks, take the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works which demonstrate their hardness against gradientbased (Shamir’18), and Statistical Query (SQ) algorithms (Song et al.’17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomialtime algorithms, beyond gradientbased and SQ algorithms, under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomialtime algorithm for learning certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is not a gradientbased or an SQ algorithm, but is rather based on the celebrated LenstraLenstraLovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al.’21) and phase retrieval with an optimal sample complexity of d + 1 samples. In the former case, this improves upon the quadraticind sample complexity required in (Bruna et al.’21).more » « less

null (Ed.)We consider the problem of finding a twolayer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible as quantified by the training error; and study the following question: \emph{does a low training error guarantee that the norm of the output layer (outer norm) itself is small?} We answer affirmatively this question for the case of nonnegative output weights. Using a simple covering number argument, we establish that under quite mild distributional assumptions on the input/label pairs; any such network achieving a small training error on polynomially many data necessarily has a wellcontrolled outer norm. Notably, our results (a) have a polynomial (in d) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the training algorithm; and (d) require quite mild assumptions on the data (in particular the input vector X∈ℝd need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through \emph{fatshattering dimension}, a scalesensitive measure of the complexity class that the network architectures we investigate belong to. Notably, our generalization bounds also have good sample complexity (polynomials in d with a low degree), and are in fact nearlinear for some important cases of interest.more » « less

Abernethy, Jacob ; Agarwal, Shivani (Ed.)We study a variant of the sparse PCA (principal component analysis) problem in the “hard” regime, where the inference task is possible yet no polynomialtime algorithm is known to exist. Prior work, based on the lowdegree likelihood ratio, has conjectured a precise expression for the best possible (subexponential) runtime throughout the hard regime. Following instead a statistical physics inspired point of view, we show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem. These free energy wells imply hitting time lower bounds that corroborate the lowdegree conjecture: we show that a class of natural MCMC (Markov chain Monte Carlo) methods (with worstcase initialization) cannot solve sparse PCA with less than the conjectured runtime. These lower bounds apply to a wide range of values for two tuning parameters: temperature and sparsity misparametrization. Finally, we prove that the Overlap Gap Property (OGP), a structural property that implies failure of certain local search algorithms, holds in a significant part of the hard regime.more » « less

We consider the problem of estimating a $p$ dimensional vector $\beta$ from $n$ observations $Y=X\beta+W$ , where $\beta_{j}\mathop{\sim}^{\mathrm{i.i.d}.}\pi$ for a realvalued distribution $\pi$ with zero mean and unit variance’ $X_{ij}\mathop{\sim}^{\mathrm{i.i.d}.}\mathcal{N}(0,1)$ , and $W_{i}\mathop{\sim}^{\mathrm{i.i.d}.}\mathcal{N}(0,\ \sigma^{2})$ . In the asymptotic regime where $n/p\rightarrow\delta$ and $p/\sigma^{2}\rightarrow$ snr for two fixed constants $\delta,\ \mathsf{snr}\in(0,\ \infty)$ as $p\rightarrow\infty$ , the limiting (normalized) minimum meansquared error (MMSE) has been characterized by a singleletter (additive Gaussian scalar) channel. In this paper, we show that if the MMSE function of the singleletter channel converges to a step function, then the limiting MMSE of estimating $\beta$ converges to a step function which jumps from 1 to 0 at a critical threshold. Moreover, we establish that the limiting meansquared error of the (MSEoptimal) approximate message passing algorithm also converges to a step function with a larger threshold, providing evidence for the presence of a computationalstatistical gap between the two thresholds.more » « less