Search for: All records

Award ID contains: 2031883

Note: Clicking on a Digital Object Identifier (DOI) number will take you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval) period.

Some links on this page may take you to non-federal websites, whose policies may differ from those of this site.

  1. Free, publicly-accessible full text available July 29, 2023
  2. Free, publicly-accessible full text available July 1, 2023
  3. It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks), no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with O(d) sample complexity in a large ambient dimension d. Our main results characterize a hierarchical property, the merged-staircase property, that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new “dimension-free” dynamics approximation result that applies to functions defined on a low-dimensional latent space, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost-orthogonal functions. (A rough sketch of the merged-staircase property is given after this list.)
  4. We consider the Ising perceptron model with N spins and M = αN patterns, with a general activation function U that is bounded above. For U bounded away from zero, or U a one-sided threshold function, it was shown by Talagrand (2000, 2011) that for small densities α, the free energy of the model converges in the large-N limit to the replica symmetric formula conjectured in the physics literature (Krauth–Mézard 1989; see also Gardner–Derrida 1988). We give a new proof of this result, which covers the more general class of all functions U that are bounded above and satisfy a certain variance bound. The proof uses the (first and second) moment method conditional on the approximate message passing iterates of the model. In order to deduce our main theorem, we also prove a new concentration result for the perceptron model in the case where U is not bounded away from zero. (A sketch of the model's partition function is given after this list.)
  5. We study the problem of learning Censored Markov Random Fields (abbreviated CMRFs), which are Markov Random Fields in which some of the nodes are censored (i.e., not observed). We assume the CMRF is high temperature but, crucially, make no assumption about its structure. This makes structure learning impossible. Nevertheless, we introduce a new definition, which we call learning to sample, that circumvents this obstacle. We give an algorithm that can learn to sample from a distribution within εn earthmover distance of the target distribution for any ε > 0. We obtain stronger results when we additionally assume high girth, as well as computational lower bounds showing that these guarantees are essentially optimal. (The earthmover guarantee is spelled out after this list.)
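On item 3, a rough sketch of the merged-staircase property, written in my own notation rather than quoted from the paper (the symbols \hat{f}(S), S_i, and m below are illustrative assumptions): write the target function in its Fourier expansion over subsets of the d binary coordinates,

    f(x) = \sum_{S \in \mathcal{S}} \hat{f}(S) \prod_{i \in S} x_i, \qquad x \in \{-1,+1\}^d.

The property asks, roughly, that the support sets can be ordered S_1, \dots, S_m so that |S_i \setminus (S_1 \cup \dots \cup S_{i-1})| \le 1 for every i, i.e., each successive monomial introduces at most one previously unseen coordinate. For instance, x_1 + x_1 x_2 + x_1 x_2 x_3 has this staircase structure, whereas the isolated parity x_1 x_2 x_3 does not.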
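On item 4, a hedged sketch of how this model is commonly written (the i.i.d. Gaussian pattern vectors g_a and the notation below are my assumptions, not taken from the abstract): the partition function weights each spin configuration by the activation U applied to every pattern,

    Z_N = \sum_{\sigma \in \{-1,+1\}^N} \prod_{a=1}^{M} U\!\left( \frac{\langle g_a, \sigma \rangle}{\sqrt{N}} \right), \qquad M = \alpha N,

and the quantity converging to the replica symmetric formula is the free energy \frac{1}{N} \log Z_N in the large-N limit at small density \alpha. The one-sided threshold case mentioned in the abstract corresponds to the choice U(x) = \mathbf{1}\{x \ge \kappa\}.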
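On item 5, the earthmover (Wasserstein-1) guarantee can be read as follows; taking the Hamming distance on node configurations as the ground metric is my assumption about the intended reading, not a statement from the abstract:

    W_1(P, Q) = \inf_{\pi \in \Pi(P, Q)} \, \mathbb{E}_{(X, Y) \sim \pi}\big[ d_H(X, Y) \big] \le \epsilon n,

where \Pi(P, Q) is the set of couplings of the learned distribution P and the target Q. Informally, samples from P can be coupled to samples from Q so that they disagree on at most an \epsilon fraction of the n nodes on average.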