
Free, publicly accessible full text available July 29, 2023

Free, publicly accessible full text available July 1, 2023

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (nonlinear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with O(d) sample complexity in a large ambient dimension d. Our main results characterize a hierarchical property, the merged-staircase property, that is both necessary and nearly sufficient for learning in this setting. We further show that nonlinear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new “dimension-free” dynamics approximation result that applies to functions […]
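For intuition about "functions on binary inputs that depend on a small number of coordinates," a toy example in the staircase family is sketched below. This illustration is ours, not taken from the paper, and the merged-staircase property is more general than this single function: the point is only that the function lives in a large ambient dimension d yet depends hierarchically on just three coordinates.

```python
import numpy as np

def staircase(x):
    """Toy 'staircase' function on x in {-1, +1}^d: depends only on the
    first three coordinates, with monomials added hierarchically,
        f(x) = x_1 + x_1*x_2 + x_1*x_2*x_3.
    (Illustrative only; the paper's merged-staircase property is broader.)"""
    return x[0] + x[0] * x[1] + x[0] * x[1] * x[2]

# A random point in a large ambient dimension d; the function ignores x[3:].
rng = np.random.default_rng(0)
d = 1000
x = rng.choice([-1.0, 1.0], size=d)
value = staircase(x)  # always lies in {-3, -1, 1, 3}
```

Flipping any coordinate beyond the first three leaves the value unchanged, which is the latent low-dimensional structure the abstract refers to.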

We consider the Ising perceptron model with N spins and M = αN patterns, with a general activation function U that is bounded above. For U bounded away from zero, or U a one-sided threshold function, it was shown by Talagrand (2000, 2011) that for small densities α, the free energy of the model converges in the large-N limit to the replica symmetric formula conjectured in the physics literature (Krauth–Mézard 1989; see also Gardner–Derrida 1988). We give a new proof of this result, which covers the more general class of all functions U that are bounded above and satisfy a certain variance bound. The proof uses the (first and second) moment method conditional on the approximate message passing iterates of the model. To deduce our main theorem, we also prove a new concentration result for the perceptron model in the case where U is not bounded away from zero.
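For readers unfamiliar with the model, one standard formalization (our notation; the paper's conventions may differ) takes i.i.d. standard Gaussian patterns g^1, …, g^M and defines the partition function and free energy as

```latex
Z_N \;=\; \sum_{\sigma \in \{-1,+1\}^N} \; \prod_{\mu=1}^{M}
  U\!\left(\frac{g^{\mu} \cdot \sigma}{\sqrt{N}}\right),
\qquad
F \;=\; \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E} \log Z_N ,
```

with M = αN; the replica symmetric formula is the conjectured closed-form value of F at small α.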

We study the problem of learning Censored Markov Random Fields (abbreviated CMRFs), which are Markov Random Fields in which some of the nodes are censored (i.e., not observed). We assume the CMRF is high temperature but, crucially, make no assumption about its structure. This makes structure learning impossible. Nevertheless, we introduce a new definition, which we call learning to sample, that circumvents this obstacle. We give an algorithm that can learn to sample from a distribution within εn earth-mover distance of the target distribution for any ε > 0. We obtain stronger results when we additionally assume high girth, as well as computational lower bounds showing that these guarantees are essentially optimal.