NSF PAR Search | NSF Public Access Repository

Learning curves for deep structured Gaussian feature models

https://doi.org/10.1088/1742-5468/ad642a

Zavatone-Veth, Jacob A; Pehlevan, Cengiz (October 2024, Journal of Statistical Mechanics: Theory and Experiment)

Abstract In recent years, significant attention in deep learning theory has been devoted to analyzing when models that interpolate their training data can still generalize well to unseen examples. Many insights have been gained from studying models with multiple layers of Gaussian random features, for which one can compute precise generalization asymptotics. However, few works have considered the effect of weight anisotropy; most assume that the random features are generated using independent and identically distributed Gaussian weights, and allow only for structure in the input data. Here, we use the replica trick from statistical physics to derive learning curves for models with many layers of structured Gaussian features. We show that allowing correlations between the rows of the first layer of features can aid generalization, while structure in later layers is generally detrimental. Our results shed light on how weight structure affects generalization in a simple class of solvable models.
Full Text Available
Long sequence Hopfield memory

https://doi.org/10.1088/1742-5468/ad6427

Tahir_Chaudhry, Hamza; Zavatone-Veth, Jacob A; Krotov, Dmitry; Pehlevan, Cengiz (October 2024, Journal of Statistical Mechanics: Theory and Experiment)

Abstract Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions. Computational models of sequence memory have been proposed where recurrent Hopfield-like neural networks are trained with temporally asymmetric Hebbian rules. However, these networks suffer from limited sequence capacity (maximal length of the stored sequence) due to interference between the memories. Inspired by recent work on Dense Associative Memories, we expand the sequence capacity of these models by introducing a nonlinear interaction term, enhancing separation between the patterns. We derive novel scaling laws for sequence capacity with respect to network size, significantly outperforming existing scaling laws for models based on traditional Hopfield networks, and verify these theoretical results with numerical simulation. Moreover, we introduce a generalized pseudoinverse rule to recall sequences of highly correlated patterns. Finally, we extend this model to store sequences with variable timing between states’ transitions and describe a biologically-plausible implementation, with connections to motor neuroscience.
Full Text Available
Dynamics of finite width Kernel and prediction fluctuations in mean field neural networks

https://doi.org/10.1088/1742-5468/ad642b

Bordelon, Blake; Pehlevan, Cengiz (October 2024, Journal of Statistical Mechanics: Theory and Experiment)

Abstract We analyze the dynamics of finite width effects in wide but finite feature learning neural networks. Starting from a dynamical mean field theory description of infinite width deep neural network kernel and prediction dynamics, we provide a characterization of the $O(1/\sqrt{\text{width}})$ fluctuations of the dynamical mean field theory order parameters over random initializations of the network weights. Our results, while perturbative in width, unlike prior analyses, are non-perturbative in the strength of feature learning. We find that once the mean field/µP parameterization is adopted, the leading finite size effect on the dynamics is to introduce initialization variance in the predictions and feature kernels of the networks. In the lazy limit of network training, all kernels are random but static in time and the prediction variance has a universal form. However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently. In two layer networks, we show how feature learning can dynamically reduce the variance of the final tangent kernel and final network predictions. We also show how initialization variance can slow down online learning in wide but finite networks. In deeper networks, kernel variance can dramatically accumulate through subsequent layers at large feature learning strengths, but feature learning continues to improve the signal-to-noise ratio of the feature kernels. In discrete time, we demonstrate that large learning rate phenomena such as edge of stability effects can be well captured by infinite width dynamics and that initialization variance can decrease dynamically. For convolutional neural networks trained on CIFAR-10, we empirically find significant corrections to both the bias and variance of network dynamics due to finite width.
Full Text Available
Self-consistent dynamical field theory of kernel evolution in wide neural networks

https://doi.org/10.1088/1742-5468/ad01b0

Bordelon, Blake; Pehlevan, Cengiz (November 2023, Journal of Statistical Mechanics: Theory and Experiment)

Abstract We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory. We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points, providing a reduced description of network activity through training. These kernel order parameters collectively define the hidden layer activation distribution, the evolution of the neural tangent kernel (NTK), and consequently, output predictions. We show that the field theory derivation recovers the recursive stochastic process of infinite-width feature learning networks obtained by Yang and Hu with tensor programs. For deep linear networks, these kernels satisfy a set of algebraic matrix equations. For nonlinear networks, we provide an alternating sampling procedure to self-consistently solve for the kernel order parameters. We provide comparisons of the self-consistent solution to various approximation schemes including the static NTK approximation, gradient independence assumption, and leading order perturbation theory, showing that each of these approximations can break down in regimes where general self-consistent solutions still provide an accurate description. Lastly, we provide experiments in more realistic settings which demonstrate that the loss and kernel dynamics of convolutional neural networks at fixed feature learning strength are preserved across different widths on an image classification task.
Full Text Available
SOAP: Improving and Stabilizing Shampoo using Adam

Vyas, Nikhil; Morwani, Depen; Zhao, Rosie; Shapira, Itai; Brandfonbrener, David; Janson, Lucas; Kakade, Sham (December 2024, Neural Information Processing Systems Workshop: Optimization for Machine Learning)

Free, publicly-accessible full text available December 15, 2025
Transformers can reinforcement learn to approximate Gittins Index

Petrov, Vladimir; Vyas, Nikhil; Janson, Lucas (October 2024, Conference on Neural Information Processing Systems Workshop: Scientific Methods for Understanding Deep Learning)

Full Text Available
In-Context Learning by Linear Attention: Exact Asymptotics and Experiments

Lu, Yue; Letey, Mary; Zavatone-Veth, Jacob A; Maiti, Anindita; Pehlevan, Cengiz (October 2024, NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L), 2024)

Full Text Available
Partial observation can induce mechanistic mismatches in data-constrained RNNs

Qian, William; Zavatone-Veth, Jacob A; Ruben, Benjamin S; Pehlevan, Cengiz (October 2024, The First Workshop on NeuroAI @ NeurIPS2024)

Full Text Available
Optimal ablation for interpretability

Li, Maximilian; Janson, Lucas (September 2024, NeurIPS 2024)

Full Text Available
Partial observation can induce mechanistic mismatches in data-constrained models of neural dynamics

Qian, William; Zavatone-Veth, Jacob A; Ruben, Benjamin S; Pehlevan, C (September 2024, Advances in Neural Information Processing Systems (NeurIPS) 2024)

Full Text Available
