Search for: All records

Award ID contains: 2050360


Total Resources: 7

Resource Type
Conference Paper: 2
Conference Proceeding: 0
Dataset: 0
Journal Article: 5
Workshop Report: 0

Availability
Full Text / Resource Available: 6
Citation Only: 1



Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge until their embargo (administrative interval) ends.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions

https://doi.org/10.1137/22M1499819

Beaglehole, Daniel; Belkin, Mikhail; Pandit, Parthe (December 2023, SIAM Journal on Mathematics of Data Science)

Free, publicly-accessible full text available December 31, 2024
Wide and deep neural networks achieve consistency for classification

https://doi.org/10.1073/pnas.2208779120

Radhakrishnan, Adityanarayanan; Belkin, Mikhail; Uhler, Caroline (April 2023, Proceedings of the National Academy of Sciences)

While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful.
Full Text Available
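The abstract above names three classifiers that infinitely wide and deep networks reduce to. What follows is a minimal Python sketch of those three prediction rules written directly over training data, assuming Euclidean distance; it is not the paper's network construction, which reaches these rules through the neural tangent kernel. The singular-kernel weight `||x - x_i||^(-alpha)`, the exponent `alpha`, and all function names are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative sketches only; function names and the kernel exponent are assumptions.

def nearest_neighbor(train_x, train_y, x):
    """1-nearest neighbor: label of the training point closest to x."""
    dists = [math.dist(x, xi) for xi in train_x]
    return train_y[dists.index(min(dists))]

def majority_vote(train_y):
    """Majority vote: most frequent training label, independent of the query."""
    return Counter(train_y).most_common(1)[0][0]

def singular_kernel(train_x, train_y, x, alpha=2.0):
    """Kernel-weighted vote with weights ||x - x_i||**(-alpha), which diverge as x -> x_i."""
    scores = {}
    for xi, yi in zip(train_x, train_y):
        d = math.dist(x, xi)
        if d == 0.0:
            # The weight is infinite at a training point, so the rule returns that label.
            return yi
        scores[yi] = scores.get(yi, 0.0) + d ** (-alpha)
    return max(scores, key=scores.get)

train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = ["a", "a", "a", "b"]
query = (4.0, 4.0)
print(nearest_neighbor(train_x, train_y, query))  # b
print(majority_vote(train_y))                     # a
print(singular_kernel(train_x, train_y, query))   # b
```

Majority vote ignores the query entirely, which is why it predicts "a" here, while the two distance-based rules follow the nearby "b" point.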
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

https://doi.org/10.1016/j.acha.2021.12.009

Liu, Chaoyue; Zhu, Libin; Belkin, Mikhail (July 2022, Applied and Computational Harmonic Analysis)

Full Text Available
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

https://doi.org/10.1017/S0962492921000039

Belkin, Mikhail (May 2021, Acta Numerica)
In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling over-parametrization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parametrization enables interpolation and provides flexibility to select a suitable interpolating model. As we will see, just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that clearer understanding of these issues will bring us a step closer towards a general theory of deep learning and machine learning.
Full Text Available
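The abstract above describes interpolation (fitting data, even noisy data, exactly) and over-parametrization as what makes it possible and leaves room to choose among interpolating models. A minimal sketch, assuming a linear model with more features than samples: the system X w = y then has infinitely many exact solutions, and taking the minimum-norm one, w = X^T (X X^T)^{-1} y, is one common selection rule. This toy setting and the function names `solve` and `min_norm_interpolant` are illustrative choices, not constructions from the article; the sketch assumes the rows of X are linearly independent.

```python
def solve(A, b):
    """Solve the square system A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("Gram matrix is singular; rows of X must be linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [m - factor * p for m, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def min_norm_interpolant(X, y):
    """Return the minimum-norm w with X w = y, computed as w = X^T (X X^T)^{-1} y."""
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in X] for ri in X]
    z = solve(gram, y)
    return [sum(X[i][j] * z[i] for i in range(len(X))) for j in range(len(X[0]))]

# Two samples, three features: over-parameterized, so many w fit y exactly.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, -0.5]
w = min_norm_interpolant(X, y)
residuals = [sum(a * b for a, b in zip(row, w)) - t for row, t in zip(X, y)]
print(w)          # approximately [0.8333, -0.6667, 0.1667]
print(residuals)  # approximately [0.0, 0.0]
```

The vector [1, -0.5, 0] also fits both points exactly, but its squared norm is 1.25 versus about 1.17 for the minimum-norm solution; the extra parameters are what make such a choice available.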
On the linearity of large non-linear models: when and why the tangent kernel is constant

Liu, C.; Zhu (December 2020, Advances in Neural Information Processing Systems)
Full Text Available
Overparameterized neural networks implement associative memory

https://doi.org/10.1073/PNAS.2005013117

Radhakrishnan, Adityanarayanan; Belkin, Mikhail; Uhler, Caroline (November 2020, Proceedings of the National Academy of Sciences)
Identifying computational mechanisms for memorization and retrieval of data is a long-standing problem at the intersection of machine learning and neuroscience. Our main finding is that standard overparameterized deep neural networks trained using standard optimization methods implement such a mechanism for real-valued data. We provide empirical evidence that 1) overparameterized autoencoders store training samples as attractors and thus iterating the learned map leads to sample recovery, and that 2) the same mechanism allows for encoding sequences of examples and serves as an even more efficient mechanism for memory than autoencoding. Theoretically, we prove that when trained on a single example, autoencoders store the example as an attractor. Lastly, by treating a sequence encoder as a composition of maps, we prove that sequence encoding provides a more efficient mechanism for memory than autoencoding.
Full Text Available
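The retrieval mechanism in the abstract above (iterating the learned map until it settles on a stored training sample) amounts to a fixed-point iteration. Training an autoencoder is out of scope for a short example, so `toy_map` below is a hypothetical stand-in that moves each input halfway toward its nearest stored sample, which makes every stored sample an attractor by construction; in the paper the map is a trained overparameterized autoencoder. The names `iterate_to_fixed_point`, `STORED`, and `toy_map`, and the tolerance values, are assumptions for illustration.

```python
import math

def iterate_to_fixed_point(f, x0, tol=1e-9, max_iter=1000):
    """Apply f repeatedly from x0 until successive iterates move less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if math.dist(x_next, x) < tol:
            return x_next
        x = x_next
    return x

# Hypothetical stored samples; in the paper these would be the autoencoder's training data.
STORED = [(1.0, 0.0), (0.0, 1.0)]

def toy_map(x, step=0.5):
    """Hypothetical stand-in for a trained autoencoder: move x partway toward its nearest stored sample.

    Each stored sample is a fixed point, and nearby inputs contract toward it, so it is an attractor.
    """
    target = min(STORED, key=lambda s: math.dist(x, s))
    return tuple(xi + step * (ti - xi) for xi, ti in zip(x, target))

print(iterate_to_fixed_point(toy_map, (0.8, 0.3)))  # close to (1.0, 0.0)
print(iterate_to_fixed_point(toy_map, (0.2, 0.9)))  # close to (0.0, 1.0)
```

Starting from a corrupted version of a stored sample, repeated application of the map recovers that sample, which is the behavior the abstract reports for trained autoencoders.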
Evaluation of neural architectures trained with square loss vs cross-entropy in classification tasks

Hui, L. (October 2020, The Ninth International Conference on Learning Representations (ICLR 2021))
Full Text Available