SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks

Galanti, Tomer; Poggio, Tomaso

Citation Details

We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent (SGD) and weight decay. We prove that the source of the SGD noise is an implicit low-rank constraint across all of the weight matrices within the network. Furthermore, we show, both theoretically and empirically, that when training a neural network using SGD with a small batch size, the resulting weight matrices are expected to be of small rank. Our analysis relies on a minimal set of assumptions, and the neural networks may include convolutional layers, residual connections, and batch normalization layers.
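As a rough illustration of what "small rank" means for a weight matrix, the sketch below counts the numerically nonzero pivots of a matrix by Gaussian elimination. The function name `numerical_rank`, the tolerance `tol`, and the example matrix are hypothetical and chosen only for this illustration; this record does not say how the authors measure rank, so this should not be read as their procedure.

```python
def numerical_rank(matrix, tol=1e-9):
    """Estimate the rank of a matrix (list of rows) by Gaussian elimination.

    `tol` is an assumed threshold: pivots with absolute value <= tol are
    treated as zero. This is an illustrative choice, not the paper's.
    """
    rows = [[float(x) for x in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Partial pivoting: pick the remaining row with the largest entry.
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


# A 4x5 "weight matrix" built as the sum of two outer products, so its
# rank is at most 2 even though it has 20 entries.
u1, v1 = [1, 2, 3, 4], [1, 0, -1, 2, 1]
u2, v2 = [0, 1, 0, -1], [2, 1, 0, 0, 3]
W = [[u1[i] * v1[j] + u2[i] * v2[j] for j in range(5)] for i in range(4)]

print(numerical_rank(W))
```

In practice, rank is more often estimated from singular values with a relative threshold; elimination is used here only to keep the sketch dependency-free.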

Award ID(s): 2134108

PAR ID: 10565469

Author(s) / Creator(s): Galanti, Tomer; Poggio, Tomaso

Publisher / Repository: Center for Brains, Minds and Machines (CBMM)

Date Published: 2022-03-28

Format(s): Medium: X

Institution: Massachusetts Institute of Technology

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Posted Content:
The DOI is not currently available.