On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent

Azulay, Shahar; Moroshko, Edward; Nacson, MS; Woodworth, Blake; Srebro, Nathan; Globerson, Amir; Soudry, Daniel

Citation Details

Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas small initialization leads to so called “rich regimes”. However, the initialization structure is richer than the overall scale alone and involves relative magnitudes of different weights and layers in the network. Here we show that these relative scales, which we refer to as initialization shape, play an important role in determining the learned model. We develop a novel technique for deriving the inductive bias of gradientflow and use it to obtain closed-form implicit regularizers for multiple cases of interest. more »

Award ID(s):: 1764032

PAR ID:: 10286846

Author(s) / Creator(s):: Azulay, Shahar; Moroshko, Edward; Nacson, MS; Woodworth, Blake; Srebro, Nathan; Globerson, Amir; Soudry, Daniel

Date Published:: 2021-07-01

Journal Name:: Proceedings of Machine Learning Research

Volume:: 139

ISSN:: 2640-3498

Page Range / eLocation ID:: 468 - 477

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this