Title: The training process of many deep networks explores the same low-dimensional manifold
We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures and sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations, lie on the same manifold in the prediction space. We study the details of this manifold and find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
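As a rough illustration of the kind of analysis described in the abstract, the sketch below embeds training checkpoints from pairwise Bhattacharyya distances between the networks' softmax predictions on a fixed probe set, using an InPCA-style eigendecomposition of the centered distance matrix. The function names, array shapes, and the specific choice of distance are illustrative assumptions, not the authors' released code.

```python
import numpy as np

def bhattacharyya_distance(p, q, eps=1e-12):
    """Distance between two models' categorical predictions, averaged
    over the probe samples (rows of p and q)."""
    affinity = np.sum(np.sqrt(p * q), axis=-1)      # per-sample Bhattacharyya coefficient
    return float(-np.log(np.clip(affinity, eps, 1.0)).mean())

def embed_checkpoints(predictions):
    """predictions: (num_checkpoints, num_samples, num_classes) softmax outputs
    saved along one or more training runs. Returns low-dimensional coordinates
    from an InPCA-style eigendecomposition of the pairwise-distance matrix."""
    n = len(predictions)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = bhattacharyya_distance(predictions[i], predictions[j])
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    W = -0.5 * J @ D @ J                            # double-centered distance matrix
    eigvals, eigvecs = np.linalg.eigh(W)
    order = np.argsort(-np.abs(eigvals))            # keep signed eigenvalues, largest magnitude first
    coords = eigvecs[:, order] * np.sqrt(np.abs(eigvals[order]))
    return coords, eigvals[order]
```

The fraction of the total (absolute) spectrum captured by the first few coordinates is one way to quantify how low-dimensional the explored manifold is.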
Award ID(s):
1753357
PAR ID:
10631361
Author(s) / Creator(s):
Publisher / Repository:
Proceedings of the National Academy of Sciences
Date Published:
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
12
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop information-geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised, and contrastive learning. We shed light on the following phenomena that relate to the structure of the space of tasks: (1) the manifold of probabilistic models trained on different tasks using different representation learning methods is effectively low-dimensional; (2) supervised learning on one task results in a surprising amount of progress even on seemingly dissimilar tasks; progress on other tasks is larger if the training task has diverse classes; (3) the structure of the space of tasks indicated by our analysis is consistent with parts of the WordNet phylogenetic tree; (4) episodic meta-learning algorithms and supervised learning traverse different trajectories during training but they fit similar models eventually; (5) contrastive and semi-supervised learning methods traverse trajectories similar to those of supervised learning. We use classification tasks constructed from the CIFAR-10 and ImageNet datasets to study these phenomena. Code is available at https://github.com/grasp-lyrl/picture_of_space_of_tasks.
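In the same spirit as the preceding entry, one crude way to ask whether task-space structure resembles a taxonomy is to hierarchically cluster a matrix of distances between models trained on different tasks and compare the resulting tree to WordNet. A minimal sketch with placeholder distances follows; the matrix D, the linkage method, and the comparison step are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

# D: symmetric matrix of distances between models trained on different tasks,
# e.g. computed with the embedding sketch above; random placeholder here.
rng = np.random.default_rng(0)
D = np.abs(rng.normal(size=(8, 8)))
D = (D + D.T) / 2
np.fill_diagonal(D, 0)

Z = linkage(squareform(D, checks=False), method="average")  # agglomerative clustering
tree = dendrogram(Z, no_plot=True)                          # merge structure / leaf order
print(tree["ivl"])  # compare this grouping against the WordNet subtree of the task classes
```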
  2. We investigate the ways in which a machine learning architecture known as Reservoir Computing learns concepts such as “similar” and “different” and other relationships between image pairs and generalizes these concepts to previously unseen classes of data. We present two Reservoir Computing architectures, which loosely resemble neural dynamics, and show that a Reservoir Computer (RC) trained to identify relationships between image pairs drawn from a subset of training classes generalizes the learned relationships to substantially different classes unseen during training. We demonstrate our results on the simple MNIST handwritten digit database as well as a database of depth maps of visual scenes in videos taken from a moving camera. We consider image pair relationships such as images from the same class; images from the same class with one image superposed with noise, rotated 90°, blurred, or scaled; and images from different classes. We observe that the reservoir acts as a nonlinear filter projecting the input into a higher-dimensional space in which the relationships are separable; i.e., the reservoir system state trajectories display different dynamical patterns that reflect the corresponding input pair relationships. Thus, as opposed to training in the entire high-dimensional reservoir space, the RC only needs to learn characteristic features of these dynamical patterns, allowing it to perform well with very few training examples compared with conventional machine learning feed-forward techniques such as deep learning. In generalization tasks, we observe that RCs perform significantly better than state-of-the-art, feed-forward, pair-based architectures such as convolutional and deep Siamese Neural Networks (SNNs). We also show that RCs can not only generalize relationships, but also generalize combinations of relationships, providing robust and effective image pair classification. Our work helps bridge the gap between explainable machine learning with small datasets and biologically inspired analogy-based learning, pointing to new directions in the investigation of learning processes.
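A minimal echo-state-network sketch of the idea in the preceding entry, assuming flattened 28×28 image pairs presented as a two-step input sequence and a ridge-regression readout; the two architectures in the paper, their hyperparameters, and the feature choice differ, and every name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 500, 28 * 28                          # reservoir units, pixels per image
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))   # fixed random input weights
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # scale spectral radius below 1

def reservoir_features(img_a, img_b, leak=0.3):
    """Drive the leaky reservoir with the two images of a pair and return
    the time-averaged state as a feature vector for the readout."""
    x = np.zeros(n_res)
    states = []
    for u in (img_a.ravel(), img_b.ravel()):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.mean(states, axis=0)

def train_readout(features, labels, n_classes, lam=1e-2):
    """Closed-form ridge regression from reservoir features to one-hot relationship labels."""
    X = np.asarray(features)
    Y = np.eye(n_classes)[np.asarray(labels)]
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def predict(readout, img_a, img_b):
    return int(np.argmax(reservoir_features(img_a, img_b) @ readout))
```

Only the linear readout is trained, which is one reason very few labelled pairs can suffice.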
  3. We develop data-driven methods incorporating geometric and topological information to learn parsimonious representations of nonlinear dynamics from observations. The approaches learn nonlinear state-space models of the dynamics for general manifold latent spaces using training strategies related to Variational Autoencoders (VAEs). Our methods are referred to as Geometric Dynamic (GD) Variational Autoencoders (GD-VAEs). We learn encoders and decoders for the system states and evolution based on deep neural network architectures that include general Multilayer Perceptrons (MLPs), Convolutional Neural Networks (CNNs), and other architectures. Motivated by problems arising in parameterized PDEs and physics, we investigate the performance of our methods on tasks for learning reduced dimensional representations of the nonlinear Burgers Equations, Constrained Mechanical Systems, and spatial fields of Reaction-Diffusion Systems. GD-VAEs provide methods that can be used to obtain representations in manifold latent spaces for diverse learning tasks involving dynamics. 
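A toy version of the latent-manifold idea in the preceding entry, assuming a plain autoencoder whose two-dimensional code is projected onto the unit circle S^1; the actual GD-VAEs are variational, learn a latent evolution map for the dynamics, and use MLP/CNN encoders, so everything below is a simplified stand-in.

```python
import torch
import torch.nn as nn

class CircleLatentAE(nn.Module):
    """Autoencoder with a latent space constrained to the unit circle,
    a stand-in for the general manifold latent spaces of GD-VAEs."""
    def __init__(self, n_in=128, n_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 2))
        self.decoder = nn.Sequential(nn.Linear(2, n_hidden), nn.ReLU(), nn.Linear(n_hidden, n_in))

    def forward(self, x):
        z = self.encoder(x)
        z = z / (z.norm(dim=-1, keepdim=True) + 1e-8)   # project the code onto S^1
        return self.decoder(z), z

model = CircleLatentAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 128)                  # placeholder state snapshots
recon, z = model(x)
loss = ((recon - x) ** 2).mean()          # reconstruction loss only (no variational KL term here)
loss.backward()
opt.step()
```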
  4. Rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold. GAGA shows competitive performance in simulated and real-world datasets, including a 30% improvement over SOTA in single-cell population-level trajectory inference.
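One crude stand-in for the geodesic generation described in the preceding entry is to approximate geodesics as shortest paths through a k-nearest-neighbour graph of the learned embedding; GAGA's warped Riemannian metric and off-manifold negative samples are not modelled here, and all names are illustrative.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import dijkstra

def graph_geodesic(Z, i, j, k=10):
    """Approximate the geodesic between embedded points Z[i] and Z[j] as the
    shortest path through a k-NN graph of the embedding Z (num_points x dim)."""
    G = kneighbors_graph(Z, n_neighbors=k, mode="distance")
    dist, pred = dijkstra(G, directed=False, indices=i, return_predecessors=True)
    path, node = [j], j
    while pred[node] >= 0:              # walk predecessor pointers back toward the source i
        node = pred[node]
        path.append(node)
    return Z[np.array(path[::-1])]      # points along the approximate geodesic
```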
  5. Graphs are ubiquitous in social networks and biochemistry, where Graph Neural Networks (GNNs) are the state-of-the-art models for prediction. Graphs can evolve over time, and it is vital to formally model and understand how a trained GNN responds to graph evolution. We propose a smooth parameterization of the GNN predicted distributions using axiomatic attribution, where the distributions are on a low-dimensional manifold within a high-dimensional embedding space. We exploit the differential geometric viewpoint to model distributional evolution as smooth curves on the manifold. We reparameterize families of curves on the manifold and design a convex optimization problem to find a unique curve that concisely approximates the distributional evolution for human interpretation. Extensive experiments on node classification, link prediction, and graph classification tasks with evolving graphs demonstrate the better sparsity, faithfulness, and intuitiveness of the proposed method over the state-of-the-art methods.
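A rough sketch of the curve-fitting idea in the last entry: fit a low-degree Bézier curve, by least squares, to a sequence of predicted class distributions saved as the graph evolves. The paper's axiomatic-attribution parameterization and convex reformulation are not reproduced here, so treat this purely as an illustration.

```python
import numpy as np
from math import comb

def fit_distribution_curve(P, degree=3):
    """P: (num_snapshots, num_classes) predicted distributions of one node as the
    graph evolves. Returns Bezier control points and the fitted smooth curve."""
    t = np.linspace(0.0, 1.0, len(P))
    B = np.stack([comb(degree, k) * t**k * (1 - t)**(degree - k)
                  for k in range(degree + 1)], axis=1)   # Bernstein basis, (T, degree+1)
    C, *_ = np.linalg.lstsq(B, P, rcond=None)            # control points, (degree+1, num_classes)
    # note: rows of the fitted curve need not sum to 1; renormalize if a distribution is required
    return C, B @ C                                       # smooth approximation of the evolution
```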