NSF PAR Search | NSF Public Access Repository


The training process of many deep networks explores the same low-dimensional manifold

https://doi.org/10.1073/pnas.2310002121

Mao, Jialin; Griniasty, Itay; Teoh, Han Kheng; Ramesh, Rahul; Yang, Rubing; Transtrum, Mark K; Sethna, James P; Chaudhari, Pratik (March 2024, Proceedings of the National Academy of Sciences)

We develop information-geometric techniques to analyze the trajectories of the predictions of deep networks during training. By examining the underlying high-dimensional probabilistic models, we reveal that the training process explores an effectively low-dimensional manifold. Networks with a wide range of architectures, sizes, trained using different optimization methods, regularization techniques, data augmentation techniques, and weight initializations lie on the same manifold in the prediction space. We study the details of this manifold to find that networks with different architectures follow distinguishable trajectories, but other factors have a minimal influence; larger networks train along a similar manifold as that of smaller networks, just faster; and networks initialized at very different parts of the prediction space converge to the solution along a similar manifold.
Full Text Available
Sloppy model analysis identifies bifurcation parameters without normal form analysis

https://doi.org/10.1103/PhysRevE.108.064215

Anderson, Christian N K; Transtrum, Mark K (December 2023, Physical Review E)

Full Text Available
A Picture of the Space of Typical Learnable Tasks

Ramesh, Rahul; Mao, Jialin; Griniasty, Itay; Yang, Rubing; Teoh, Han Kheng; Transtrum, Mark K; Sethna, James P; Chaudhari, Pratik (May 2023, Proceedings of the 40th International Conference on Machine Learning)

We develop information geometric techniques to understand the representations learned by deep networks when they are trained on different tasks using supervised, meta-, semi-supervised and contrastive learning. We shed light on the following phenomena that relate to the structure of the space of tasks: (1) the manifold of probabilistic models trained on different tasks using different representation learning methods is effectively low-dimensional; (2) supervised learning on one task results in a surprising amount of progress even on seemingly dissimilar tasks; progress on other tasks is larger if the training task has diverse classes; (3) the structure of the space of tasks indicated by our analysis is consistent with parts of the Wordnet phylogenetic tree; (4) episodic meta-learning algorithms and supervised learning traverse different trajectories during training but they fit similar models eventually; (5) contrastive and semi-supervised learning methods traverse trajectories similar to those of supervised learning. We use classification tasks constructed from the CIFAR-10 and Imagenet datasets to study these phenomena. Code is available at https://github.com/grasp-lyrl/picture_of_space_of_tasks.
Full Text Available
Information geometry for multiparameter models: new perspectives on the origin of simplicity

https://doi.org/10.1088/1361-6633/aca6f8

Quinn, Katherine N; Abbott, Michael C; Transtrum, Mark K; Machta, Benjamin B; Sethna, James P (December 2022, Reports on Progress in Physics)

Complex models in physics, biology, economics, and engineering are often sloppy, meaning that the model parameters are not well determined by the model predictions for collective behavior. Many parameter combinations can vary over decades without significant changes in the predictions. This review uses information geometry to explore sloppiness and its deep relation to emergent theories. We introduce the model manifold of predictions, whose coordinates are the model parameters. Its hyperribbon structure explains why only a few parameter combinations matter for the behavior. We review recent rigorous results that connect the hierarchy of hyperribbon widths to approximation theory, and to the smoothness of model predictions under changes of the control variables. We discuss recent geodesic methods to find simpler models on nearby boundaries of the model manifold: emergent theories with fewer parameters that explain the behavior equally well. We discuss a Bayesian prior which optimizes the mutual information between model parameters and experimental data, naturally favoring points on the emergent boundary theories and thus simpler models. We introduce a ‘projected maximum likelihood’ prior that efficiently approximates this optimal prior, and contrast both to the poor behavior of the traditional Jeffreys prior. We discuss the way the renormalization group coarse-graining in statistical mechanics introduces a flow of the model manifold, and connect stiff and sloppy directions along the model manifold with relevant and irrelevant eigendirections of the renormalization group. Finally, we discuss recently developed ‘intensive’ embedding methods, allowing one to visualize the predictions of arbitrary probabilistic models as low-dimensional projections of an isometric embedding, and illustrate our method by generating the model manifold of the Ising model.
Full Text Available
Selecting simple, transferable models with the supremum principle

https://doi.org/10.1103/PhysRevResearch.4.L032044

Petrie, Cody; Anderson, Christian; Maekawa, Casie; Maekawa, Travis; Transtrum, Mark K. (September 2022, Physical Review Research)

Full Text Available
Piecemeal Reduction of Models of Large Networks

https://doi.org/10.1109/CDC45484.2021.9683471

Francis, Benjamin L.; Transtrum, Mark K.; Saric, Andrija T.; Stankovic, Aleksandar M. (December 2021, Conference on Decision and Control)

Full Text Available
