Asymptotic Dynamics for Delayed Feature Learning in a Toy Model

Bordelon, Blake; Kumar, Tanishq; Gershman, Samuel J; Pehlevan, Cengiz

Citation Details

We consider a toy model that exhibits grokking, recently proposed by [Kumar et al., 2023], and use its simplicity to derive the dynamics of the train and test loss with Dynamical Mean Field Theory (DMFT). This yields a closed-form expression for the gap between train and test loss that characterizes grokking in this toy model. The expression shows how two parameters of interest, NTK alignment and network laziness, control the size of this gap, and how grokking emerges as a uniquely offline property, arising during repeated training over the same dataset. This is the first quantitative characterization of grokking dynamics in a general setting that makes no assumptions about weight decay, weight norm, or similar quantities.

Award ID(s): 2239780, 2134157

PAR ID: 10540410

Author(s) / Creator(s): Bordelon, Blake; Kumar, Tanishq; Gershman, Samuel J; Pehlevan, Cengiz

Publisher / Repository: High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning at ICML 2024

Date Published: 2024-06-16

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.
