Title: How does Gradient Descent Learn Features – A Local Analysis for Regularized Two-Layer Neural Networks
The ability to learn useful features is one of the major advantages of neural networks. Although recent works show that neural networks can operate in a neural tangent kernel (NTK) regime that does not allow feature learning, many works also demonstrate the potential for neural networks to go beyond the NTK regime and perform feature learning. Recently, a line of work highlighted the feature learning capabilities of the early stages of gradient-based training. In this paper we consider another mechanism for feature learning via gradient descent through a local convergence analysis. We show that once the loss is below a certain threshold, gradient descent with a carefully regularized objective will capture ground-truth directions. We further strengthen this local convergence analysis by incorporating early-stage feature learning analysis. Our results demonstrate that feature learning not only happens at the initial gradient steps, but can also occur towards the end of training.
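As a rough illustration of the mechanism described in the abstract, the following NumPy sketch trains a two-layer ReLU network by full-batch gradient descent on a weight-decay-regularized squared loss and then measures how well the hidden-layer weights align with a ground-truth direction. The single-index teacher, width, step size, and regularization strength are illustrative assumptions for this sketch, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-index teacher (an illustrative assumption, not from the
# paper): y = ReLU(<w_star, x>) with a single ground-truth direction w_star.
d, n, m = 20, 512, 64                  # input dim, samples, hidden width
w_star = np.zeros(d)
w_star[0] = 1.0
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)

# Two-layer ReLU network f(x) = a^T ReLU(W x), trained by full-batch
# gradient descent on the regularized squared loss
#   L = (1/2n) * ||f(X) - y||^2 + (lam/2) * (||W||_F^2 + ||a||^2).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)
lam, lr, steps = 1e-3, 0.05, 5000

for t in range(steps):
    pre = X @ W.T                      # (n, m) pre-activations
    act = np.maximum(pre, 0.0)         # ReLU features
    resid = (act @ a - y) / n          # residual scaled by 1/n
    grad_a = act.T @ resid + lam * a
    grad_W = ((resid[:, None] * (pre > 0)) * a).T @ X + lam * W
    a -= lr * grad_a
    W -= lr * grad_W

# Feature-learning diagnostic: cosine alignment of each hidden neuron's
# weight vector with the ground-truth direction w_star.
align = (W @ w_star) / (np.linalg.norm(W, axis=1) * np.linalg.norm(w_star) + 1e-12)
loss = 0.5 * np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2)
print(f"final unregularized loss: {loss:.4f}")
print(f"max |cosine alignment| with w_star: {np.max(np.abs(align)):.3f}")
```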
Award ID(s):
2031849
PAR ID:
10627701
Publisher / Repository:
NeurIPS 2024
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Federated Learning (FL) is an emerging learning scheme that allows different distributed clients to train deep neural networks together without data sharing. Neural networks have become popular due to their unprecedented success. To the best of our knowledge, the theoretical guarantees of FL concerning neural networks with explicit forms and multi-step updates are unexplored. Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are not even updating in the gradient direction. Existing convergence results for gradient descent-based methods heavily rely on the fact that the gradient direction is used for updating. This paper presents a new class of convergence analysis for FL, Federated Learning Neural Tangent Kernel (FL-NTK), which corresponds to over-parameterized ReLU neural networks trained by gradient descent in FL and is inspired by the analysis in Neural Tangent Kernel (NTK). Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters. Furthermore, with proper distributional assumptions, FL-NTK can also achieve good generalization.
  2. Meila, Marina; Zhang, Tong (Ed.)
    Federated Learning (FL) is an emerging learning scheme that allows different distributed clients to train deep neural networks together without data sharing. Neural networks have become popular due to their unprecedented success. To the best of our knowledge, the theoretical guarantees of FL concerning neural networks with explicit forms and multi-step updates are unexplored. Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are not even updating in the gradient direction. Existing convergence results for gradient descent-based methods heavily rely on the fact that the gradient direction is used for updating. The current paper presents a new class of convergence analysis for FL, Federated Neural Tangent Kernel (FL-NTK), which corresponds to over-parameterized ReLU neural networks trained by gradient descent in FL and is inspired by the analysis in Neural Tangent Kernel (NTK). Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters. Furthermore, with proper distributional assumptions, FL-NTK can also achieve good generalization. The proposed theoretical analysis scheme can be generalized to more complex neural networks. (A minimal FedAvg-style sketch of this multi-step local-update setting appears after this list.)
  3. We study the optimization of wide neural networks (NNs) via gradient flow (GF) in setups that allow feature learning while admitting non-asymptotic global convergence guarantees. First, for wide shallow NNs under the mean-field scaling and with a general class of activation functions, we prove that when the input dimension is at least the size of the training set, the training loss converges to zero at a linear rate under GF. Building upon this analysis, we study a model of wide multi-layer NNs whose second-to-last layer is trained via GF, for which we also prove linear-rate convergence of the training loss to zero, but regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart. (A small gradient-descent approximation of this mean-field setting is sketched after this list.)
  4. Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(\mathbb{R}^d)^{\otimes l}$ of rank $r$ (where $r \ll d$), can variants of gradient descent find a rank-$m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l} \log d)$. Our results show that gradient descent on an over-parametrized objective can go beyond the lazy training regime and utilize certain low-rank structure in the data. (A toy order-3 instance of this over-parameterized decomposition is sketched after this list.)
  5. We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to some constant rather than zero. We prove that under such initialization, the neural network will have sparse activation throughout the entire training process, which enables fast training procedures via some sophisticated computational methods. With such initialization, we show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK, and we study various properties of the neural networks with this new kernel. We first characterize the gradient descent dynamics. In particular, we show that the network in this case can achieve as fast convergence as the dense network, as opposed to previous work suggesting that sparse networks converge slower. In addition, our result improves the previously required width to ensure convergence. Second, we study the networks' generalization: we show a width-sparsity dependence, which yields a sparsity-dependent Rademacher complexity and generalization bound. To our knowledge, this is the first sparsity-dependent generalization result via Rademacher complexity. Lastly, we study the smallest eigenvalue of this new kernel. We identify a data-dependent region where we can derive a much sharper lower bound on the NTK's smallest eigenvalue than the previously known worst-case bound. This can lead to improvements in the generalization bound. (A short sketch of the activation sparsity induced by constant bias initialization appears after this list.)
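The following is a minimal NumPy sketch of the federated setting analyzed in the two FL-NTK entries above (items 1 and 2): each client runs several local gradient steps on a one-hidden-layer ReLU network, so the aggregate update is not a single global gradient step, and the server averages the resulting weights. The client data model, width, number of local steps, and learning rate are illustrative assumptions, not the papers' exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# FedAvg-style training with multi-step local updates (the setting FL-NTK
# analyzes); clients, widths, data model, and step sizes are illustrative.
d, m, n_clients, n_local = 10, 128, 4, 64     # dims, width, clients, samples/client
K, T, lr = 5, 30, 0.1                         # local steps, rounds, learning rate

# One-hidden-layer ReLU network with a fixed second layer a (a common NTK setup).
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def loss_and_grad(W, X, y):
    """Squared loss and its gradient w.r.t. the hidden-layer weights W."""
    pre = X @ W.T
    pred = np.maximum(pre, 0.0) @ a
    resid = (pred - y) / len(y)
    grad = ((resid[:, None] * (pre > 0)) * a).T @ X
    return 0.5 * np.mean((pred - y) ** 2), grad

# Heterogeneous client data from a shared linear teacher plus a per-client
# mean shift (purely illustrative).
w_teacher = rng.standard_normal(d)
clients = []
for _ in range(n_clients):
    X = rng.standard_normal((n_local, d)) + rng.standard_normal(d)
    clients.append((X, X @ w_teacher))

W_global = rng.standard_normal((m, d)) / np.sqrt(d)
for rnd in range(T):
    local_models = []
    for X, y in clients:
        W = W_global.copy()
        for _ in range(K):                    # K local steps: the aggregate update
            _, g = loss_and_grad(W, X, y)     # is no longer a plain gradient step
            W -= lr * g
        local_models.append(W)
    W_global = np.mean(local_models, axis=0)  # server-side weight averaging

losses = [loss_and_grad(W_global, X, y)[0] for X, y in clients]
print(f"mean client loss after {T} rounds: {np.mean(losses):.4f}")
```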
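For the mean-field gradient-flow setting of entry 3, here is a small sketch that approximates the flow with gradient descent on a width-m shallow network using the 1/m output scaling, with the input dimension at least the number of training samples. The activation, sizes, and step size are illustrative assumptions; the paper's result concerns the exact gradient flow, not this discretization.

```python
import numpy as np

rng = np.random.default_rng(2)

# Shallow network under mean-field scaling, f(x) = (1/m) * sum_j a_j tanh(<w_j, x>),
# trained by gradient descent with a small step as a stand-in for gradient flow.
# The input dimension is at least the number of samples, as in the first setting
# of the abstract; all sizes and the activation are illustrative assumptions.
d, n, m = 64, 32, 2000                 # input dim >= number of training samples
eta, steps = 0.5, 2000

X = rng.standard_normal((n, d)) / np.sqrt(d)   # roughly unit-norm inputs
y = rng.standard_normal(n)                     # arbitrary regression targets
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

for t in range(steps + 1):
    H = np.tanh(X @ W.T)               # (n, m) hidden features
    resid = H @ a / m - y              # mean-field 1/m output scaling
    if t % 500 == 0:
        print(f"step {t:4d}: train loss = {0.5 * np.mean(resid ** 2):.3e}")
    # Gradients of L = (1/2n) * ||f(X) - y||^2; updates are scaled by m so each
    # neuron moves at an O(1) rate (the usual mean-field time parametrization).
    grad_a = H.T @ resid / (n * m)
    grad_W = ((resid[:, None] / n) * (1.0 - H ** 2) * (a / m)).T @ X
    a -= eta * m * grad_a
    W -= eta * m * grad_W
```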
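Entry 4 concerns gradient descent on an over-parameterized tensor decomposition objective. The sketch below sets up an illustrative order-3 instance: a rank-r symmetric tensor is fit with m > r components by gradient descent on the Frobenius loss from a small random initialization. The dimension, rank, width, and step size are arbitrary choices for illustration, not the paper's regime.

```python
import numpy as np

rng = np.random.default_rng(3)

# Over-parameterized symmetric tensor decomposition via gradient descent
# (an illustrative l = 3 instance; rank, dimension, width, and step size
# are arbitrary choices, not the paper's exact regime).
d, r, m = 15, 3, 30                    # ambient dim, true rank, fitted components
lr, steps = 0.05, 3000

# Ground-truth order-3 tensor T = sum_i u_i ⊗ u_i ⊗ u_i with r << d.
U = rng.standard_normal((r, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)
T_true = np.einsum('ia,ib,ic->abc', U, U, U)

# Fit sum_j c_j ⊗ c_j ⊗ c_j with m > r components from a small random init.
C = 0.1 * rng.standard_normal((m, d))
for t in range(steps):
    R = np.einsum('ja,jb,jc->abc', C, C, C) - T_true   # residual tensor
    # Gradient of 0.5 * ||R||_F^2 w.r.t. c_j is 3 * R contracted twice with c_j
    # (using the symmetry of R).
    grad = 3.0 * np.einsum('abc,jb,jc->ja', R, C, C)
    C -= lr * grad

err = np.linalg.norm(np.einsum('ja,jb,jc->abc', C, C, C) - T_true)
print(f"relative Frobenius error: {err / np.linalg.norm(T_true):.3e}")
```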
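Entry 5 studies ReLU networks whose biases are initialized to a constant, which keeps activations sparse during training. The short sketch below only illustrates the sparsity at initialization: with NTK-style weight initialization and bias -B, roughly a Phi(-B) fraction of neurons fire on a typical input. The sizes and bias levels are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Constant (negative) bias initialization keeps ReLU activations sparse:
# with NTK-style weights, <w, x> is roughly N(0, 1) on unit-norm inputs, so a
# neuron with bias -B fires with probability about Phi(-B). Sizes and bias
# levels below are illustrative assumptions.
d, m, n = 50, 4096, 1000
X = rng.standard_normal((n, d)) / np.sqrt(d)   # roughly unit-norm inputs
W = rng.standard_normal((m, d))                # NTK-style weight initialization

for B in [0.0, 0.5, 1.0, 2.0]:
    b = -B * np.ones(m)                        # all biases set to the constant -B
    active = (X @ W.T + b) > 0                 # which neurons fire on each input
    print(f"bias -{B:.1f}: fraction of active neurons = {active.mean():.3f}")
```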