Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Ba, Jimmy; Erdogdu, Murat A; Suzuki, Taiji; Wang, Zhichao; Wu, Denny


We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under the spiked covariance data: $f_*(x) = \sigma_*\left(\frac{1}{\sqrt{1+\theta}} \langle x, \mu \rangle\right)$, $x \sim \mathcal{N}(0, I_d + \theta \mu \mu^\top)$, $\theta \asymp d^{\beta}$ for $\beta \in [0, 1)$, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and it depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit, where the number of training examples $n$ and the dimensionality $d$ jointly diverge ($n, d \to \infty$, $n/d \to \psi \in (0, \infty)$), we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - \frac{1}{p}$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - \frac{1}{k}$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structures in the data. Further, since $k \le p$ by definition, neural networks can adapt to such structures more effectively.
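The data model in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a unit-norm spike direction `mu`, and the link function (a sum of the second and third probabilists' Hermite polynomials, giving p = 3 and k = 2) and the values of `d` and `beta` are hypothetical choices made only for illustration.

```python
import math
import random


def sample_spiked(d, theta, mu, rng):
    """Draw x ~ N(0, I_d + theta * mu mu^T), assuming mu has unit norm."""
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    proj = sum(zi * mi for zi, mi in zip(z, mu))
    # With c = sqrt(1 + theta) - 1, (I + c mu mu^T)^2 = I + theta mu mu^T,
    # so x = (I + c mu mu^T) z has the required covariance.
    c = math.sqrt(1.0 + theta) - 1.0
    return [zi + c * proj * mi for zi, mi in zip(z, mu)]


def link(t):
    """Illustrative link (an assumption): He_2(t) + He_3(t), so p = 3, k = 2."""
    return (t * t - 1.0) + (t ** 3 - 3.0 * t)


def normalized_projection(x, mu, theta):
    """<x, mu> / sqrt(1 + theta), which has unit variance under the model."""
    return sum(xi * mi for xi, mi in zip(x, mu)) / math.sqrt(1.0 + theta)


def target(x, mu, theta):
    """Single-index target f_*(x) = sigma_*(<x, mu> / sqrt(1 + theta))."""
    return link(normalized_projection(x, mu, theta))


def thresholds(p, k):
    """Exponents from the abstract: 1 - 1/p (kernel ridge regression)
    and 1 - 1/k (two-layer network trained with gradient descent)."""
    return 1.0 - 1.0 / p, 1.0 - 1.0 / k


rng = random.Random(0)
d, beta = 200, 0.6              # hypothetical values for illustration
theta = d ** beta               # spike magnitude theta = d^beta
mu = [1.0 / math.sqrt(d)] * d   # any unit vector would do

xs = [sample_spiked(d, theta, mu, rng) for _ in range(2000)]
ys = [target(x, mu, theta) for x in xs]

projs = [normalized_projection(x, mu, theta) for x in xs]
mean_proj = sum(projs) / len(projs)
var_proj = sum((t - mean_proj) ** 2 for t in projs) / len(projs)

krr_threshold, nn_threshold = thresholds(p=3, k=2)
```

With this illustrative link, `thresholds(3, 2)` gives 2/3 for kernel ridge regression and 1/2 for the neural network, so the hypothetical `beta = 0.6` lies above the neural-network threshold but below the kernel one. The empirical variance `var_proj` should be close to 1, which is why the target divides the projection by the square root of 1 + θ.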

Award ID(s): 2154099

PAR ID: 10540433

Author(s) / Creator(s): Ba, Jimmy; Erdogdu, Murat A; Suzuki, Taiji; Wang, Zhichao; Wu, Denny

Editor(s): Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S

Publisher / Repository: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

Date Published: 2023-12-16

Volume: 36

ISBN: 9781713871088

Page Range / eLocation ID: 20695-20728

Format(s): Medium: X

Location: New Orleans

Sponsoring Org: National Science Foundation

The DOI is not currently available.