On the Crucial Role of Initialization for Matrix Factorization

Li, B; Zhang, L; Mokhtari, A; He, N

This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nyström initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nyström initialization achieves quadratic convergence in cases where only linear rates were previously known. Furthermore, we extend this initialization to low-rank adapters (LoRA) commonly used for finetuning foundation models. Our approach, NoRA, i.e., LoRA with Nyström initialization, demonstrates superior performance across various downstream tasks and model scales, from 1B to 7B parameters, in large language and diffusion models.
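The abstract names ScaledGD and a Nyström-style initialization but does not state the update rules. As a rough illustration only, the sketch below assumes the symmetric problem of minimizing 0.25 * ||X X^T - A||_F^2 for a positive semidefinite A of rank r, the usual ScaledGD update X <- X - step * (X X^T - A) X (X^T X)^{-1}, and a sketch-style start X0 = A @ Omega with a Gaussian Omega. The function names, the step size 0.5, the iteration count, and the synthetic test matrix are assumptions made for this example and are not taken from this record.

```python
import numpy as np


def nystrom_init(A, r, scale=1.0, seed=None):
    """Assumed form of the initialization: X0 = A @ Omega, Omega Gaussian."""
    rng = np.random.default_rng(seed)
    omega = scale * rng.standard_normal((A.shape[1], r))
    return A @ omega


def scaled_gd(A, X0, step=0.5, iters=30):
    """ScaledGD on f(X) = 0.25 * ||X X^T - A||_F^2 for symmetric A."""
    X = X0.copy()
    errors = []
    for _ in range(iters):
        residual = X @ X.T - A
        errors.append(np.linalg.norm(residual))
        grad = residual @ X  # gradient of f at X
        # Right-precondition by (X^T X)^{-1} via a linear solve
        # instead of forming the inverse explicitly.
        X = X - step * np.linalg.solve(X.T @ X, grad.T).T
    errors.append(np.linalg.norm(X @ X.T - A))
    return X, errors


# Synthetic rank-5 PSD target (illustrative data only).
rng = np.random.default_rng(0)
U = rng.standard_normal((50, 5))
A = U @ U.T

X, errors = scaled_gd(A, nystrom_init(A, r=5, seed=1))
print("relative error:", errors[-1] / np.linalg.norm(A))
```

Under these assumptions the columns of X0 lie in the range of A, so with step 0.5 each update averages X with A X (X^T X)^{-1}, which resembles a Newton (Heron-type) square-root iteration. That is one way to see why a faster-than-linear rate is plausible in this setting; the precise conditions and proofs are in the paper itself.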

Award ID(s): 2505865

PAR ID: 10631539

Author(s) / Creator(s): Li, B; Zhang, L; Mokhtari, A; He, N

Publisher / Repository: https://doi.org/10.48550/arXiv.2410.18965

Date Published: 2024-12-12

ISSN: 2410.18965

Format(s): Medium: X

Sponsoring Org: National Science Foundation

