The Impact of Initialization on LoRA Finetuning Dynamics

Hayou, Soufiane; Ghosh, Nikhil; Yu, Bin

Citation Details

In this paper, we study the role of initialization in Low Rank Adaptation (LoRA) as originally introduced in Hu et al. [19]. Essentially, to start from the pretrained model as initialization for finetuning, one can either initialize B to zero and A to random (default initialization in PEFT package), or vice-versa. In both cases, the product BA is equal to zero at initialization, which makes finetuning starts from the pretrained model. These two initialization schemes are seemingly sim- ilar. They should in-principle yield the same performance and share the same optimal learning rate. We demonstrate that this is an incorrect intuition and that the first scheme (initializing B to zero and A to random) on average yields better performance compared to the other scheme. Our theoretical analysis shows that the reason behind this might be that the first initialization allows the use of larger learning rates (without causing output instability) compared to the second initial- ization, resulting in more efficient learning of the first scheme. We validate our results with extensive experiments on LLMs. more »

Award ID(s):: 2209975 2023505 2015341 2031883

PAR ID:: 10635700

Author(s) / Creator(s):: Hayou, Soufiane; Ghosh, Nikhil; Yu, Bin

Publisher / Repository:: NeurIPS 2024

Date Published:: 2024-09-25

ISBN:: 979-8-3313-1438-5

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Proceeding:
The DOI is not currently available.

More Like this