NysAct: A Scalable Preconditioned Gradient Descent using Nyström Approximation

Seung, Hyunseok; Lee, Jaewoo; Ko, Hyunsuk

doi:10.1109/BigData62323.2024.10825352

Citation Details

NysAct: A Scalable Preconditioned Gradient Descent using Nyström Approximation

Adaptive gradient methods are computationally efficient and converge quickly, but they often suffer from poor generalization. In contrast, second-order methods enhance convergence and generalization but typically incur high computational and memory costs. In this work, we introduce NYSACT, a scalable first-order gradient preconditioning method that strikes a balance between state-of-the-art first-order and second-order optimization methods. NYSACT leverages an eigenvalue-shifted Nyström method to approximate the activation covariance matrix, which is used as a preconditioning matrix, significantly reducing time and memory complexities with minimal impact on test accuracy. Our experiments show that NYSACT not only achieves improved test accuracy compared to both first-order and second-order methods but also demands considerably less computational resources than existing second-order methods. more »

Award ID(s):: 1943046

PAR ID:: 10567278

Author(s) / Creator(s):: Seung, Hyunseok; Lee, Jaewoo; Ko, Hyunsuk

Publisher / Repository:: IEEE

Date Published:: 2024-12-15

ISBN:: 979-8-3503-6248-0

Page Range / eLocation ID:: 1442 to 1449

Subject(s) / Keyword(s):: second-order optimization deep learning preconditioned sgd

Format(s):: Medium: X

Location:: Washington, DC, USA

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1109/BigData62323.2024.10825352

More Like this