NSF PAR Search | NSF Public Access Repository


Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

Raj, Anant; Zhu, Lingjiong; Gurbuzbalaban, Mert; Simsekli, Umut (October 2023, Proceedings of Machine Learning Research)

Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and the generalization behavior of SGD. To address this empirical phenomenon theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they apply only to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. Our approach is based on developing Wasserstein stability bounds for heavy-tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light on the empirical observations, thanks to the generality of the loss functions.
Full Text Available
A Stochastic Subgradient Method for Distributionally Robust Non-convex and Non-smooth Learning

https://doi.org/10.1007/s10957-022-02063-6

Gürbüzbalaban, Mert; Ruszczyński, Andrzej; Zhu, Landi (September 2022, Journal of Optimization Theory and Applications)

Full Text Available
Differentially Private Accelerated Optimization Algorithms

https://doi.org/10.1137/20M1355847

Kuru, Nurdan; İlker Birbil, Ş.; Gürbüzbalaban, Mert; Yildirim, Sinan (June 2022, SIAM Journal on Optimization)

Full Text Available
Randomized Gossiping with Effective Resistance Weights: Performance Guarantees and Applications

https://doi.org/10.1109/TCNS.2022.3161201

Can, Bugra; Gurbuzbalaban, Mert; Aybat, Necdet Serhat; Soori, Saeed; Mehri Dehnavi, Maryam (January 2022, IEEE Transactions on Control of Network Systems)

Full Text Available
Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

https://doi.org/10.1109/TIT.2022.3213607

Dixit, Rishabh; Gurbuzbalaban, Mert; Bajwa, Waheed U. (January 2022, IEEE Transactions on Information Theory)

Full Text Available
Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

Fallah, Alireza; Gurbuzbalaban, Mert; Ozdaglar, Asuman; Simsekli, Umut; Zhu, Lingjiong (January 2022, Journal of Machine Learning Research)

Full Text Available
HyLo: a hybrid low-rank natural gradient descent method

Mu, Baorun; Soori, Saeed; Can, Bugra; Gurbuzbalaban, Mert; Dehnavi, Maryam Mehri (January 2022, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis)

This work presents a Hybrid Low-Rank Natural Gradient Descent method, called HyLo, that reduces the training time of deep neural networks. Natural gradient descent (NGD) requires computing the inverse of the Fisher information matrix (FIM), which is typically expensive at large scale. Kronecker factorization methods such as KFAC attempt to improve NGD's running time by approximating the FIM with Kronecker factors. However, the size of the Kronecker factors increases quadratically as the model size grows. Instead, in HyLo, we use the Sherman-Morrison-Woodbury variant of NGD (SNGD) and propose a reformulation of SNGD to resolve its scalability issues. HyLo uses a computationally efficient low-rank factorization to achieve superior timing for Fisher inverses. We evaluate HyLo on large models including ResNet-50, U-Net, and ResNet-32 on up to 64 GPUs. HyLo converges 1.4×-2.1× faster than the state-of-the-art distributed implementation of KFAC and reduces the computation and communication time by up to 350× and 10.7×, respectively, on ResNet-50.
Full Text Available
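The Sherman-Morrison-Woodbury identity mentioned in the HyLo abstract can be illustrated with a small sketch. This is only the generic textbook identity applied to a damped empirical Fisher matrix built from per-sample gradients; it is not HyLo's reformulation or its low-rank factorization. The function name `woodbury_solve`, the damping value, and the random test data are assumptions made for illustration.

```python
import numpy as np

def woodbury_solve(per_sample_grads, g, damping):
    """Solve (damping*I + J^T J / n) x = g without forming the d x d matrix.

    per_sample_grads: J, shape (n, d), one gradient per row (hypothetical input).
    Uses (lam*I + U U^T)^{-1} g = (g - U (lam*I_n + U^T U)^{-1} U^T g) / lam
    with U = J^T / sqrt(n), so only an n x n system is solved.
    """
    n = per_sample_grads.shape[0]
    U = per_sample_grads.T / np.sqrt(n)          # shape (d, n)
    small = damping * np.eye(n) + U.T @ U        # shape (n, n)
    correction = U @ np.linalg.solve(small, U.T @ g)
    return (g - correction) / damping

# Compare against the direct d x d solve on a small random problem.
rng = np.random.default_rng(0)
J = rng.standard_normal((8, 200))                # n = 8 samples, d = 200 parameters
g = rng.standard_normal(200)
fast = woodbury_solve(J, g, damping=0.1)
direct = np.linalg.solve(0.1 * np.eye(200) + J.T @ J / 8, g)
```

When the number of samples n is much smaller than the number of parameters d, the n × n solve is far cheaper than inverting the d × d matrix, which is the general motivation for Woodbury-style variants of NGD.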
Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks

Fallah, Alireza; Gurbuzbalaban, Mert; Ozdaglar, Asuman; Simsekli, Umut; Zhu, Lingjiong (January 2022, Journal of Machine Learning Research)
Jain, Prateek (Ed.)
Full Text Available
L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

https://doi.org/10.1109/CDC45484.2021.9682985

Can, Bugra; Soori, Saeed; Dehnavi, Maryam Mehri; Gurbuzbalaban, Mert (December 2021, 2021 60th IEEE Conference on Decision and Control (CDC))

Full Text Available
Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Nonconvex Stochastic Optimization: Nonasymptotic Performance Bounds and Momentum-Based Acceleration

https://doi.org/10.1287/opre.2021.2162

Gao, Xuefeng; Gürbüzbalaban, Mert; Zhu, Lingjiong (January 2021, Operations Research)

Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is a variant of stochastic gradients with momentum where a controlled and properly scaled Gaussian noise is added to the stochastic gradients to steer the iterates toward a global minimum. Many works report its empirical success in practice for solving stochastic nonconvex optimization problems; in particular, it has been observed to outperform overdamped Langevin Monte Carlo–based methods, such as stochastic gradient Langevin dynamics (SGLD), in many applications. Although the asymptotic global convergence properties of SGHMC are well known, its finite-time performance is not well understood. In this work, we study two variants of SGHMC based on two alternative discretizations of the underdamped Langevin diffusion. We provide finite-time performance bounds for the global convergence of both SGHMC variants for solving stochastic nonconvex optimization problems with explicit constants. Our results lead to nonasymptotic guarantees for both population and empirical risk minimization problems. For a fixed target accuracy level on a class of nonconvex problems, we obtain complexity bounds for SGHMC that can be tighter than those available for SGLD.
Full Text Available
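The SGHMC abstract describes stochastic gradients with momentum plus scaled Gaussian noise. A minimal sketch of that idea follows, assuming a plain Euler-type discretization of the underdamped Langevin diffusion; it is not necessarily either of the two variants analyzed in the paper. The function names, the step size, friction, inverse temperature, and the toy quadratic objective are all hypothetical choices for illustration.

```python
import numpy as np

def sghmc(stoch_grad, x0, step=0.01, friction=1.0, inv_temp=1e4,
          n_iters=5000, seed=0):
    """Euler-type sketch of SGHMC (assumed discretization, not the paper's).

    v <- v - step * (friction * v + g(x)) + sqrt(2*friction*step/inv_temp) * xi
    x <- x + step * v
    where g is a stochastic gradient and xi is standard Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = np.zeros_like(x)
    noise_scale = np.sqrt(2.0 * friction * step / inv_temp)
    for _ in range(n_iters):
        g = stoch_grad(x, rng)
        v = v - step * (friction * v + g) + noise_scale * rng.standard_normal(x.shape)
        x = x + step * v
    return x

def noisy_quadratic_grad(x, rng):
    # Gradient of f(x) = 0.5 * ||x||^2 with additive noise (toy objective).
    return x + 0.1 * rng.standard_normal(x.shape)

x_final = sghmc(noisy_quadratic_grad, x0=[3.0, -2.0])
```

With a large inverse temperature the injected noise is small, so on this toy problem the iterates settle near the minimizer at the origin; lowering `inv_temp` increases exploration.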
