NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Uniform-in-time Wasserstein stability bounds for (noisy) stochastic gradient descent

Zhu, Lingjiong; Gurbuzbalaban, Mert; Raj, Anant; Simsekli, Umut (February 2024, Advances in Neural Information Processing Systems)
Uniform-in-time Wasserstein stability bounds for (noisy) stochastic gradient descent

Zhu, Lingjiong; Gurbuzbalaban, Mert; Raj, Anant; Simsekli, Umut (February 2024, Advances in Neural Information Processing Systems)
Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent

Zhu, Lingjiong; Gurbuzbalaban, Mert; Raj, Anant; Simsekli, Umut (December 2023, Advances in neural information processing systems)

Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions. While these bounds have illuminated various properties of optimization algorithms, the analysis of each case typically required a different proof technique with significantly different mathematical tools. In this study, we make a novel connection between learning theory and applied probability and introduce a unified guideline for proving Wasserstein stability bounds for stochastic optimization algorithms. We illustrate our approach on stochastic gradient descent (SGD) and we obtain time-uniform stability bounds (i.e., the bound does not increase with the number of iterations) for strongly convex losses and non-convex losses with additive noise, where we recover similar results to the prior art or extend them to more general cases by using a single proof technique. Our approach is flexible and can be generalizable to other popular optimizers, as it mainly requires developing Lyapunov functions, which are often readily available in the literature. It also illustrates that ergodicity is an important component for obtaining time uniform bounds – which might not be achieved for convex or non-convex losses unless additional noise is injected to the iterates. Finally, we slightly stretch our analysis technique and prove time-uniform bounds for SGD under convex and non-convex losses (without additional additive noise), which, to our knowledge, is novel.
more » « less
Full Text Available
Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions

Raj, Anant; Zhu, Lingjiong; Gurbuzbalaban, Mert; Simsekli, Umut (October 2023, Proceedings of Machine Learning Research)

Heavy-tail phenomena in stochastic gradient de- scent (SGD) have been reported in several empirical studies. Experimental evidence in previous works suggests a strong interplay between the heaviness of the tails and generalization behavior of SGD. To address this empirical phenom- ena theoretically, several works have made strong topological and statistical assumptions to link the generalization error to heavy tails. Very recently, new generalization bounds have been proven, indicating a non-monotonic relationship between the generalization error and heavy tails, which is more pertinent to the reported empirical observations. While these bounds do not require additional topological assumptions given that SGD can be modeled using a heavy-tailed stochastic differential equation (SDE), they can only apply to simple quadratic problems. In this paper, we build on this line of research and develop generalization bounds for a more general class of objective functions, which includes non-convex functions as well. Our approach is based on developing Wasserstein stability bounds for heavy- tailed SDEs and their discretizations, which we then convert to generalization bounds. Our results do not require any nontrivial assumptions; yet, they shed more light to the empirical observations, thanks to the generality of the loss functions.
more » « less
Full Text Available
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

Gurbuzbalaban, Mert; Hu, Yuanhan; Simsekli, Umut; Zhu, Lingjiong (September 2023, Transactions on Machine Learning Research)
Stich, Sebastian U (Ed.)
Cyclic and randomized stepsizes are widely used in the deep learning practice and can often outperform standard stepsize choices such as constant stepsize in SGD. Despite their empirical success, not much is currently known about when and why they can theoretically improve the generalization performance. We consider a general class of Markovian stepsizes for learning, which contain i.i.d. random stepsize, cyclic stepsize as well as the constant stepsize as special cases, and motivated by the literature which shows that heaviness of the tails (measured by the so-called ``tail-index”) in the SGD iterates is correlated with generalization, we study tail-index and provide a number of theoretical results that demonstrate how the tail-index varies on the stepsize scheduling. Our results bring a new understanding of the benefits of cyclic and randomized stepsizes compared to constant stepsize in terms of the tail behavior. We illustrate our theory on linear regression experiments and show through deep learning experiments that Markovian stepsizes can achieve even a heavier tail and be a viable alternative to cyclic and i.i.d. randomized stepsize rules.
more » « less
Full Text Available
Algorithmic stability of heavy-tailed SGD with general loss functions

Raj, Anant; Zhu, Lingjiong; Gurbuzbalaban, Mert; Simsekli, Umut (August 2023, International Conference on Machine Learning)
Algorithmic stability of heavy-tailed SGD with general loss functions

Raj, Anant; Zhu, Lingjiong; Gurbuzbalaban, Mert; Simsekli, Umut (August 2023, International Conference on Machine Learning)
Cyclic and randomized stepsizes invoke heavier tails in SGD than constant stepsize

Gurbuzbalaban, Mert; Hu, Yuanhan; Simsekli, Umut; Zhu, Lingjiong (August 2023, Transactions on machine learning research)

Full Text Available
Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize

Gurbuzbalaban, Mert; Hu, Yuanhan; Simsekli, Umut; Zhu, Lingjiong (August 2023, Transactions on machine learning research)

Full Text Available
Algorithmic stability of heavy-tailed stochastic gradient descent on least squares

Raj, Anant; Barsbey, Melih; Gurbuzbalaban, Mert; Zhu, Lingjiong; Simsekli, Umut (January 2023, International Conference on Algorithmic Learning Theory)

Full Text Available

« Prev Next »

Search for: All records