%ANguyen, Ta
%ANguyen, Thien
%AEne, Alina
%ANguyen, Huy
%D2023
%INeural Information Processing Systems
%MOSTI ID: 10488805
%TImproved Convergence in High Probability of Clipped Gradient Methods with Heavy Tailed Noise
%XIn this work, we study the convergence in high probability
of clipped gradient methods when the noise distribution has heavy
tails, i.e., with bounded $p$th moments for some $1 < p \leq 2$. Prior
works in this setting follow the same recipe of using concentration
inequalities and an inductive argument with a union bound to bound the
iterates across all iterations. This method inflates the failure
probability by a factor of $T$, where $T$ is the
number of iterations. We instead propose a new analysis approach based
on bounding the moment generating function of a well-chosen supermartingale
sequence. We improve the dependency on $T$ in the convergence guarantee
for a wide range of algorithms with clipped gradients, including stochastic
(accelerated) mirror descent for convex objectives and stochastic
gradient descent for nonconvex objectives. Our high probability bounds
achieve the optimal convergence rates and match the best currently
known in-expectation bounds. Our approach naturally allows the algorithms
to use time-varying step sizes and clipping parameters when the time
horizon is unknown, which appears difficult or even impossible with
the techniques of prior works. Furthermore, we show that in
the case of clipped stochastic mirror descent, several problem constants,
including the initial distance to the optimum, are not required when
setting step sizes and clipping parameters.
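The clipped-gradient update with time-varying step sizes and clipping parameters that the abstract describes can be sketched in plain Python. This is a minimal illustration only: the decaying step-size schedule and growing clipping schedule below are assumptions chosen for the sketch, not the schedules derived in the paper.

```python
import math

def clip(g, tau):
    """Scale gradient g (a list of floats) down to Euclidean norm tau if it exceeds tau."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm > tau:
        scale = tau / norm
        return [v * scale for v in g]
    return g

def clipped_sgd(grad_fn, x0, steps=100, eta0=0.1, tau0=1.0):
    """Clipped SGD with time-varying step sizes and clipping levels.

    Illustrative schedules only (assumptions, not the paper's choices):
    step size eta_t = eta0 / sqrt(t + 1), clipping level
    tau_t = tau0 * (t + 1) ** 0.25, so thresholds grow as noise averages out.
    """
    x = list(x0)
    for t in range(steps):
        g = clip(grad_fn(x), tau0 * (t + 1) ** 0.25)  # clip the stochastic gradient
        eta = eta0 / math.sqrt(t + 1)                 # decaying step size
        x = [xi - eta * gi for xi, gi in zip(x, g)]   # gradient step
    return x
```

In practice `grad_fn` would return a stochastic gradient whose noise may be heavy-tailed; clipping keeps any single bad sample from moving the iterate too far.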