We develop new adaptive algorithms for variational inequalities with monotone operators, which capture many problems of interest, notably convex optimization and convexconcave saddle point problems. Our algorithms automatically adapt to unknown problem parameters such as the smoothness and the norm of the operator, and the variance of the stochastic evaluation oracle. We show that our algorithms are universal and simultaneously achieve the optimal convergence rates in the nonsmooth, smooth, and stochastic settings. The convergence guarantees of our algorithms improve over existing adaptive methods and match the optimal nonadaptive algorithms. Additionally, prior works require that the optimization domain is bounded. In this work, we remove this restriction and give algorithms for unbounded domains that are adaptive and universal. Our general proof techniques can be used for many variants of the algorithm using one or two operator evaluations per iteration. The classical methods based on the ExtraGradient/MirrorProx algorithm require two operator evaluations per iteration, which is the dominant factor in the running time in many settings.
more »
« less
Adaptive and Universal Algorithms for Variational Inequalities with Optimal Convergence
We develop new adaptive algorithms for variational inequalities with monotone operators, which capture many problems of interest, notably convex optimization and convexconcave saddle point problems. Our algorithms automatically adapt to unknown problem parameters such as the smoothness and the norm of the operator, and the variance of the stochastic evaluation oracle. We show that our algorithms are universal and simultaneously achieve the optimal convergence rates in the nonsmooth, smooth, and stochastic settings. The convergence guarantees of our algorithms improve over existing adaptive methods and match the optimal nonadaptive algorithms. Additionally, prior works require that the optimization domain is bounded. In this work, we remove this restriction and give algorithms for unbounded domains that are adaptive and universal. Our general proof techniques can be used for many variants of the algorithm using one or two operator evaluations per iteration. The classical methods based on the ExtraGradient/MirrorProx algorithm require two operator evaluations per iteration, which is the dominant factor in the running time in many settings.
more »
« less
 NSFPAR ID:
 10353902
 Date Published:
 Journal Name:
 AAAI Conference on Artificial Intelligence
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We provide new adaptive firstorder methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearlyoptimal convergence rates for both smooth and nonsmooth functions, even when they only have access to stochastic gradients. In addition, they do not require any prior knowledge on how the objective function is parametrized, since they automatically adjust their percoordinate learning rate. These can be seen as truly accelerated Adagrad methods for constrained optimization. We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard nonaccelerated convergence rate. We also present a set of new results involving adaptive methods for unconstrained optimization and variational inequalities arising from monotone operators.more » « less

null (Ed.)We provide new adaptive firstorder methods for constrained convex optimization. Our main algorithms AdaACSA and AdaAGD+ are accelerated methods, which are universal in the sense that they achieve nearlyoptimal convergence rates for both smooth and nonsmooth functions, even when they only have access to stochastic gradients. In addition, they do not require any prior knowledge on how the objective function is parametrized, since they automatically adjust their percoordinate learning rate. These can be seen as truly accelerated Adagrad methods for constrained optimization. We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard nonaccelerated convergence rate. We also present a set of new results involving adaptive methods for unconstrained optimization and variational inequalities arising from monotone operators.more » « less

In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and nonconvex optimization with subGaussian noise. In previous works for convex optimization, either the convergence is only in expectation or the bound depends on the diameter of the domain. Instead, we show high probability convergence with bounds depending on the initial distance to the optimal solution. The algorithms use step sizes analogous to the standard settings and are universal to Lipschitz functions, smooth functions, and their linear combinations. The method can be applied to the nonconvex case. We demonstrate an $O((1+\sigma^{2}\log(1/\delta))/T+\sigma/\sqrt{T})$ convergence rate when the number of iterations $T$ is known and an $O((1+\sigma^{2}\log(T/\delta))/\sqrt{T})$ convergence rate when $T$ is unknown for SGD, where $1\delta$ is the desired success probability. These bounds improve over existing bounds in the literature. We also revisit AdaGradNorm (Ward et al., 2019) and show a new analysis to obtain a high probability bound that does not require the bounded gradient assumption made in previous works. The full version of our paper contains results for the standard percoordinate AdaGrad.more » « less

In this work, we describe a generic approach to show convergence with high probability for both stochastic convex and nonconvex optimization with subGaussian noise. In previous works for convex optimization, either the convergence is only in expectation or the bound depends on the diameter of the domain. Instead, we show high probability convergence with bounds depending on the initial distance to the optimal solution. The algorithms use step sizes analogous to the standard settings and are universal to Lipschitz functions, smooth functions, and their linear combinations. The method can be applied to the nonconvex case. We demonstrate an $O((1+\sigma^{2}\log(1/\delta))/T+\sigma/\sqrt{T})$ convergence rate when the number of iterations $T$ is known and an $O((1+\sigma^{2}\log(T/\delta))/\sqrt{T})$ convergence rate when $T$ is unknown for SGD, where $1\delta$ is the desired success probability. These bounds improve over existing bounds in the literature. We also revisit AdaGradNorm \cite{ward2019adagrad} and show a new analysis to obtain a high probability bound that does not require the bounded gradient assumption made in previous works. The full version of our paper contains results for the standard percoordinate AdaGrad.more » « less