From optimal transport to robust dimensionality reduction, a plethora of machine learning applications
can be cast as min-max optimization problems over Riemannian manifolds. Though
many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to
translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic
convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we
study whether a performance gap between Riemannian and optimal Euclidean convex-concave
algorithms is necessary. We answer this question in the negative: we prove that the Riemannian
corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the
geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend
to the stochastic or non-smooth case, where RCEG and Riemannian gradient descent ascent (RGDA)
achieve near-optimal convergence rates up to factors depending on the curvature of the manifold.
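The corrected extragradient method above is Riemannian, but its Euclidean ancestor is easy to sketch. Below is a minimal illustration, not the paper's algorithm: the classical extragradient update on a toy strongly-convex-strongly-concave objective f(x, y) = 0.5x^2 - 0.5y^2 + xy, whose unique saddle point is (0, 0). The objective, step size, and iteration count are illustrative assumptions.

```python
# Minimal Euclidean extragradient sketch on a toy saddle problem.
# f(x, y) = 0.5*x**2 - 0.5*y**2 + x*y is strongly convex in x and
# strongly concave in y, with unique saddle point at (0, 0).

def grad_x(x, y):
    return x + y

def grad_y(x, y):
    return x - y

def extragradient(x, y, eta=0.1, iters=200):
    for _ in range(iters):
        # Extrapolation (half) step from the current iterate.
        xh = x - eta * grad_x(x, y)
        yh = y + eta * grad_y(x, y)
        # Corrected step: move from (x, y) using gradients at (xh, yh).
        x = x - eta * grad_x(xh, yh)
        y = y + eta * grad_y(xh, yh)
    return x, y

x, y = extragradient(1.0, 1.0)
```

The extrapolation step looks ahead one gradient step, and the corrected step moves from the original point using the look-ahead gradients; in the strongly-convex-concave regime this yields last-iterate linear convergence.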
On the Initialization for Convex-Concave Min-Max Problems
Convex-concave min-max problems are ubiquitous in machine learning, and first-order methods (e.g., gradient descent ascent) are usually used to find the optimal solution. One feature which separates convex-concave min-max problems from convex minimization problems is that the best known convergence rates for min-max problems have an explicit dependence on the size of the domain, rather than on the distance between the initial point and the optimal solution. This means that the convergence speed does not improve even if the algorithm starts from the optimal solution, and hence is oblivious to the initialization. Here, we show that strict-convexity-strict-concavity is sufficient to make the convergence rate depend on the initialization. We also show how different algorithms can asymptotically achieve initialization-dependent convergence rates on this class of functions. Furthermore, we show that so-called “parameter-free” algorithms allow one to achieve improved initialization-dependent asymptotic rates without any learning rate to tune. In addition, we use this particular parameter-free algorithm as a subroutine to design a new algorithm, which achieves a novel non-asymptotic fast rate for strictly-convex-strictly-concave min-max problems with a growth condition and Hölder continuous solution mapping. Experiments are conducted to verify our theoretical findings and demonstrate the effectiveness of the proposed algorithms.
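To make the initialization dependence concrete, here is a small sketch, not the paper's parameter-free algorithm: plain simultaneous gradient descent ascent on a toy strongly-convex-strongly-concave objective contracts linearly toward the saddle point, so a start closer to the solution needs fewer iterations to reach a fixed accuracy. The objective, step size, and tolerance are illustrative assumptions.

```python
# Simultaneous gradient descent ascent (GDA) on
# f(x, y) = 0.5*x**2 - 0.5*y**2 + x*y, saddle point (0, 0).
# On this strongly-convex-strongly-concave problem GDA contracts
# linearly, so the iteration count to a fixed accuracy depends on
# how far the initial point is from the solution.

def iters_to_tol(x, y, eta=0.1, tol=1e-6, max_iters=10_000):
    for t in range(max_iters):
        if (x * x + y * y) ** 0.5 < tol:
            return t
        gx, gy = x + y, x - y          # grad_x f, grad_y f
        x, y = x - eta * gx, y + eta * gy
    return max_iters

far = iters_to_tol(5.0, 5.0)     # start far from the saddle
near = iters_to_tol(0.05, 0.05)  # start close to the saddle
```

Because the contraction factor is fixed, the iteration counts differ by a term logarithmic in the initial distance, which is exactly an initialization-dependent rate.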
 NSF-PAR ID:
 10389017
 Editor(s):
 Dasgupta, Sanjoy; Haghtalab, Nika
 Journal Name:
 International Conference on Algorithmic Learning Theory
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We propose and analyze a new stochastic gradient method, which we call Stochastic Unbiased Curvature-aided Gradient (SUCAG), for finite-sum optimization problems. SUCAG constitutes an unbiased total gradient tracking technique that uses Hessian information to accelerate convergence. We analyze our method under the general asynchronous model of computation, in which each function is selected infinitely often with possibly unbounded (but sublinear) delay. For strongly convex problems, we establish linear convergence for the SUCAG method. When the initialization point is sufficiently close to the optimal solution, the established convergence rate depends only on the condition number of the problem, making it strictly faster than the known rate for the SAGA method. Furthermore, we describe a Markov-driven approach to implementing the SUCAG method in a distributed asynchronous multi-agent setting, via gossiping along a random walk on an undirected communication graph. We show that our analysis applies as long as the graph is connected and, notably, establishes an asymptotic linear convergence rate that is robust to the graph topology. Numerical results demonstrate the merits of our algorithm over existing methods.
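The precise SUCAG update and its asynchronous analysis are in the paper; the sketch below only illustrates the curvature-aided idea it builds on, in a simplified synchronous, one-dimensional form: each component gradient is tracked through a first-order Taylor expansion around a stored snapshot point, which is exact here because the components are quadratics. The component functions, step size, and round-robin snapshot schedule are illustrative assumptions.

```python
# Simplified, synchronous sketch of a curvature-aided gradient
# estimate for a finite sum F(x) = (1/n) * sum_i f_i(x), with
# 1-D quadratics f_i(x) = 0.5 * a[i] * (x - b[i])**2 so that the
# Taylor estimate grad f_i(z_i) + f_i''(z_i) * (x - z_i) is exact.

a = [1.0, 2.0, 4.0]
b = [1.0, -1.0, 3.0]
n = len(a)

def grad_i(i, x):
    return a[i] * (x - b[i])

x = 0.0
snap = [x] * n                      # per-component snapshot points
eta = 0.2
for t in range(300):
    # Curvature-aided estimate of (1/n) * sum_i grad f_i(x):
    # stored gradient at the snapshot plus a Hessian correction.
    g = sum(grad_i(i, snap[i]) + a[i] * (x - snap[i])
            for i in range(n)) / n
    x -= eta * g
    snap[t % n] = x                 # refresh one snapshot per step
```

For these quadratics the minimizer of F is the weighted mean sum(a[i]*b[i]) / sum(a[i]) = 11/7, and the Hessian-corrected estimate coincides with the exact gradient, so the iteration converges linearly to it.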

Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose the random-reshuffling-based gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite-sum structure. We prove that the algorithm enjoys the same convergence rate as that of zeroth-order algorithms for convex minimization problems. We deploy the algorithm to solve the distributionally robust strategic classification problem, where gradient information is not readily available, by reformulating the latter into a finite-dimensional convex-concave min-max problem. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature.
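As a simplified illustration of gradient-free optimistic updates, not the paper's random-reshuffling estimator: the sketch below estimates gradients purely from function evaluations via coordinate-wise central differences and plugs them into the Optimistic Gradient Descent-Ascent update on a toy strongly-convex-strongly-concave objective. The objective, step size, and finite-difference width are illustrative assumptions.

```python
# Zeroth-order Optimistic Gradient Descent-Ascent (OGDA) sketch.
# Gradients of f(x, y) = 0.5*x**2 - 0.5*y**2 + x*y (saddle at (0, 0))
# are estimated only from function values, via central differences,
# which are exact for quadratics.

def f(x, y):
    return 0.5 * x * x - 0.5 * y * y + x * y

def zo_grads(x, y, delta=1e-4):
    gx = (f(x + delta, y) - f(x - delta, y)) / (2 * delta)
    gy = (f(x, y + delta) - f(x, y - delta)) / (2 * delta)
    return gx, gy

x, y = 1.0, 1.0
eta = 0.1
gx_prev, gy_prev = zo_grads(x, y)
for _ in range(400):
    gx, gy = zo_grads(x, y)
    # Optimistic step: current estimate plus a correction that
    # subtracts the estimate from the previous iterate.
    x -= eta * (2 * gx - gx_prev)
    y += eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy
```

Each iteration costs four function evaluations and no gradient oracle, which is the point of the zeroth-order setting.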

Motivated by applications in Game Theory, Optimization, and Generative Adversarial Networks, recent work of Daskalakis et al \cite{DISZ17} and follow-up work of Liang and Stokes \cite{LiangS18} have established that a variant of the widely used Gradient Descent/Ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits last-iterate convergence to saddle points in {\em unconstrained} convex-concave min-max optimization problems. We show that the same holds true in the more general problem of {\em constrained} min-max optimization under a variant of the no-regret Multiplicative-Weights-Update method called "Optimistic Multiplicative-Weights Update (OMWU)". This answers an open question of Syrgkanis et al \cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in the no-regret learning literature and the aforementioned papers. We show that OMWU monotonically improves the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution. Inside that neighborhood we show that OMWU is locally (asymptotically) stable, converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.
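A minimal sketch of the OMWU update on the classic matching-pennies game, whose unique equilibrium is uniform play for both players; the payoff matrix is standard, but the step size, starting strategies, and iteration count are illustrative assumptions, and this small game is a stand-in for the constrained min-max problems treated in the paper.

```python
# Optimistic Multiplicative Weights Update (OMWU) sketch on the
# matching-pennies zero-sum game; its unique equilibrium is the
# uniform strategy (0.5, 0.5) for both players.
import math

A = [[1.0, -1.0], [-1.0, 1.0]]  # payoff to the x (max) player

def matvec(M, v):               # M v
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

def matvec_t(M, v):             # M^T v
    return [sum(M[i][j] * v[i] for i in range(2)) for j in range(2)]

def omwu_step(p, u_now, u_prev, eta, sign):
    # Multiplicative update with the optimistic payoff 2*u_now - u_prev.
    w = [p[i] * math.exp(sign * eta * (2 * u_now[i] - u_prev[i]))
         for i in range(2)]
    s = sum(w)
    return [wi / s for wi in w]

x, y = [0.7, 0.3], [0.6, 0.4]
eta = 0.1
ux_prev, uy_prev = matvec(A, y), matvec_t(A, x)
for _ in range(5000):
    ux, uy = matvec(A, y), matvec_t(A, x)
    x = omwu_step(x, ux, ux_prev, eta, +1.0)   # max player ascends
    y = omwu_step(y, uy, uy_prev, eta, -1.0)   # min player descends
    ux_prev, uy_prev = ux, uy
```

On this game plain MWU cycles away from the equilibrium, while the optimistic correction damps the rotation and the last iterate spirals in to uniform play.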