We propose and analyze a new stochastic gradient method, which we call Stochastic Unbiased Curvatureaided Gradient (SUCAG), for finite sum optimization problems. SUCAG constitutes an unbiased total gradient tracking technique that uses Hessian information to accelerate convergence. We analyze our method under the general asynchronous model of computation, in which each function is selected infinitely often with possibly unbounded (but sublinear) delay. For strongly convex problems, we establish linear convergence for the SUCAG method. When the initialization point is sufficiently close to the optimal solution, the established convergence rate is only dependent on the condition number of the problem, making it strictly faster than the known rate for the SAGA method. Furthermore, we describe a Markovdriven approach of implementing the SUCAG method in a distributed asynchronous multiagent setting, via gossiping along a random walk on an undirected communication graph. We show that our analysis applies as long as the graph is connected and, notably, establishes an asymptotic linear convergence rate that is robust to the graph topology. Numerical results demonstrate the merits of our algorithm over existing methods.
On the Initialization for ConvexConcave Minmax Problems
Convexconcave minmax problems are ubiquitous in machine learning, and people usually utilize firstorder methods (e.g., gradient descent ascent) to find the optimal solution. One feature which separates convexconcave minmax problems from convex minimization problems is that the best known convergence rates for minmax problems have an explicit dependence on the size of the domain, rather than on the distance between initial point and the optimal solution. This means that the convergence speed does not have any improvement even if the algorithm starts from the optimal solution, and hence, is oblivious to the initialization. Here, we show that strictconvexitystrictconcavity is sufficient to get the convergence rate to depend on the initialization. We also show how different algorithms can asymptotically achieve initializationdependent convergence rates on this class of functions. Furthermore, we show that the socalled “parameterfree” algorithms allow to achieve improved initializationdependent asymptotic rates without any learning rate to tune. In addition, we utilize this particular parameterfree algorithm as a subroutine to design a new algorithm, which achieves a novel nonasymptotic fast rate for strictlyconvexstrictlyconcave minmax problems with a growth condition and Hölder continuous solution mapping. Experiments are conducted to verify our theoretical findings and demonstrate the effectiveness of the proposed algorithms.
 Editors:
 Dasgupta, Sanjoy; Haghtalab, Nika
 Publication Date:
 NSFPAR ID:
 10389017
 Journal Name:
 International Conference on Algorithmic Learning Theory
 Sponsoring Org:
 National Science Foundation
More Like this


Minmax optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose the random reshufflingbased gradientfree Optimistic Gradient DescentAscent algorithm for solving convexconcave minmax problems with finite sum structure. We prove that the algorithm enjoys the same convergence rate as that of zerothorder algorithms for convex minimization problems. We deploy the algorithm to solve the distributionally robust strategic classification problem, where gradient information is not readily available, by reformulating the latter into a finite dimensional convex concave minmax problem. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature.

Motivated by applications in Game Theory, Optimization, and Generative Adversarial Networks, recent work of Daskalakis et al~\cite{DISZ17} and followup work of Liang and Stokes~\cite{LiangS18} have established that a variant of the widely used Gradient Descent/Ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits lastiterate convergence to saddle points in {\em unconstrained} convexconcave minmax optimization problems. We show that the same holds true in the more general problem of {\em constrained} minmax optimization under a variant of the noregret MultiplicativeWeightsUpdate method called "Optimistic MultiplicativeWeights Update (OMWU)". This answers an open question of Syrgkanis et al~\cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in noregret learning literature and the aforementioned papers. We show that OMWU monotonically improves the KullbackLeibler divergence of the current iterate to the (appropriately normalized) minmax solution until it enters a neighborhood of the solution. Inside that neighborhood we show that OMWU becomes a contracting map converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.

Motivated by applications in Game Theory, Optimization, and Generative Adversarial Networks, recent work of Daskalakis et al \cite{DISZ17} and followup work of Liang and Stokes \cite{LiangS18} have established that a variant of the widely used Gradient Descent/Ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits lastiterate convergence to saddle points in {\em unconstrained} convexconcave minmax optimization problems. We show that the same holds true in the more general problem of {\em constrained} minmax optimization under a variant of the noregret MultiplicativeWeightsUpdate method called "Optimistic MultiplicativeWeights Update (OMWU)". This answers an open question of Syrgkanis et al \cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in noregret learning literature and the aforementioned papers. We show that OMWU monotonically improves the KullbackLeibler divergence of the current iterate to the (appropriately normalized) minmax solution until it enters a neighborhood of the solution. Inside that neighborhood we show that OMWU is locally (asymptotically) stable converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.

A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involve training of overparameterized models where the number of parameters of the model exceeds a certain threshold. Indeed, most successful GANs used in practice are trained using overparameterized generator and discriminator networks, both in terms of depth and width. A large body of work in supervised learning have shown the importance of model overparameterization in the convergence of the gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve nonconvex concave minimax optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in nonconvex concave problems is far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a 1layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying nonconvex concave minmax problem. To the best of our knowledge, this is the first result for global convergence of GDA in such settings.more »