Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

Motivated by applications in Game Theory, Optimization, and Generative Adversarial Networks, recent work of Daskalakis et al~\cite{DISZ17} and followup work of Liang and Stokes~\cite{LiangS18} have established that a variant of the widely used Gradient Descent/Ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits lastiterate convergence to saddle points in {\em unconstrained} convexconcave minmax optimization problems. We show that the same holds true in the more general problem of {\em constrained} minmax optimization under a variant of the noregret MultiplicativeWeightsUpdate method called "Optimistic MultiplicativeWeights Update (OMWU)". This answers an open question of Syrgkanis et al~\cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in noregret learning literature and the aforementioned papers. We show that OMWU monotonically improves the KullbackLeibler divergence of the current iterate to the (appropriately normalized) minmax solution until it enters a neighborhood of the solution. Inside that neighborhood we show that OMWU becomes a contracting map converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.

Motivated by applications in Game Theory, Optimization, and Generative Adversarial Networks, recent work of Daskalakis et al \cite{DISZ17} and followup work of Liang and Stokes \cite{LiangS18} have established that a variant of the widely used Gradient Descent/Ascent procedure, called "Optimistic Gradient Descent/Ascent (OGDA)", exhibits lastiterate convergence to saddle points in {\em unconstrained} convexconcave minmax optimization problems. We show that the same holds true in the more general problem of {\em constrained} minmax optimization under a variant of the noregret MultiplicativeWeightsUpdate method called "Optimistic MultiplicativeWeights Update (OMWU)". This answers an open question of Syrgkanis et al \cite{SALS15}. The proof of our result requires fundamentally different techniques from those that exist in noregret learning literature and the aforementioned papers. We show that OMWU monotonically improves the KullbackLeibler divergence of the current iterate to the (appropriately normalized) minmax solution until it enters a neighborhood of the solution. Inside that neighborhood we show that OMWU is locally (asymptotically) stable converging to the exact solution. We believe that our techniques will be useful in the analysis of the last iterate of other learning algorithms.

The standard linear and logistic regression models assume that the response variables are independent, but share the same linear relationship to their corresponding vectors of covariates. The assumption that the response variables are independent is, however, too strong. In many applications, these responses are collected on nodes of a network, or some spatial or temporal domain, and are dependent. Examples abound in financial and meteorological applications, and dependencies naturally arise in social networks through peer effects. Regression with dependent responses has thus received a lot of attention in the Statistics and Economics literature, but there are no strong consistency results unless multiple independent samples of the vectors of dependent responses can be collected from these models. We present computationally and statistically efficient methods for linear and logistic regression models when the response variables are dependent on a network. Given one sample from a networked linear or logistic regression model and under mild assumptions, we prove strong consistency results for recovering the vector of coefficients and the strength of the dependencies, recovering the rates of standard regression under independent observations. We use projected gradient descent on the negative loglikelihood, or negative logpseudolikelihood, and establish their strong convexity and consistency using concentrationmore »

Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first order methods in minmax problems have received extensive study. It has been recognized that they may cycle, and there is no good understanding of their limit points when they do not. When they converge, do they converge to local minmax solutions? We characterize the limit points of two basic first order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA). We show that both dynamics avoid unstable critical points for almost all initializations. Moreover, for small step sizes and under mild assumptions, the set of \{OGDA\}stable critical points is a superset of \{GDA\}stable critical points, which is a superset of local minmax solutions (strict in some cases). The connecting thread is that the behavior of these dynamics can be studied from a dynamical systems perspective.

Motivated by applications in Optimization, Game Theory, and the training of Generative Adversarial Networks, the convergence properties of first order methods in minmax problems have received extensive study. It has been recognized that they may cycle, and there is no good understanding of their limit points when they do not. When they converge, do they converge to local minmax solutions? We characterize the limit points of two basic first order methods, namely Gradient Descent/Ascent (GDA) and Optimistic Gradient Descent Ascent (OGDA). We show that both dynamics avoid unstable critical points for almost all initializations. Moreover, for small step sizes and under mild assumptions, the set of \{OGDA\}stable critical points is a superset of \{GDA\}stable critical points, which is a superset of local minmax solutions (strict in some cases). The connecting thread is that the behavior of these dynamics can be studied from a dynamical systems perspective.