skip to main content


Title: Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization
The use of min-max optimization in the adversarial training of deep neural network classifiers, and the training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. Unfortunately, recent results have established that even approximate first-order stationary points of such objectives are intractable, even under smoothness conditions, motivating the study of min-max objectives with additional structure. We introduce a new class of structured nonconvex-nonconcave min-max optimization problems, proposing a generalization of the extragradient algorithm which provably converges to a stationary point. The algorithm applies not only to Euclidean spaces, but also to general ℓ𝑝-normed finite-dimensional real vector spaces. We also discuss its stability under stochastic oracles and provide bounds on its sample complexity. Our iteration complexity and sample complexity bounds either match or improve the best known bounds for the same or less general nonconvex-nonconcave settings, such as those that satisfy variational coherence or in which a weak solution to the associated variational inequality problem is assumed to exist.  more » « less
Award ID(s):
2007757
PAR ID:
10294964
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
Volume:
130
Page Range / eLocation ID:
2746-2754
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can successfully be adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general, which supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in the optimization theory, the convergence rate of DAT to the first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet–50 under ImageNet). 
    more » « less
  2. Minimax optimization has seen a surge in interest with the advent of modern applications such as GANs, and it is inherently more challenging than simple minimization. The difficulty is exacerbated by the training data residing at multiple edge devices or clients, especially when these clients can have heterogeneous datasets and heterogeneous local computation capabilities. We propose a general federated minimax optimization framework that subsumes such settings and several existing methods like Local SGDA. We show that naive aggregation of model updates made by clients running unequal number of local steps can result in optimizing a mismatched objective function – a phenomenon previously observed in standard federated minimization. To fix this problem, we propose normalizing the client updates by the number of local steps. We analyze the convergence of the proposed algorithm for classes of nonconvex-concave and nonconvex-nonconcave functions and characterize the impact of heterogeneous client data, partial client participation, and heterogeneous local computations. For all the function classes considered, we significantly improve the existing computation and communication complexity results. Experimental results support our theoretical claims. 
    more » « less
  3. We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing N(0,I) against N(αv,I), where ∥v∥2=1, with an ε-fraction of adversarial corruptions, to be Θ~(max(d−−√α2,dε3α4,min(d2/3ε2/3α8/3,dεα2))), while the complexity against adaptive adversaries is Θ~(max(d−−√α2,dε2α4)), which is strictly worse for a large range of vanishing ε,α. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity Θ~(max(d−−√/α2,dε2/α4)), and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting. 
    more » « less
  4. We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing N(0,I) against N(αv,I), where ∥v∥2=1, with an ε-fraction of adversarial corruptions, to be Θ~(max(d√α2,dε3α4,min(d2/3ε2/3α8/3,dεα2))) while the complexity against adaptive adversaries is Θ~(max(d√α2,dε2α4)) which is strictly worse for a large range of vanishing ε,α. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity Θ~(max(d−−√/α2,dε2/α4)), and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting. 
    more » « less
  5. null (Ed.)
    We leverage the connections between nonexpansive maps, monotone Lipschitz operators, and proximal mappings to obtain near-optimal (i.e., optimal up to poly-log factors in terms of iteration complexity) and parameter-free methods for solving monotone inclusion problems. These results immediately translate into near-optimal guarantees for approximating strong solutions to variational inequality problems, approximating convex-concave min-max optimization problems, and minimizing the norm of the gradient in min-max optimization problems. Our analysis is based on a novel and simple potential-based proof of convergence of Halpern iteration, a classical iteration for finding fixed points of nonexpansive maps. Additionally, we provide a series of algorithmic reductions that highlight connections between different problem classes and lead to lower bounds that certify near-optimality of the studied methods. 
    more » « less