NSF PAR Search | NSF Public Access Repository

Linearly Constrained Bilevel Optimization: A Smoothed Implicit Gradient Approach

Prashant Khanduri; Ioannis Tsaknakis; Yihua Zhang; Jia Liu; Sijia Liu; Jiawei Zhang; Mingyi Hong (August 2023, International Conference on Machine Learning)

This work develops analysis and algorithms for solving a class of bilevel optimization problems where the lower-level (LL) problems have linear constraints. Most existing approaches for constrained bilevel problems rely on value function-based approximate reformulations, which suffer from issues such as non-convex and non-differentiable constraints. In contrast, we develop an implicit gradient-based approach that is easy to implement and suitable for machine learning applications. We first provide an in-depth understanding of the problem by showing that the implicit objective for such problems is in general non-differentiable. However, if we add a small (linear) perturbation to the LL objective, the resulting implicit objective becomes differentiable almost surely. This key observation opens the door to developing (deterministic and stochastic) gradient-based algorithms similar to the state-of-the-art ones for unconstrained bilevel problems. We show that when the implicit function is assumed to be strongly convex, convex, or weakly convex, the resulting algorithms converge at guaranteed rates. Finally, we experimentally corroborate the theoretical findings and evaluate the performance of the proposed framework on numerical and adversarial learning problems.
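For orientation, the problem class described in this abstract can be sketched as follows. This is a hedged illustration, not the paper's exact statement: the symbols f, g, A, b, and q are our own notation, and the inequality form of the linear constraints is an assumption.

```latex
% Bilevel problem with a linearly constrained lower level (notation assumed)
\min_{x} \; F(x) := f\bigl(x, y^*(x)\bigr)
\quad \text{s.t.} \quad
y^*(x) \in \arg\min_{y} \bigl\{\, g(x, y) \;:\; A y \le b \,\bigr\}

% Perturbed lower level: a small random linear term q^\top y is added,
% which the abstract says makes the implicit objective differentiable almost surely
y_q^*(x) \in \arg\min_{y} \bigl\{\, g(x, y) + q^\top y \;:\; A y \le b \,\bigr\}
```

Here F is the implicit objective referred to in the abstract, and q stands for the small linear perturbation of the LL objective.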
Revisiting and Advancing Fast Adversarial Training Through the Lens of Bi-Level Optimization

Yihua Zhang; Guanhua Zhang; Prashant Khanduri; Mingyi Hong; Shiyu Chang; Sijia Liu (July 2022, International Conference on Machine Learning)

Adversarial training (AT) is a widely recognized defense mechanism for improving the robustness of deep neural networks against adversarial attacks. It is built on min-max optimization (MMO), where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the conventional MMO method makes AT hard to scale. Thus, FAST-AT (Wong et al., 2020) and other recent algorithms attempt to simplify MMO by replacing its maximization step with a single gradient sign-based attack generation step. Although easy to implement, FAST-AT lacks theoretical guarantees, and its empirical performance is unsatisfactory because of robust catastrophic overfitting when training with strong adversaries. In this paper, we advance FAST-AT from the fresh perspective of bi-level optimization (BLO). We first show that the commonly used FAST-AT is equivalent to using a stochastic gradient algorithm to solve a linearized BLO problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm's performance. Inspired by BLO, we design and analyze a new set of robust training algorithms termed Fast Bilevel AT (FAST-BAT), which effectively defends against sign-based projected gradient descent (PGD) attacks without using any gradient sign method or explicit robust regularization. In practice, we show that our method yields substantial robustness improvements over baselines across multiple models and datasets.
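To make the "single gradient sign-based attack generation step" concrete, here is a minimal sketch on a toy logistic-regression model. It is an illustration under our own assumptions, not the FAST-AT or FAST-BAT implementation: the model, the function names `logistic_loss`, `loss_grad_wrt_input`, and `sign_attack_step`, and the numeric values are all hypothetical.

```python
import numpy as np

def logistic_loss(w, x, y):
    """Logistic loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    return np.log1p(np.exp(-y * np.dot(w, x)))

def loss_grad_wrt_input(w, x, y):
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * np.dot(w, x)
    return -y * w / (1.0 + np.exp(margin))

def sign_attack_step(w, x, y, eps):
    """One gradient-sign step: shift every coordinate of x by eps
    in the direction that increases the loss."""
    return x + eps * np.sign(loss_grad_wrt_input(w, x, y))

# Hypothetical toy values, chosen only for illustration.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
y = 1.0
eps = 0.1

x_adv = sign_attack_step(w, x, y, eps)
```

The perturbed input stays within an eps-sized box around x in every coordinate, which is the constraint this kind of attack is usually designed around; the sign operation is also the discrete step that the abstract identifies as hard to analyze.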
Full Text Available
Distributed Adversarial Training to Robustify Deep Neural Networks at Scale

Gaoyuan Zhang; Songtao Lu; Sijia Liu; Xiangyi Chen; Pin-Yu Chen; Lee Martie; Lior Horesh; Mingyi Hong (July 2022, Uncertainty in Artificial Intelligence)

Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach known as adversarial training (AT) has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While AT is effective, it remains unclear whether it can be successfully adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by that, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., on ResNet-50 under ImageNet).
Full Text Available
How Does Unlabeled Data Improve Generalization in Self-training? A one-hidden-layer Theoretical Analysis

Zhang, Shuai; Wang, Meng; Liu, Sijia; Chen, Pin-Yu; Xiong, Jinjun (January 2022, the Tenth International Conference on Learning Representations (ICLR))

Full Text Available
Min-Max Optimization without Gradients: Convergence and Applications to Black-Box Evasion and Poisoning Attacks

Sijia Liu; Songtao Lu (July 2020, International Conference on Machine Learning)
Full Text Available
Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks

Zhang, Shuai; Wang, Meng; Liu, Sijia; Chen, Pin-Yu; Xiong, Jinjun (January 2021, the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS))

Full Text Available