Benign Overfitting in Adversarial Training of Neural Networks

Wang, Yunjuan; Zhang, Kaibo; Arora, Raman

Citation Details

Benign overfitting is the phenomenon wherein none of the predictors in the hypothesis class can achieve perfect accuracy (i.e., the non-realizable or noisy setting), but a model that interpolates the training data still achieves good generalization. A series of recent works aims to understand this phenomenon for regression and classification tasks using linear predictors as well as two-layer neural networks. In this paper, we study such a benign overfitting phenomenon in an adversarial setting. We show that under a distributional assumption, interpolating neural networks found using adversarial training generalize well despite inference-time attacks. Specifically, we provide convergence and generalization guarantees for adversarial training of two-layer networks (with smooth as well as non-smooth activation functions), showing that under a moderate ℓ2-norm perturbation budget, the trained model has near-zero robust training loss and near-optimal robust generalization error. We support our theoretical findings with an empirical study on synthetic and real-world data.

Award ID(s): 1943251

PAR ID: 10572969

Author(s) / Creator(s): Wang, Yunjuan; Zhang, Kaibo; Arora, Raman

Publisher / Repository: Proceedings of the 41st International Conference on Machine Learning, PMLR 235, 2024

Date Published: 2024-07-01

ISSN: 2640-3498

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
