Title: An Information Geometric Perspective to Adversarial Attacks and Defenses
Deep learning models have achieved state-of-the-art accuracy on complex tasks, sometimes surpassing human-level performance. Yet they suffer from vulnerabilities known as adversarial attacks: imperceptible input perturbations that fool the models on inputs they originally classified correctly. The adversarial problem remains poorly understood and is commonly thought to be an inherent weakness of deep learning models. We argue that understanding and alleviating the adversarial phenomenon may require us to go beyond the Euclidean view and consider the relationship between the input and output spaces as a statistical manifold with the Fisher Information as its Riemannian metric. Under this information geometric view, the optimal attack is constructed as the direction corresponding to the highest eigenvalue of the Fisher Information Matrix, called the Fisher spectral attack. We show that an orthogonal transformation of the data cleverly alters its manifold by keeping the highest eigenvalue while changing the optimal direction of attack, thus deceiving the attacker into adopting the wrong direction. We demonstrate the defensive capabilities of the proposed orthogonal scheme, against both the Fisher spectral attack and the popular fast gradient sign method, on standard networks (e.g., LeNet and MobileNetV2) for the benchmark data sets MNIST and CIFAR-10.
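For concreteness, a minimal PyTorch sketch of the core computation behind a Fisher-spectral-style attack is shown below. It estimates the input-space Fisher Information Matrix of the predictive distribution and takes its top eigenvector as the perturbation direction; the function name, step size, and the small-matrix eigendecomposition shortcut are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of a Fisher-spectral-style attack direction (not the authors' code).
# Assumes a PyTorch classifier `model` returning logits for a single image tensor `x`.
import torch

def fisher_top_eigvec(model, x, eps=1e-12):
    """Top eigenvector of the input-space Fisher Information Matrix at x."""
    x = x.clone().requires_grad_(True)
    log_probs = torch.log_softmax(model(x.unsqueeze(0)), dim=1).squeeze(0)
    probs = log_probs.exp().detach()

    grads = []
    for c in range(log_probs.numel()):           # one gradient per class log-probability
        g, = torch.autograd.grad(log_probs[c], x, retain_graph=True)
        grads.append(g.flatten())
    G = torch.stack(grads)                       # (num_classes, input_dim)

    # FIM = sum_c p(c|x) grad_c grad_c^T = W^T W with W's rows scaled by sqrt(p).
    # Its nonzero spectrum equals that of the small (num_classes x num_classes) matrix W W^T.
    W = probs.sqrt().unsqueeze(1) * G
    S = W @ W.t()
    evals, evecs = torch.linalg.eigh(S)          # ascending eigenvalues
    v = W.t() @ evecs[:, -1]                     # lift top eigenvector back to input space
    return v / (v.norm() + eps), evals[-1]

# Usage (illustrative step size): perturb along the top eigen-direction.
# direction, lam = fisher_top_eigvec(model, x)
# x_adv = x + 0.03 * direction.view_as(x)
```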
Award ID(s):
1903466
PAR ID:
10426431
Author(s) / Creator(s):
Date Published:
Journal Name:
2022 International Joint Conference on Neural Networks (IJCNN)
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Adversarial training is a popular defense strategy against attack threat models with bounded Lp norms. However, it often degrades the model performance on normal images, and the defense does not generalize well to novel attacks. Given the success of deep generative models such as GANs and VAEs in characterizing the underlying manifold of images, we investigate whether the aforementioned problems can be remedied by exploiting the underlying manifold information. To this end, we construct an "On-Manifold ImageNet" (OM-ImageNet) dataset by projecting the ImageNet samples onto the manifold learned by StyleGAN. For this dataset, the underlying manifold information is exact. Using OM-ImageNet, we first show that adversarial training in the latent space of images improves both standard accuracy and robustness to on-manifold attacks. However, since no out-of-manifold perturbations are realized, the defense can be broken by Lp adversarial attacks. We further propose Dual Manifold Adversarial Training (DMAT), where adversarial perturbations in both latent and image spaces are used to robustify the model. Our DMAT improves performance on normal images and achieves robustness comparable to standard adversarial training against Lp attacks. In addition, we observe that models defended by DMAT achieve improved robustness against novel attacks which manipulate images by global color shifts or various types of image filtering. Interestingly, similar improvements are also achieved when the defended models are tested on out-of-manifold natural images. These results demonstrate the potential benefits of using manifold information in enhancing robustness of deep learning models against various types of novel adversarial attacks.
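A rough sketch of one training step in the spirit of DMAT is given below. It assumes a differentiable StyleGAN-like `generator`, a classifier `model`, and L-infinity PGD with illustrative step counts and budgets; it is an interpretation of the described procedure, not the paper's code.

```python
# Rough sketch of a dual-manifold adversarial training step (assumed names and budgets).
import torch
import torch.nn.functional as F

def pgd(loss_fn, x0, eps, steps=5, alpha=None):
    """Generic L-inf PGD that maximizes loss_fn; returns a perturbed copy of x0."""
    alpha = alpha or eps / 2
    delta = torch.zeros_like(x0, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(x0 + delta)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x0 + delta).detach()

def dmat_step(model, generator, w, y, optimizer, eps_img=8/255, eps_lat=0.02):
    """One training step using both a latent-space and an image-space adversary."""
    x_clean = generator(w).detach()
    # On-manifold adversary: perturb the latent code, then decode back to an image.
    w_adv = pgd(lambda z: F.cross_entropy(model(generator(z)), y), w, eps_lat)
    x_on = generator(w_adv).detach()
    # Off-manifold adversary: perturb the decoded image directly.
    x_off = pgd(lambda x: F.cross_entropy(model(x), y), x_clean, eps_img)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_on), y) + F.cross_entropy(model(x_off), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```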
  2. Models produced by machine learning, particularly deep neural networks, are state-of-the-art for many machine learning tasks and demonstrate very high prediction accuracy. Unfortunately, these models are also very brittle and vulnerable to specially crafted adversarial examples. Recent results have shown that the accuracy of these models can be reduced from nearly one hundred percent to below 5% using adversarial examples. This brittleness of deep neural networks makes it challenging to deploy these learning models in security-critical areas where adversarial activity is expected and cannot be ignored. A number of methods have recently been proposed to craft more effective and generalizable attacks on neural networks, along with competing efforts to improve the robustness of these learning models. But the current approaches to making machine learning techniques more resilient fall short of their goal. Further, the succession of new adversarial attacks against proposed methods to increase neural network robustness raises doubts about a foolproof approach to robustifying machine learning models against all possible adversarial attacks. In this paper, we consider the problem of detecting adversarial examples. This would help identify when the learning models cannot be trusted, without attempting to repair the models or make them robust to adversarial attacks. This goal of finding the limitations of the learning model presents a more tractable approach to protecting against adversarial attacks. Our approach is based on identifying a low-dimensional manifold in which the training samples lie, and then using the distance of a new observation from this manifold to identify whether the data point is adversarial or not. Our empirical study demonstrates that adversarial examples not only lie farther away from the data manifold, but that this distance from the manifold increases with the attack confidence. Thus, adversarial examples that are likely to result in incorrect predictions by the machine learning model are also easier to detect with our approach. This is a first step towards formulating a novel approach, based on computational geometry, that can identify the limiting boundaries of a machine learning model and detect adversarial attacks.
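The detection idea can be illustrated with a simple stand-in: fit a low-dimensional linear subspace (PCA) to clean training data, use each input's reconstruction residual as its distance from the "manifold", and flag points whose distance exceeds a threshold calibrated on clean data. The PCA surrogate, component count, and quantile threshold are assumptions for illustration, not the paper's exact construction.

```python
# Illustrative manifold-distance detector: PCA stands in for the learned manifold.
import numpy as np
from sklearn.decomposition import PCA

class ManifoldDetector:
    def __init__(self, n_components=50):
        self.pca = PCA(n_components=n_components)
        self.threshold = None

    def fit(self, X_train, quantile=0.99):
        """X_train: (n_samples, dim) flattened clean training inputs or features."""
        self.pca.fit(X_train)
        dists = self._distance(X_train)
        self.threshold = np.quantile(dists, quantile)  # calibrate on clean data only

    def _distance(self, X):
        # Distance from the manifold = norm of the residual after projection.
        X_proj = self.pca.inverse_transform(self.pca.transform(X))
        return np.linalg.norm(X - X_proj, axis=1)

    def is_adversarial(self, X):
        """Flag samples whose distance from the fitted subspace exceeds the threshold."""
        return self._distance(X) > self.threshold
```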
  3. Recent advancements in Deep Neural Networks (DNNs) have enabled widespread deployment in multiple security-sensitive domains. The need for resource-intensive training and the use of valuable domain-specific training data have made these models the top intellectual property (IP) for model owners. One of the major threats to DNN privacy is model extraction attacks, where adversaries attempt to steal sensitive information in DNN models. In this work, we propose an advanced model extraction framework, DeepSteal, that for the first time steals DNN weights remotely with the aid of a memory side-channel attack. Our proposed DeepSteal comprises two key stages. First, we develop a new weight bit information extraction method, called HammerLeak, by adopting the rowhammer-based fault technique as the information leakage vector. HammerLeak leverages several novel system-level techniques tailored for DNN applications to enable fast and efficient weight stealing. Second, we propose a novel substitute model training algorithm with a Mean Clustering weight penalty, which effectively leverages the partially leaked bit information and generates a substitute prototype of the target victim model. We evaluate the proposed model extraction framework on three popular image datasets (CIFAR-10/100 and GTSRB) and four DNN architectures (ResNet-18/34, Wide-ResNet, and VGG-11). The extracted substitute model has successfully achieved more than 90% test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, our extracted substitute model can also generate effective adversarial input samples to fool the victim model. Notably, it achieves similar performance (i.e., ~1-2% test accuracy under attack) to white-box adversarial input attacks (e.g., PGD/TRADES).
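The sketch below gives one plausible reading of the substitute-training stage: the substitute model is trained to match the victim's soft outputs while an L2 "mean clustering" style penalty pulls each substitute weight toward the midpoint of the value interval implied by its leaked bits. The leaked-interval bookkeeping, loss weighting, and dictionary layout are assumptions standing in for HammerLeak's actual output format, not the paper's implementation.

```python
# Hedged sketch of substitute-model training with a mean-clustering-style weight penalty.
# `leaked_lo`/`leaked_hi` map parameter names to tensors bounding each victim weight's
# value, as implied by its leaked most-significant bits (simplified assumption).
import torch
import torch.nn.functional as F

def mean_clustering_penalty(substitute, leaked_lo, leaked_hi):
    """L2 pull of each substitute weight toward the midpoint of its leaked interval."""
    penalty = 0.0
    for name, w in substitute.named_parameters():
        if name in leaked_lo:                      # only layers with leaked bit information
            mid = 0.5 * (leaked_lo[name] + leaked_hi[name])
            penalty = penalty + ((w - mid) ** 2).mean()
    return penalty

def train_step(substitute, victim, x, optimizer, leaked_lo, leaked_hi, lam=1.0):
    """Match the victim's outputs on query data x while respecting leaked weight info."""
    with torch.no_grad():
        soft_labels = torch.softmax(victim(x), dim=1)
    optimizer.zero_grad()
    loss = F.kl_div(torch.log_softmax(substitute(x), dim=1), soft_labels,
                    reduction="batchmean")
    loss = loss + lam * mean_clustering_penalty(substitute, leaked_lo, leaked_hi)
    loss.backward()
    optimizer.step()
    return loss.item()
```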
  4. Deep learning models have been used to create various effective image classification applications. However, they are vulnerable to adversarial attacks that seek to misguide the models into predicting incorrect classes. Our study of major adversarial attack models shows that they all specifically target and exploit neural network structures in their designs. This understanding led us to the hypothesis that most classical machine learning models, such as random forests (RF), are immune to these adversarial attack models because they do not rely on neural network designs at all. Our experimental study of classical machine learning models against popular adversarial attacks supports this hypothesis. Based on it, we propose a new adversarial-aware deep learning system that uses a classical machine learning model as a secondary verification system to complement the primary deep learning model in image classification. Although the secondary classical model is less accurate, it is used only for verification, so it does not affect the output accuracy of the primary deep learning model while still effectively detecting an adversarial attack when a clear mismatch occurs. Our experiments on the CIFAR-100 dataset show that the proposed approach outperforms current state-of-the-art adversarial defense systems.
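A minimal sketch of the two-model idea follows: the CNN produces the prediction, a random forest trained on raw pixels acts purely as a verifier, and a disagreement between the two is flagged as a possible adversarial input. The model choices and feature representation here are illustrative, not the paper's exact setup.

```python
# Minimal sketch of an adversarial-aware CNN + random-forest verification pipeline.
import numpy as np
import torch
from sklearn.ensemble import RandomForestClassifier

def build_verifier(X_train, y_train):
    """Classical secondary model; accuracy matters less than independence from the CNN."""
    rf = RandomForestClassifier(n_estimators=200, n_jobs=-1)
    rf.fit(X_train.reshape(len(X_train), -1), y_train)   # flatten images to feature vectors
    return rf

def classify_with_verification(cnn, rf, x):
    """Return (predicted_class, suspicious_flag) for a single image tensor x."""
    with torch.no_grad():
        cnn_pred = cnn(x.unsqueeze(0)).argmax(dim=1).item()
    rf_pred = rf.predict(x.cpu().numpy().reshape(1, -1))[0]
    suspicious = (cnn_pred != rf_pred)     # mismatch => possible adversarial attack
    return cnn_pred, suspicious            # the CNN's prediction itself is unchanged
```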
  5. The adversarial risk of a machine learning model has been widely studied. Most previous studies assume that the data lie in the whole ambient space. We take a new angle and bring the manifold assumption into consideration. Assuming the data lie on a manifold, we investigate two new types of adversarial risk: the normal adversarial risk, due to perturbations along the normal direction, and the in-manifold adversarial risk, due to perturbations within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show a surprisingly pessimistic case in which the standard adversarial risk can be non-zero even when both the normal and in-manifold adversarial risks are zero. We conclude with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier without sacrificing model accuracy by focusing only on the normal adversarial risk.
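One plausible formalization of the three risks discussed above is sketched below; the notation (tangent and normal spaces of the data manifold M, perturbation budget epsilon, 0-1 loss) is ours and may differ from the paper's exact definitions.

```latex
% One plausible formalization of the quantities above (notation is ours, not the paper's).
% f is the classifier, y(x) the true label, M the data manifold, T_x M the tangent space
% and N_x M the normal space at x; \epsilon bounds the perturbation size.
\begin{align*}
\text{standard adversarial risk: } \;
  R_{\mathrm{adv}}(f) &= \mathbb{E}_{x}\Big[\sup_{\|\delta\|\le\epsilon}
      \mathbf{1}\{f(x+\delta) \ne y(x)\}\Big],\\
\text{in-manifold adversarial risk: } \;
  R_{\mathrm{in}}(f) &= \mathbb{E}_{x}\Big[\sup_{\|\delta\|\le\epsilon,\; \delta \in T_x M}
      \mathbf{1}\{f(x+\delta) \ne y(x)\}\Big],\\
\text{normal adversarial risk: } \;
  R_{\mathrm{nor}}(f) &= \mathbb{E}_{x}\Big[\sup_{\|\delta\|\le\epsilon,\; \delta \in N_x M}
      \mathbf{1}\{f(x+\delta) \ne y(x)\}\Big].
\end{align*}
```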