Title: Adversarially Robust Learning Could Leverage Computational Hardness
Over recent years, devising classification algorithms that are robust to adversarial perturbations has emerged as a challenging problem. In particular, deep neural nets (DNNs) seem to be susceptible to small imperceptible changes over test instances. However, the line of work in provable robustness, so far, has been focused on information-theoretic robustness, ruling out even the existence of any adversarial examples. In this work, we study whether there is hope to benefit from the algorithmic nature of an attacker that searches for adversarial examples, and ask whether there is any learning task for which it is possible to design classifiers that are only robust against polynomial-time adversaries. Indeed, numerous cryptographic tasks (e.g., encryption of long messages) can only be secure against computationally bounded adversaries, and are indeed impossible for computationally unbounded attackers. Thus, it is natural to ask if the same strategy could help robust learning. We show that computational limitations of attackers can indeed be useful in robust learning by demonstrating the possibility of a classifier for some learning task for which computational and information-theoretic adversaries of bounded perturbations have very different power. Namely, while computationally unbounded adversaries can attack successfully and find adversarial examples with small perturbation, polynomial-time adversaries are unable to do so unless they can break standard cryptographic hardness assumptions. Our results, therefore, indicate that an approach similar to cryptography (relying on computational hardness) holds promise for achieving computationally robust machine learning. In the reverse direction, we also show that the existence of such a learning task, in which computational robustness beats information-theoretic robustness, requires computational hardness, by implying (average-case) hardness of NP.
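The abstract does not spell out the underlying construction, so the sketch below is only a toy illustration (plain Python, standard-library hmac) of the general principle that cryptographic hardness can separate computational from information-theoretic adversaries: a keyed tag binds the label to the instance, so a polynomial-time attacker cannot make a perturbed instance verify without forging the MAC, whereas a computationally unbounded attacker can simply brute-force the key. The abstaining behavior and all names here are illustrative assumptions, not the paper's scheme.

```python
# Toy illustration only (not the paper's construction): a "classifier" whose
# label is cryptographically bound to the instance by a MAC.  A polynomial-time
# adversary who perturbs the instance cannot make the check pass without
# forging the tag, while an unbounded adversary could brute-force the key.
import hmac
import hashlib
import os

KEY = os.urandom(32)  # secret key sampled at "training" time

def tag(features: bytes, label: int) -> bytes:
    return hmac.new(KEY, features + bytes([label]), hashlib.sha256).digest()

def classify(features: bytes, label: int, t: bytes) -> int:
    # Accept the claimed label only if the tag verifies; otherwise abstain (-1).
    return label if hmac.compare_digest(tag(features, label), t) else -1

x, y = b"example-instance", 1
t = tag(x, y)
assert classify(x, y, t) == 1          # honest instance classified correctly
assert classify(x + b"!", y, t) == -1  # any perturbation invalidates the tag
```

In this toy model the classifier never outputs a wrong label on a perturbed input; it abstains instead, which is enough to show why a successful small-perturbation attack would have to break the underlying cryptographic primitive.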
Award ID(s):
1804648
PAR ID:
10174887
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Algorithmic Learning Theory
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small Linf-noise). For other perturbations, these defenses offer no guarantees and, at times, even increase the model’s vulnerability. Our aim is to understand the reasons underlying this robustness trade-off, and to train models that are simultaneously robust to multiple perturbation types. We prove that a trade-off in robustness to different types of Lp-bounded and spatial perturbations must exist in a natural and simple statistical setting. We corroborate our formal analysis by demonstrating similar robustness trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation adversarial training schemes, and a novel efficient attack for finding L1-bounded adversarial examples, we show that no model trained against multiple attacks achieves robustness competitive with that of models trained on each attack individually. In particular, we uncover a pernicious gradient-masking phenomenon on MNIST, which causes adversarial training with first-order Linf, L1 and L2 adversaries to achieve merely 50% accuracy. Our results question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types. 
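As an illustrative sketch only (the authors' exact training schemes are not reproduced here), a "max over attacks" multi-perturbation adversarial training step in PyTorch could look like the following; attack_linf, attack_l1 and attack_l2 are assumed, user-supplied attack functions mapping (model, x, y) to a perturbed batch.

```python
# Illustrative sketch: adversarial training against several perturbation types
# by training, per batch, on the worst loss over a set of attacks.
import torch
import torch.nn.functional as F

def multi_perturbation_step(model, optimizer, x, y, attacks):
    adv_losses = []
    for attack in attacks:               # e.g. [attack_linf, attack_l1, attack_l2]
        x_adv = attack(model, x, y)      # assumed signature: (model, x, y) -> x_adv
        adv_losses.append(F.cross_entropy(model(x_adv), y))
    loss = torch.stack(adv_losses).max()   # "max" aggregation over attack types
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Replacing .max() with .mean() gives the "average over attacks" variant; the gradient-masking failure mentioned in the abstract is exactly the kind of artifact such training loops need to be checked for (e.g., by also running gradient-free attacks).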
  2. Why are classifiers in high dimension vulnerable to "adversarial" perturbations? We show that it is likely not due to information-theoretic limitations, but rather could be due to computational constraints. First we prove that, for a broad set of classification tasks, the mere existence of a robust classifier implies that it can be found by a possibly exponential-time algorithm with relatively few training examples. Then we give two particular classification tasks where learning a robust classifier is computationally intractable. More precisely, we construct two binary classification tasks in high-dimensional space which are (i) information-theoretically easy to learn robustly for large perturbations, (ii) efficiently learnable (non-robustly) by a simple linear separator, (iii) yet not efficiently robustly learnable, even for small perturbations. Specifically, for the first task hardness holds for any efficient algorithm in the statistical query (SQ) model, while for the second task we rule out any efficient algorithm under a cryptographic assumption. These examples give an exponential separation between classical learning and robust learning in the statistical query model or under a cryptographic assumption. This suggests that adversarial examples may be an unavoidable byproduct of computational limitations of learning algorithms.
  3. Machine learning models are vulnerable to both security attacks (e.g., adversarial examples) and privacy attacks (e.g., private attribute inference). We take a first step toward mitigating both the security and privacy attacks while maintaining task utility. In particular, we propose an information-theoretic framework that achieves these goals through the lens of representation learning, i.e., by learning representations that are robust to both adversarial examples and attribute-inference adversaries. We also derive novel theoretical results under our framework, e.g., the inherent trade-off between adversarial robustness/utility and attribute privacy, and guaranteed attribute-privacy leakage against attribute-inference adversaries.
  4. While deep learning models have achieved unprecedented success in various domains, there is also growing concern about adversarial attacks against related applications. Recent results show that by adding a small amount of perturbation to an image (imperceptible to humans), the resulting adversarial example can force a classifier to make targeted mistakes. So far, most existing works focus on crafting adversarial examples in the digital domain, while limited effort has been devoted to understanding physical-domain attacks. In this work, we explore the feasibility of generating robust adversarial examples that remain effective in the physical domain. Our core idea is to use an image-to-image translation network to simulate the digital-to-physical transformation process for generating robust adversarial examples. To validate our method, we conduct a large-scale physical-domain experiment, which involves manually taking more than 3000 physical-domain photos. The results show that our method outperforms existing ones by a large margin and demonstrates a high level of robustness and transferability.
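The paper's pipeline is not reproduced here; the sketch below only illustrates the general idea of backpropagating through an image-to-image translation network that simulates the digital-to-physical transformation. simulator and classifier are assumed pretrained PyTorch modules, target is the attacker's desired label tensor, and all hyperparameters are placeholders.

```python
# Illustrative sketch: craft a perturbation that remains adversarial after a
# simulated digital-to-physical transformation (e.g., print-and-photograph).
import torch
import torch.nn.functional as F

def physical_attack(classifier, simulator, x, target, eps=8 / 255, steps=100, lr=1e-2):
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        x_phys = simulator(x_adv)                            # simulate physical distortions
        loss = F.cross_entropy(classifier(x_phys), target)   # push toward the target label
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                          # keep the digital change small
    return (x + delta).detach().clamp(0, 1)
```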
  5. Several recent works have developed methods for training classifiers that are certifiably robust against norm-bounded adversarial perturbations. However, these methods assume that all adversarial transformations are equally valuable to adversaries, which is seldom the case in real-world applications. We advocate for cost-sensitive robustness as the criterion for measuring a classifier's performance on specific tasks. We encode the potential harm of different adversarial transformations in a cost matrix, and propose a general objective function that adapts the robust training method of Wong & Kolter (2018) to optimize for cost-sensitive robustness. Our experiments on simple MNIST and CIFAR10 models and a variety of cost matrices show that the proposed approach can produce models with substantially reduced cost-sensitive robust error, while maintaining classification accuracy.
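The actual objective adapts Wong & Kolter's certified training, which is not reproduced here; the sketch below only shows the simpler idea of folding a cost matrix into a (non-certified) robust surrogate loss, where C[y, j] is the assumed cost of an adversarial example that moves a class-y input into class j.

```python
# Illustrative sketch: weight a robust surrogate loss on adversarial inputs by a
# user-supplied cost matrix C of shape (K, K), with C[i, j] the cost of having
# a class-i input adversarially classified as class j (diagonal typically zero).
import torch
import torch.nn.functional as F

def cost_sensitive_robust_loss(logits_adv, y, C):
    # logits_adv: (batch, K) logits on adversarial examples; y: (batch,) labels.
    margins = logits_adv - logits_adv.gather(1, y.unsqueeze(1))  # logit_j - logit_y
    weights = C[y]                                               # per-example cost row
    return (weights * F.relu(1.0 + margins)).sum(dim=1).mean()   # hinge on costly classes
```

With C equal to one off the diagonal and zero on it, this reduces to an ordinary multi-class hinge loss on adversarial inputs; asymmetric cost matrices concentrate the penalty on the seed-target pairs deemed harmful.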