Title: Is neuron coverage a meaningful measure for testing deep neural networks?
Recent effort to test deep learning systems has produced an intuitive and compelling test criterion called neuron coverage (NC), which resembles the notion of traditional code coverage. NC measures the proportion of neurons activated in a neural network, and it is implicitly assumed that increasing NC improves the quality of a test suite. In an attempt to automatically generate a test suite that increases NC, we design a novel diversity-promoting regularizer that can be plugged into existing adversarial attack algorithms. We then assess whether such attempts to increase NC could generate a test suite that (1) detects adversarial attacks successfully, (2) produces natural inputs, and (3) is unbiased to particular class predictions. Contrary to expectation, our extensive evaluation finds that increasing NC actually makes it harder to generate an effective test suite: higher neuron coverage leads to fewer defects detected, less natural inputs, and more biased prediction preferences. Our results cast doubt on whether increasing neuron coverage is a meaningful objective for generating tests for deep neural networks, and call for a new test generation technique that considers defect detection, naturalness, and output impartiality in tandem.
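As a rough illustration of the metric discussed in this abstract, the sketch below computes neuron coverage as the fraction of neurons whose scaled activation exceeds a threshold on at least one test input. The threshold, scaling, and array shapes are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons whose scaled activation exceeds `threshold`
    on at least one input in the test suite.

    activations: list of per-layer arrays with shape (num_inputs, num_neurons).
    """
    covered = 0
    total = 0
    for layer in activations:
        # Scale each neuron's activations to [0, 1] across the test suite
        lo, hi = layer.min(axis=0), layer.max(axis=0)
        scaled = (layer - lo) / (hi - lo + 1e-8)
        # A neuron counts as covered if any input pushes it above the threshold
        covered += np.sum(scaled.max(axis=0) > threshold)
        total += layer.shape[1]
    return covered / total

# Example: two layers, three test inputs (random stand-in activations)
acts = [np.random.rand(3, 8), np.random.rand(3, 4)]
print(f"NC = {neuron_coverage(acts):.2f}")
```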
Award ID(s):
1764077
PAR ID:
10227334
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2020)
Page Range / eLocation ID:
851 to 862
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Deep models are known to be vulnerable to adversarial attacks on data, and many adversarial training techniques have been developed to improve their adversarial robustness. While data adversaries attack model predictions by modifying data, little is known about their impact on the neuron activations produced by the model, which play a crucial role in determining the model's predictions and interpretability. In this work, we aim to develop a topological understanding of adversarial training to enhance its interpretability. We analyze the topological structure—in particular, mapper graphs—of the neuron activations of data samples produced by deep adversarial training. Each node of a mapper graph represents a cluster of activations, and two nodes are connected by an edge if their corresponding clusters have a nonempty intersection. We provide an interactive visualization tool that demonstrates the utility of our topological framework in exploring the activation space. We find that stronger attacks make the data samples more indistinguishable in the neuron activation space, which leads to lower accuracy. Our tool also provides a natural way to identify the vulnerable data samples that may be useful in improving model robustness.
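The following sketch shows, in simplified form, how a mapper graph over neuron activations can be built: activations are projected through a lens function, the lens range is covered with overlapping intervals, samples within each interval are clustered, and clusters sharing samples are connected by edges. The lens, clustering method, and parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def mapper_graph(activations, n_intervals=5, overlap=0.3):
    """Build a simplified mapper graph over neuron activations.

    activations: (num_samples, num_neurons) array.
    Returns (nodes, edges): each node is a set of sample indices, and an
    edge connects two clusters whose sample sets intersect.
    """
    # Lens: project each sample to one dimension (here, its L2 norm)
    lens = np.linalg.norm(activations, axis=1)
    lo, hi = lens.min(), lens.max()
    step = (hi - lo) / n_intervals

    nodes = []
    for i in range(n_intervals):
        # Overlapping interval in the lens range
        a = lo + i * step - overlap * step
        b = lo + (i + 1) * step + overlap * step
        idx = np.where((lens >= a) & (lens <= b))[0]
        if len(idx) == 0:
            continue
        # Cluster the samples whose lens value falls into this interval
        labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(activations[idx])
        for lbl in set(labels) - {-1}:
            nodes.append(set(idx[labels == lbl]))

    # Connect clusters with a nonempty intersection of samples
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges

# Example with random 2-D stand-in activations
nodes, edges = mapper_graph(np.random.rand(300, 2))
print(len(nodes), "clusters,", len(edges), "edges")
```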
  2. Fuzz testing has been gaining ground recently, with substantial efforts devoted to the area. Typically, fuzzers take a set of seed inputs and leverage random mutations to continually improve the inputs with respect to a cost, e.g., program code coverage, to discover vulnerabilities or bugs. Following this methodology, fuzzers are very good at generating unstructured inputs that achieve high coverage. However, fuzzers are less effective when the inputs are structured, say when they conform to an input grammar. Due to the nature of random mutations, the overwhelming abundance of inputs generated by this common fuzzing practice often hinders the effectiveness and efficiency of fuzzers on grammar-aware applications. The problem of testing becomes even harder when the goal is not only to achieve increased code coverage but also to find complex vulnerabilities related to other cost measures, say high resource consumption in an application. We propose Saffron, an adaptive grammar-based fuzzing approach to effectively and efficiently generate inputs that expose expensive executions in programs. Saffron takes as input a user-provided grammar, which describes the input space of the program under analysis, and uses it to generate test inputs. Saffron assumes that the grammar description is approximate, since precisely describing the input space of a program is often difficult, as a program may accept unintended inputs due to, e.g., errors in parsing. Yet these inputs may reveal worst-case complexity vulnerabilities. The novelty of Saffron is then twofold: (1) Given the user-provided grammar, Saffron attempts to discover whether the program accepts unexpected inputs outside of the provided grammar, and if so, it repairs the grammar via grammar mutations. The repaired grammar serves as a specification of the actual inputs accepted by the application. (2) Based on the refined grammar, it generates concrete test inputs. It starts by treating every production rule in the grammar with equal probability of being used for generating concrete inputs. It then adaptively refines the probabilities along the way by increasing the probabilities for rules that have been used to generate inputs that improve a cost, e.g., code coverage or an arbitrary user-defined cost. Evaluation results show that Saffron significantly outperforms state-of-the-art baselines.
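A minimal sketch of the adaptive, probability-weighted grammar-based generation idea described in this abstract: production rules start with equal weights, and weights are boosted for rules that produced inputs improving the observed cost. The toy grammar, weight-update rule, and cost function are hypothetical placeholders, not Saffron's actual implementation.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to alternative expansions.
GRAMMAR = {
    "<expr>": [["<expr>", "+", "<expr>"], ["<expr>", "*", "<expr>"], ["<num>"]],
    "<num>":  [["1"], ["22"], ["333"]],
}

# One weight per alternative; all rules start with equal probability.
weights = {nt: [1.0] * len(alts) for nt, alts in GRAMMAR.items()}

def generate(symbol, depth=0, max_depth=8):
    """Expand `symbol` by sampling alternatives proportionally to their weights."""
    if symbol not in GRAMMAR:            # terminal symbol
        return symbol, []
    alts = GRAMMAR[symbol]
    if depth >= max_depth:
        idx = len(alts) - 1              # fall back to the shortest rule to terminate
    else:
        idx = random.choices(range(len(alts)), weights=weights[symbol])[0]
    used = [(symbol, idx)]
    out = ""
    for sym in alts[idx]:
        text, sub = generate(sym, depth + 1, max_depth)
        out += text
        used += sub
    return out, used

def fuzz(cost_fn, rounds=100, boost=1.5):
    """Boost the weights of rules whose inputs improved the observed cost."""
    best = float("-inf")
    for _ in range(rounds):
        inp, used = generate("<expr>")
        cost = cost_fn(inp)
        if cost > best:
            best = cost
            for nt, idx in used:
                weights[nt][idx] *= boost
    return best

# Example cost: input length as a stand-in for resource consumption
print(fuzz(lambda s: len(s)))
```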
  3. Testing deep neural networks (DNNs) has garnered great interest in recent years due to their use in many applications. Black-box test adequacy measures are useful for guiding the testing process in covering the input domain. However, the absence of input specifications makes it challenging to apply black-box test adequacy measures in DNN testing. The Input Distribution Coverage (IDC) framework addresses this challenge by using a variational autoencoder to learn a low-dimensional latent representation of the input distribution, and then using that latent space as a coverage domain for testing. IDC applies combinatorial interaction testing on a partitioning of the latent space to measure test adequacy. Empirical evaluation demonstrates that IDC is cost-effective, capable of detecting feature diversity in test inputs, and more sensitive than prior work to test inputs generated using different DNN test generation methods. The findings demonstrate that IDC overcomes several limitations of white-box DNN coverage approaches by discounting coverage from unrealistic inputs and enabling the calculation of test adequacy metrics that capture the feature diversity present in the input space of DNNs.
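A rough sketch of latent-space combinatorial coverage in the spirit of IDC: latent encodings of test inputs (assumed to come from a pre-trained VAE encoder, which is not shown) are discretized into bins per dimension, and the fraction of covered t-way bin combinations is reported. The bin count, strength t, and binning scheme are illustrative assumptions rather than the IDC tool's actual configuration.

```python
import numpy as np
from itertools import combinations

def latent_combinatorial_coverage(latents, n_bins=4, t=2):
    """Approximate t-way combinatorial coverage over a discretized latent space.

    latents: (num_tests, latent_dim) encodings of test inputs.
    Returns the fraction of t-way bin combinations hit by the test suite.
    """
    num_tests, dim = latents.shape
    # Discretize each latent dimension into equal-width bins
    bins = np.zeros_like(latents, dtype=int)
    for d in range(dim):
        edges = np.linspace(latents[:, d].min(), latents[:, d].max(), n_bins + 1)
        bins[:, d] = np.clip(np.digitize(latents[:, d], edges[1:-1]), 0, n_bins - 1)

    covered, total = 0, 0
    for dims in combinations(range(dim), t):
        # Distinct bin combinations observed on this subset of dimensions
        seen = {tuple(row) for row in bins[:, list(dims)]}
        covered += len(seen)
        total += n_bins ** t
    return covered / total

# Example with random stand-in encodings (200 tests, 6 latent dimensions)
print(f"coverage = {latent_combinatorial_coverage(np.random.randn(200, 6)):.2f}")
```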
  4. Neural networks (NNs) are increasingly employed in safety-critical systems. It is therefore necessary to ensure that these NNs are robust against malicious interference in the form of adversarial attacks, which cause an NN to misclassify inputs. Many proposed defenses against such attacks incorporate randomness in order to make it harder for an attacker to find small input modifications that result in misclassification. Stochastic computing (SC) is a type of approximate computing based on pseudo-random bit-streams that has been successfully used to implement convolutional neural networks (CNNs). Previous results have suggested that such stochastic CNNs (SCNNs) are partially robust against adversarial attacks. In this work, we demonstrate that SCNNs do indeed possess inherent protection against some powerful adversarial attacks. Our results show that the white-box C&W attack is up to 16x less successful against SCNNs than against an equivalent binary NN, and the Boundary Attack even fails to generate adversarial inputs in many cases.
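For context, the sketch below illustrates the basic stochastic-computing idea this line of work builds on: a value in [0, 1] is encoded as a pseudo-random bit-stream whose fraction of 1s approximates the value, and multiplication then reduces to a bitwise AND of two independent streams. This is a generic unipolar SC example, not the paper's SCNN implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def to_bitstream(p, length=1024):
    """Encode a value p in [0, 1] as a pseudo-random bit-stream
    whose fraction of 1s approximates p."""
    return (rng.random(length) < p).astype(np.uint8)

def sc_multiply(x, y, length=1024):
    """Stochastic multiplication: AND two independent bit-streams,
    then decode the result as the fraction of 1s."""
    bx, by = to_bitstream(x, length), to_bitstream(y, length)
    return (bx & by).mean()

print(sc_multiply(0.6, 0.5))  # approximately 0.30, with stochastic error
```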