skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Title: Code-Bridged Classifier (CBC): A Low or Negative Overhead Defense for Making a CNN Classifier Robust Against Adversarial Attacks
In this paper, we propose Code-Bridged Classifier (CBC), a framework for making a Convolutional Neural Network (CNNs) robust against adversarial attacks without increasing or even by decreasing the overall models' computational complexity. More specifically, we propose a stacked encoder-convolutional model, in which the input image is first encoded by the encoder module of a denoising auto-encoder, and then the resulting latent representation (without being decoded) is fed to a reduced complexity CNN for image classification. We illustrate that this network not only is more robust to adversarial examples but also has a significantly lower computational complexity when compared to the prior art defenses.  more » « less
Award ID(s):
1718538 2146726
NSF-PAR ID:
10298705
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
2020 21st International Symposium on Quality Electronic Design (ISQED)
Page Range / eLocation ID:
27 to 32
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Deep convolutional neural networks have shown high efficiency in computer visions and other applications. However, with the increase in the depth of the networks, the computational complexity is growing exponentially. In this paper, we propose a novel solution to reduce the computational complexity of convolutional neural network models used for many class image classification. Our proposed technique breaks the classification task into two steps: 1) coarse-grain classification, in which the input samples are classified among a set of hyper-classes, 2) fine-grain classification, in which the final labels are predicted among those hyper-classes detected at the first step. We illustrate that our proposed classifier can reach the level of accuracy reported by the best in class classification models with less computational complexity (Flop Count) by only activating parts of the model that are needed for the image classification. 
    more » « less
  2. We investigate the role of representations and architectures for classifying 3D shapes in terms of their computational efficiency, generalization, and robustness to adversarial transformations. By varying the number of training examples and employing cross-modal transfer learning we study the role of initialization of existing deep architectures for 3D shape classification. Our analysis shows that multiview methods continue to offer the best generalization even without pretraining on large labeled image datasets, and even when trained on simplified inputs such as binary silhouettes. Furthermore, the performance of voxel-based 3D convolutional networks and point-based architectures can be improved via cross-modal transfer from image representations. Finally, we analyze the robustness of 3D shape classifiers to adversarial transformations and present a novel approach for generating adversarial perturbations of a 3D shape for multiview classifiers using a differentiable renderer. We find that point-based networks are more robust to point position perturbations while voxel-based and multiview networks are easily fooled with the addition of imperceptible noise to the input. 
    more » « less
  3. null (Ed.)
    Convolutional Neural Network (CNN) based image segmentation has made great progress in recent years. However, video object segmentation remains a challenging task due to its high computational complexity. Most of the previous methods employ a two-stream CNN framework to handle spatial and motion features separately. In this paper, we propose an end-to-end encoder-decoder style 3D CNN to aggregate spatial and temporal information simultaneously for video object segmentation. To efficiently process video, we propose 3D separable convolution for the pyramid pooling module and decoder, which dramatically reduces the number of operations while maintaining the performance. Moreover, we also extend our framework to video action segmentation by adding an extra classifier to predict the action label for actors in videos. Extensive experiments on several video datasets demonstrate the superior performance of the proposed approach for action and object segmentation compared to the state-of-the-art. 
    more » « less
  4. null (Ed.)
    Convolutional neural networks (CNNs) have become a key asset to most of fields in AI. Despite their successful performance, CNNs suffer from a major drawback. They fail to capture the hierarchy of spatial relation among different parts of an entity. As a remedy to this problem, the idea of capsules was proposed by Hinton. In this paper, we propose the SubSpace Capsule Network (SCN) that exploits the idea of capsule networks to model possible variations in the appearance or implicitly-defined properties of an entity through a group of capsule subspaces instead of simply grouping neurons to create capsules. A capsule is created by projecting an input feature vector from a lower layer onto the capsule subspace using a learnable transformation. This transformation finds the degree of alignment of the input with the properties modeled by the capsule subspace.We show that SCN is a general capsule network that can successfully be applied to both discriminative and generative models without incurring computational overhead compared to CNN during test time. Effectiveness of SCN is evaluated through a comprehensive set of experiments on supervised image classification, semi-supervised image classification and high-resolution image generation tasks using the generative adversarial network (GAN) framework. SCN significantly improves the performance of the baseline models in all 3 tasks. 
    more » « less
  5. Convolutional codes have been widely studied and used in many systems. As the number of memory elements increases, frame error rate (FER) improves but computational complexity increases exponentially. Recently, decoders that achieve reduced average complexity through list decoding have been demonstrated when the convolutional encoder polynomials share a common factor that can be understood as a CRC or more generally an expurgating linear function (ELF). However, classical convolutional codes avoid such common factors because they result in a catastrophic encoder. This paper provides a way to access the complexity reduction possible with list decoding even when the convolutional encoder polynomials do not share a common factor. Decomposing the original code into component encoders that fully exclude some polynomials can allow an ELF to be factored from each component. Dual list decoding of the component encoders can often find the ML codeword. Including a fallback to regular Viterbi decoding yields excellent FER performance while requiring less average complexity than always performing Viterbi on the original trellis. A best effort dual list decoder that avoids Viterbi has performance similar to the ML decoder. Component encoders that have a shared polynomial allow for even greater complexity reduction. 
    more » « less