

Title: Towards Understanding the Behaviors of Pretrained Compressed Convolutional Models
We investigate the behaviors that compressed convolutional models exhibit in two key areas of AI trust: (i) a model's ability to be explained and (ii) its robustness to adversarial attacks. While compression is known to shrink model size and decrease inference time, its other properties are not as well studied. We apply several compression methods on benchmark datasets, including ImageNet, to study how compression affects the convolutional aspects of an image model. We investigate explainability by studying how well compressed convolutional models extract visual features, visualized with t-SNE, and by visualizing the localization ability of our models with class activation maps. We show that even with significantly compressed models, vital explainability is preserved and even enhanced. Applying the Carlini & Wagner attack algorithm to our compressed models, we find that robustness is maintained and that some forms of compression make attacks more difficult or time-consuming.
Award ID(s):
2223507
NSF-PAR ID:
10395322
Author(s) / Creator(s):
Date Published:
Journal Name:
2022 26th International Conference on Pattern Recognition (ICPR)
Page Range / eLocation ID:
3450 to 3456
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
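
As a rough illustration of the class-activation-map analysis described in the abstract, the sketch below computes a CAM for a single image with a pretrained torchvision ResNet-18 standing in for a compressed model. The model choice, the random placeholder input, and the hook-based feature capture are illustrative assumptions, not the authors' implementation.

    # Minimal sketch, not the authors' code: class activation map (CAM, Zhou et al.)
    # for a pretrained model; a compressed/pruned model could be substituted here.
    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18

    model = resnet18(weights="IMAGENET1K_V1").eval()   # stand-in for a compressed model

    captured = {}
    model.layer4.register_forward_hook(                # grab the last conv feature map
        lambda mod, inp, out: captured.update(conv=out.detach()))

    x = torch.randn(1, 3, 224, 224)                    # placeholder; use a preprocessed image
    with torch.no_grad():
        logits = model(x)
    cls = logits.argmax(dim=1).item()

    # CAM = class-specific weighted sum of the final conv feature maps.
    fmap = captured["conv"][0]                         # (512, 7, 7)
    weights = model.fc.weight[cls]                     # (512,)
    cam = F.relu(torch.einsum("c,chw->hw", weights, fmap))
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    cam = F.interpolate(cam[None, None], size=x.shape[-2:], mode="bilinear")[0, 0]

The resulting map can be overlaid on the input to compare where an uncompressed and a compressed model localize the predicted class.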
More Like this
  1.
    The steganographic field is nowadays dominated by heuristic approaches to data hiding. While a few model-based steganographic algorithms exist that are designed to minimize the statistical detectability of the underlying model, many more algorithms based on the cost of changing a specific pixel or DCT coefficient have been introduced over the last decade. These costs are purely heuristic, as they are designed with feedback from detectors implemented as machine learning classifiers. For this reason, there is no apparent relation to statistical detectability, even though in practice they provide security comparable to model-based algorithms. Clearly, the security of such algorithms rests solely on the assumption that the detector used to assess their security is the best one possible, an assumption that is of course completely unrealistic. Similarly, steganalysis is mainly implemented with empirical machine learning detectors, which use hand-crafted features computed from images, or with deep learning detectors such as convolutional neural networks. The biggest drawback of this approach is that the steganalyst, despite having very good detection power, has little to no knowledge about which part of the image or the embedding algorithm contributes to the detection, because the detector is used as a black box. In this work, we try to leave the heuristics behind and move towards statistical models. First, we introduce statistical models for current heuristic algorithms, which helps us understand and predict their security trends; furthermore, this allows us to improve the security of such algorithms. Next, we focus on steganalysis exploiting universal properties of JPEG images. Under certain realistic conditions, this leads to a very powerful attack against any steganography, because embedding even a very small secret message breaks the statistical model. Lastly, we show how the security of JPEG compressed images can be improved through additional compression.
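    As a concrete illustration of the cost-based (heuristic) side discussed above, the sketch below shows the standard payload-limited-sender step that converts per-element embedding costs into change probabilities via a Lagrange multiplier found by bisection; the costs and payload here are placeholders, and this is not the paper's model-based construction.

        # Minimal sketch, assuming binary embedding with heuristic per-element costs rho:
        # optimal change probabilities p_i = exp(-lam*rho_i) / (1 + exp(-lam*rho_i)),
        # with lam chosen by bisection so the total entropy matches the payload in bits.
        import numpy as np

        def change_probabilities(rho, payload_bits, iters=60):
            def probs(lam):
                return 1.0 / (1.0 + np.exp(lam * rho))
            def entropy_bits(p):
                p = np.clip(p, 1e-12, 1 - 1e-12)
                return -(p * np.log2(p) + (1 - p) * np.log2(1 - p)).sum()
            lo, hi = 1e-6, 1e6                      # bracket for the Lagrange multiplier
            for _ in range(iters):                  # entropy is decreasing in lam
                mid = np.sqrt(lo * hi)
                if entropy_bits(probs(mid)) > payload_bits:
                    lo = mid
                else:
                    hi = mid
            return probs(hi)

        rho = 10.0 * np.random.rand(512 * 512)      # placeholder heuristic costs
        p = change_probabilities(rho, payload_bits=0.2 * rho.size)   # 0.2 bpp payload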
  2.
    Stay-at-home orders during COVID-19 help flatten the curve but, ironically, instigate mental health problems among people who have substance use disorders. Measuring electrical activity signals in the brain using off-the-shelf consumer wearable devices such as smart wristwatches, and mapping them in real time to underlying mood, behavioral, and emotional changes, plays a striking role in postulating mental health anomalies. In this work, we propose a wearable, On-device Mental Anomaly Detection (OMAD) system to detect anomalous behaviors and activities that lead to mental health problems and to help clinicians design effective intervention strategies. We propose an intrinsic artifact removal model for the electroencephalogram (EEG) signal to better correlate fine-grained behavioral changes. We design model compression techniques for the artifact removal and activity recognition (main) modules, implementing magnitude-based weight pruning on both a convolutional neural network and a multilayer perceptron so that the inference phase can run on an Nvidia Jetson Nano, one of the most resource-constrained devices for wearables. We experiment with three different combinations of feature extraction and artifact removal approaches. We evaluate the performance of OMAD in terms of accuracy, F1 score, memory usage, and running time for both unpruned and compressed models, using EEG data from control and treatment (alcoholic) groups on different object recognition tasks. Our artifact removal model and main activity detection model achieve approximately 93% and 90% accuracy, respectively, with significant reductions in model size (70%) and inference time (31%).
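    A minimal sketch of the kind of magnitude-based weight pruning mentioned above, using PyTorch's built-in pruning utilities; the tiny EEG-style network and the 70% sparsity amount are illustrative stand-ins, not the OMAD models.

        # Minimal sketch, not the OMAD code: L1 magnitude pruning of conv and linear layers.
        import torch.nn as nn
        import torch.nn.utils.prune as prune

        model = nn.Sequential(                      # placeholder for the EEG activity model
            nn.Conv1d(8, 16, kernel_size=5),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(16, 2),
        )

        for module in model.modules():
            if isinstance(module, (nn.Conv1d, nn.Linear)):
                prune.l1_unstructured(module, name="weight", amount=0.7)  # zero smallest 70%
                prune.remove(module, "weight")      # bake the mask into the weights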
  3. Abstract

    There has been significant work recently in developing machine learning (ML) models in high energy physics (HEP) for tasks such as classification, simulation, and anomaly detection. Often these models are adapted from those designed for datasets in computer vision or natural language processing, which lack inductive biases suited to HEP data, such as equivariance to its inherent symmetries. Such biases have been shown to make models more performant and interpretable, and reduce the amount of training data needed. To that end, we develop the Lorentz group autoencoder (LGAE), an autoencoder model equivariant with respect to the proper, orthochronous Lorentz group $$\textrm{SO}^+(3,1)$$, with a latent space living in the representations of the group. We present our architecture and several experimental results on jets at the LHC and find it outperforms graph and convolutional neural network baseline models on several compression, reconstruction, and anomaly detection metrics. We also demonstrate the advantage of such an equivariant model in analyzing the latent space of the autoencoder, which can improve the explainability of potential anomalies discovered by such ML models.

     
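    A small sketch of the invariance property the LGAE is built around: pairwise Minkowski inner products of particle four-momenta are unchanged under proper, orthochronous Lorentz transformations. The (E, px, py, pz) ordering and the random jet are assumptions for illustration; this is not the LGAE architecture itself.

        # Minimal sketch: Lorentz-invariant pairwise products of four-momenta.
        import numpy as np

        METRIC = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric, (E, px, py, pz) ordering

        def minkowski_products(p):
            """p: (n, 4) four-momenta -> (n, n) matrix of invariant inner products."""
            return p @ METRIC @ p.T

        jet = np.random.randn(30, 4)                # placeholder jet constituents

        # Sanity check: a boost along z leaves the products unchanged.
        beta = 0.6
        gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
        boost = np.array([[gamma, 0, 0, gamma * beta],
                          [0, 1, 0, 0],
                          [0, 0, 1, 0],
                          [gamma * beta, 0, 0, gamma]])
        assert np.allclose(minkowski_products(jet @ boost.T), minkowski_products(jet))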
  4. Network pruning is a widely used technique to reduce computation cost and model size for deep neural networks. However, the typical three-stage pipeline (i.e., training, pruning, and retraining (fine-tuning)) significantly increases the overall training time. In this article, we develop a systematic weight-pruning optimization approach based on surrogate Lagrangian relaxation (SLR), which is tailored to overcome difficulties caused by the discrete nature of the weight-pruning problem. We further prove that our method ensures fast convergence of the model compression problem, and the convergence of the SLR is accelerated by using quadratic penalties. Model parameters obtained by SLR during the training phase are much closer to their optimal values than those obtained by other state-of-the-art methods. We evaluate our method on image classification tasks using CIFAR-10 and ImageNet with state-of-the-art multilayer-perceptron-based networks such as MLP-Mixer, attention-based networks such as Swin Transformer, and convolutional neural network based models such as VGG-16, ResNet-18, ResNet-50, ResNet-110, and MobileNetV2. We also evaluate object detection and segmentation tasks on COCO, the KITTI benchmark, and the TuSimple lane detection dataset using a variety of models. Experimental results demonstrate that our SLR-based weight-pruning optimization approach achieves a higher compression rate than state-of-the-art methods under the same accuracy requirement, and can also achieve higher accuracy under the same compression rate requirement. On classification tasks, our SLR approach converges to the desired accuracy × faster on both datasets. On object detection and segmentation tasks, SLR also converges 2× faster to the desired accuracy. Further, our SLR achieves high model accuracy even at the hard-pruning stage without retraining, which reduces the traditional three-stage pruning into a two-stage process. Given a limited budget of retraining epochs, our approach quickly recovers the model's accuracy.

     
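    The hard-pruning step mentioned above amounts to projecting each weight tensor onto a sparsity constraint by keeping only its largest-magnitude entries. The sketch below shows that projection in isolation; it is not the paper's SLR solver, and the 10% keep ratio is illustrative.

        # Minimal sketch, not the SLR algorithm: magnitude projection onto a sparsity budget.
        import torch

        def project_topk(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
            """Keep the keep_ratio largest-magnitude entries of `weight`, zero the rest."""
            k = max(1, int(keep_ratio * weight.numel()))
            threshold = torch.topk(weight.abs().flatten(), k).values.min()
            mask = (weight.abs() >= threshold).to(weight.dtype)   # ties may keep a few extra
            return weight * mask

        w = torch.randn(256, 512)
        w_pruned = project_topk(w, keep_ratio=0.1)                # ~10x compression of this layer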
  5. SUMMARY

    It is well known that the axial dipole part of Earth’s magnetic field reverses polarity, so that the magnetic North Pole becomes the South Pole and vice versa. The timing of reversals is well documented for the past 160 Myr, but the conditions that lead to a reversal are still not well understood. It is not known if there are reliable ‘precursors’ of reversals (events that indicate that a reversal is upcoming) or what they might be. We investigate if machine learning (ML) techniques can reliably identify precursors of reversals based on time-series of the axial magnetic dipole field. The basic idea is to train a classifier using segments of time-series of the axial magnetic dipole. This training step requires modification of standard ML techniques to account for the fact that we are interested in rare events: a reversal is unusual, while a non-reversing field is the norm. Without our tweak, the ML classifiers lead to useless predictions. Perhaps even more importantly, the usable observational record is limited to 0–2 Ma and contains only five reversals, necessitating that we determine if the data are even sufficient to reliably train and validate an ML algorithm. To answer these questions we use several ML classifiers (linear/non-linear support vector machines and long short-term memory networks), invoke a hierarchy of numerical models (from simplified models to 3-D geodynamo simulations), and draw on two palaeomagnetic reconstructions (PADM2M and Sint-2000). The performance of the ML classifiers varies across the models and the observational record, and we provide evidence that this is not an artefact of the numerics, but rather reflects how ‘predictable’ a model or observational record is. Studying models of Earth’s magnetic field via ML classifiers thus can help with identifying shortcomings or advantages of the various models. For Earth’s magnetic field, we conclude that the ability of ML to identify precursors of reversals is limited, largely due to the small amount and low frequency resolution of data, which makes training and subsequent validation nearly impossible. Put simply: the ML techniques we tried are not currently capable of reliably identifying an axial dipole moment (ADM) precursor for geomagnetic reversals. This does not necessarily imply that such a precursor does not exist, and improvements in temporal resolution and length of ADM records may well offer better prospects in the future.

     
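    The abstract above does not spell out the rare-event modification, so the sketch below uses one common option, class reweighting, on windowed axial-dipole-moment values with a scikit-learn SVM; the window length, labels, and synthetic series are illustrative assumptions, not the paper's setup.

        # Minimal sketch, assuming a 1-D ADM series and binary labels marking windows
        # that precede a reversal; rare positives are handled by class reweighting.
        import numpy as np
        from sklearn.svm import SVC

        def windows(series, labels, width):
            X = np.stack([series[i:i + width] for i in range(len(series) - width)])
            y = labels[width:]                   # does a reversal follow this window?
            return X, y

        rng = np.random.default_rng(0)
        adm = rng.random(5000)                   # placeholder; use PADM2M or Sint-2000 values
        precursor = np.zeros(5000, dtype=int)
        precursor[990:1000] = 1                  # a handful of rare positive labels

        X, y = windows(adm, precursor, width=50)
        clf = SVC(kernel="rbf", class_weight="balanced")   # upweight the rare class
        clf.fit(X, y)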