skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.

This content will become publicly available on September 28, 2024

Title: Robust Transformer Neural Network for Computer Vision Applications
The remarkable success of the Transformer model in Natural Language Processing (NLP) is increasingly capturing the attention of vision researchers in contemporary times. The Vision Transformer (ViT) model effectively models long-range dependencies while utilizing a self-attention mechanism by converting image information into meaningful representations. Moreover, the parallelism property of ViT ensures better scalability and model generalization compared to Recurrent Neural Networks (RNN). However, developing robust ViT models for high-risk vision applications, such as self-driving cars, is critical. Deterministic ViT models are susceptible to noise and adversarial attacks and incapable of yielding a level of confidence in output predictions. Quantifying the confidence (or uncertainty) level in the decision is highly important in such real-world applications. In this work, we introduce a probabilistic framework for ViT to quantify the level of uncertainty in the model's decision. We approximate the posterior distribution of network parameters using variational inference. While progressing through non-linear layers, the first-order Taylor approximation was deployed. The developed framework propagates the mean and covariance of the posterior distribution through layers of the probabilistic ViT model and quantifies uncertainty at the output predictions. Quantifying uncertainty aids in providing warning signals to real-world applications in case of noisy situations. Experimental results from extensive simulation conducted on numerous benchmark datasets (e.g., MNIST and Fashion-MNIST) for image classification tasks exhibit 1) higher accuracy of proposed probabilistic ViT under noise or adversarial attacks compared to the deterministic ViT. 2) Self-evaluation through uncertainty becomes notably pronounced as noise levels escalate. Simulations were conducted at the Texas Advanced Computing Center (TACC) on the Lonestar6 supercomputer node. With the help of this vital resource, we completed all the experiments within a reasonable period.  more » « less
Award ID(s):
2401828 2153413
Author(s) / Creator(s):
Publisher / Repository:
Texas Advanced Computing Center (TACC)
Date Published:
Journal Name:
Texas Advanced Computing Center Symposium (TACCSTER)
Medium: X
Austin, Texas
Sponsoring Org:
National Science Foundation
More Like this
  1. Synthetic aperture radar (SAR) image classification is a challenging problem due to the complex imaging mechanism as well as the random speckle noise, which affects radar image interpretation. Recently, convolutional neural networks (CNNs) have been shown to outperform previous state-of-the-art techniques in computer vision tasks owing to their ability to learn relevant features from the data. However, CNNs in particular and neural networks, in general, lack uncertainty quantification and can be easily deceived by adversarial attacks. This paper proposes Bayes-SAR Net, a Bayesian CNN that can perform robust SAR image classification while quantifying the uncertainty or confidence of the network in its decision. Bayes-SAR Net propagates the first two moments (mean and covariance) of the approximate posterior distribution of the network parameters given the data and obtains a predictive mean and covariance of the classification output. Experiments, using the benchmark datasets Flevoland and Oberpfaffenhofen, show superior performance and robustness to Gaussian noise and adversarial attacks, as compared to the SAR-Net homologue. Bayes-SAR Net achieves a test accuracy that is around 10% higher in the case of adversarial perturbation (levels > 0.05). 
    more » « less
  2. Deep neural networks (DNNs) have surpassed human-level accuracy in various learning tasks. However, unlike humans who have a natural cognitive intuition for probabilities, DNNs cannot express their uncertainty in the output decisions. This limits the deployment of DNNs in mission critical domains, such as warfighter decision-making or medical diagnosis. Bayesian inference provides a principled approach to reason about model’s uncertainty by estimating the posterior distribution of the unknown parameters. The challenge in DNNs remains the multi-layer stages of non-linearities, which make the propagation of high-dimensional distributions mathematically intractable. This paper establishes the theoretical and algorithmic foundations of uncertainty or belief propagation by developing new deep learning models named PremiUm-CNNs (Propagating Uncertainty in Convolutional Neural Networks). We introduce a tensor normal distribution as a prior over convolutional kernels and estimate the variational posterior by maximizing the evidence lower bound (ELBO). We start by deriving the first-order mean-covariance propagation framework. Later, we develop a framework based on the unscented transformation (correct at least up to the second-order) that propagates sigma points of the variational distribution through layers of a CNN. The propagated covariance of the predictive distribution captures uncertainty in the output decision. Comprehensive experiments conducted on diverse benchmark datasets demonstrate: 1) superior robustness against noise and adversarial attacks, 2) self-assessment through predictive uncertainty that increases quickly with increasing levels of noise or attacks, and 3) an ability to detect a targeted attack from ambient noise. 
    more » « less
  3. Model confidence or uncertainty is critical in autonomous systems as they directly tie to the safety and trustworthiness of the system. The quantification of uncertainty in the output decisions of deep neural networks (DNNs) is a challenging problem. The Bayesian framework enables the estimation of the predictive uncertainty by introducing probability distributions over the (unknown) network weights; however, the propagation of these high-dimensional distributions through multiple layers and non-linear transformations is mathematically intractable. In this work, we propose an extended variational inference (eVI) framework for convolutional neural network (CNN) based on tensor Normal distributions (TNDs) defined over convolutional kernels. Our proposed eVI framework propagates the first two moments (mean and covariance) of these TNDs through all layers of the CNN. We employ first-order Taylor series linearization to approximate the mean and covariances passing through the non-linear activations. The uncertainty in the output decision is given by the propagated covariance of the predictive distribution. Furthermore, we show, through extensive simulations on the MNIST and CIFAR-10 datasets, that the CNN becomes more robust to Gaussian noise and adversarial attacks. 
    more » « less
  4. Abstract

    Many real-world mission-critical applications require continual online learning from noisy data and real-time decision making with a defined confidence level. Brain-inspired probabilistic models of neural network can explicitly handle the uncertainty in data and allow adaptive learning on the fly. However, their implementation in a compact, low-power hardware remains a challenge. In this work, we introduce a novel hardware fabric that can implement a new class of stochastic neural network called Neural Sampling Machine (NSM) by exploiting the stochasticity in the synaptic connections for approximate Bayesian inference. We experimentally demonstrate an in silico hybrid stochastic synapse by pairing a ferroelectric field-effect transistor (FeFET)-based analog weight cell with a two-terminal stochastic selector element. We show that the stochastic switching characteristic of the selector between the insulator and the metallic states resembles the multiplicative synaptic noise of the NSM. We perform network-level simulations to highlight the salient features offered by the stochastic NSM such as performing autonomous weight normalization for continual online learning and Bayesian inferencing. We show that the stochastic NSM can not only perform highly accurate image classification with 98.25% accuracy on standard MNIST dataset, but also estimate the uncertainty in prediction (measured in terms of the entropy of prediction) when the digits of the MNIST dataset are rotated. Building such a probabilistic hardware platform that can support neuroscience inspired models can enhance the learning and inference capability of the current artificial intelligence (AI).

    more » « less
  5. null (Ed.)
    While deep learning continues to permeate through all fields of signal processing and machine learning, a critical exploit in these frameworks exists and remains unsolved. These exploits, or adversarial examples, are a type of signal attack that can change the output class of a classifier by perturbing the stimulus signal by an imperceptible amount. The attack takes advantage of statistical irregularities within the training data, where the added perturbations can move the image across deep learning decision boundaries. What is even more alarming is the transferability of these attacks to different deep learning models and architectures. This means a successful attack on one model has adversarial effects on other, unrelated models. In a general sense, adversarial attack through perturbations is not a machine learning vulnerability. Human and biological vision can also be fooled by various methods, i.e. mixing high and low frequency images together, by altering semantically related signals, or by sufficiently distorting the input signal. However, the amount and magnitude of such a distortion required to alter biological perception is at a much larger scale. In this work, we explored this gap through the lens of biology and neuroscience in order to understand the robustness exhibited in human perception. Our experiments show that by leveraging sparsity and modeling the biological mechanisms at a cellular level, we are able to mitigate the effect of adversarial alterations to the signal that have no perceptible meaning. Furthermore, we present and illustrate the effects of top-down functional processes that contribute to the inherent immunity in human perception in the context of exploiting these properties to make a more robust machine vision system. 
    more » « less