The introduction of deep learning and CNNs to image recognition problems has led to state-of-the-art classification accuracy. However, CNNs exacerbate the issue of algorithm explainability due to deep learning’s black-box nature. Numerous explainable AI (XAI) algorithms have been developed to give developers insight into the operations of deep learning models. We aim to make XAI explanations more user-centric by introducing modifications to existing XAI algorithms based on cognitive theory. The goal of this research is to yield intuitive XAI explanations that more closely resemble explanations given by experts in the domain of bird watching. Building on an existing base XAI algorithm, we conducted two user studies with expert bird watchers and found that our novel averaged and contrasting XAI algorithms are significantly preferred over the base algorithm for bird identification.
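As a rough illustration of the two ideas, "averaged" and "contrasting" explanations can be sketched over per-exemplar saliency maps. This is a minimal sketch under stated assumptions: the function names, the toy 2x2 maps, and the subtract-and-clip contrast rule are illustrative inventions, not the paper's actual algorithms.

```python
import numpy as np

def averaged_explanation(saliency_maps):
    """Average per-exemplar saliency maps of the predicted class
    into a single, less noisy explanation (assumed scheme)."""
    return np.mean(np.stack(saliency_maps), axis=0)

def contrasting_explanation(saliency_target, saliency_foil):
    """Keep only the evidence that supports the target class more
    than a visually similar foil class (assumed scheme)."""
    return np.clip(saliency_target - saliency_foil, 0.0, None)

# Toy 2x2 "saliency maps" standing in for real model attributions.
maps = [np.array([[0.9, 0.1], [0.2, 0.0]]),
        np.array([[0.7, 0.3], [0.0, 0.2]])]
avg = averaged_explanation(maps)          # [[0.8, 0.2], [0.1, 0.1]]
contrast = contrasting_explanation(avg, np.array([[0.5, 0.2], [0.1, 0.1]]))
```

The intuition mirrors how a birder explains an identification: pooling evidence over several individuals of a species, and pointing out what distinguishes it from a look-alike species.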
XRand: Differentially Private Defense against Explanation-Guided Attacks
Recent developments in the field of explainable artificial intelligence (XAI) have helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in which an explanation is provided together with the model prediction in response to each query. However, XAI also opens the door for adversaries to gain insight into the black-box models behind MLaaS, making them more vulnerable to several attacks. For example, feature-based explanations (e.g., SHAP) can expose the top important features that a black-box model focuses on; such disclosure has been exploited to craft effective backdoor triggers against malware classifiers. To address this trade-off, we introduce a new concept of achieving local differential privacy (LDP) in the explanations, and from that we establish a defense, called XRand, against such attacks. We show that our mechanism restricts the information that the adversary can learn about the top important features while maintaining the faithfulness of the explanations.
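One classical way to give a categorical report (such as "which feature is most important") local differential privacy is randomized response. The sketch below is an assumption about the general flavor of such a mechanism, not XRand's actual construction:

```python
import math
import random

def ldp_top_feature(true_idx, num_features, epsilon, rng=random.Random(0)):
    """Randomized response over feature indices: report the true top
    feature with probability p, otherwise a uniform decoy feature.
    This satisfies epsilon-LDP for a single categorical report."""
    p = math.exp(epsilon) / (math.exp(epsilon) + num_features - 1)
    if rng.random() < p:
        return true_idx
    # Draw a decoy uniformly from the other num_features - 1 indices.
    decoy = rng.randrange(num_features - 1)
    return decoy if decoy < true_idx else decoy + 1

# Over many queries, feature 3 is still the most frequently reported,
# but any single report is plausibly deniable.
reported = [ldp_top_feature(3, 10, epsilon=1.0) for _ in range(1000)]
```

The privacy budget epsilon controls the trade-off the abstract describes: smaller epsilon hides the top features better but degrades explanation faithfulness.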
- Award ID(s):
- 1935928
- PAR ID:
- 10426202
- Date Published:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- Volume:
- 37
- Issue:
- 10
- ISSN:
- 2159-5399
- Page Range / eLocation ID:
- 11873 to 11881
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Despite AI’s significant growth, its “black-box” nature creates challenges in generating adequate trust; thus, it is seldom used as a standalone unit in high-risk applications. Explainable AI (XAI) has emerged to help with this problem. Designing XAI that is both fast and accurate remains challenging, especially in numerical applications. We propose a novel XAI model named Transparency Relying Upon Statistical Theory (TRUST). TRUST XAI models the statistical behavior of the underlying AI’s outputs. Factor analysis is used to transform the input features into a new set of latent variables. We use mutual information to rank these variables, keep only the ones most influential on the AI’s outputs, and call them “representatives” of the classes. We then use multi-modal Gaussian distributions to determine the likelihood of any new sample belonging to each class. The proposed technique is a surrogate model that does not depend on the type of the underlying AI and is suitable for any numerical application. Here, we use cybersecurity of the Industrial Internet of Things (IIoT) as an example application and analyze the performance of the model on three cybersecurity datasets: WUSTL-IIoT, NSL-KDD, and UNSW. We also show how TRUST is explained to the user. TRUST XAI provides explanations for new random samples with an average success rate of 98%. We also evaluate the advantages of our model over another popular XAI model, LIME, in performance, speed, and method of explainability.
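The final likelihood step can be sketched with per-class diagonal Gaussians fitted to the chosen features. This is a simplified sketch: the factor-analysis and mutual-information ranking stages are omitted, the toy data is invented, and the real TRUST model fits Gaussians over its latent "representatives" rather than raw features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class numerical data standing in for IIoT traffic features.
X0 = rng.normal(0.0, 1.0, size=(200, 5))   # e.g., benign traffic
X1 = rng.normal(2.0, 1.0, size=(200, 5))   # e.g., attack traffic

def fit_class_gaussians(X_by_class):
    """Per-class mean/variance of the representative features
    (here, all features are kept for brevity)."""
    return [(X.mean(axis=0), X.var(axis=0) + 1e-6) for X in X_by_class]

def log_likelihood(x, mu, var):
    # Diagonal-Gaussian log-density, summed over features.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

params = fit_class_gaussians([X0, X1])
x_new = np.full(5, 2.0)                     # should resemble class 1
scores = [log_likelihood(x_new, mu, var) for mu, var in params]
pred = int(np.argmax(scores))               # predicted class label
```

Because the surrogate only needs the underlying model's outputs to fit these distributions, it stays agnostic to the AI's internal architecture, which is the property the abstract emphasizes.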
-
Recent research has developed a number of eXplainable AI (XAI) techniques, such as gradient-based approaches, input-perturbation-based methods, and black-box explanation methods. While these XAI techniques can extract meaningful insights from deep learning models, how to properly evaluate them remains an open problem. The most widely used approach is to perturb or even remove what the XAI method considers the most important features in an input and observe the changes in the output prediction. This approach, although straightforward, suffers from the out-of-distribution (OOD) problem: the perturbed samples may no longer follow the original data distribution. A recent method, RemOve And Retrain (ROAR), solves the OOD issue by retraining the model with perturbed samples guided by explanations. However, using a model retrained based on XAI methods to evaluate these explainers may cause information leakage and thus lead to unfair comparisons. We propose Fine-tuned Fidelity (F-Fidelity), a robust evaluation framework for XAI, which utilizes (i) an explanation-agnostic fine-tuning strategy, mitigating the information-leakage issue, and (ii) a random masking operation that ensures the removal step does not generate an OOD input. We also design controlled experiments with state-of-the-art (SOTA) explainers and their degraded versions to verify the correctness of our framework. We conduct experiments on multiple data modalities, including images, time series, and natural language. The results demonstrate that F-Fidelity significantly improves upon prior evaluation metrics in recovering the ground-truth ranking of the explainers. Furthermore, we show both theoretically and empirically that, given a faithful explainer, the F-Fidelity metric can be used to compute the sparsity of influential input components, i.e., to extract the true explanation size.
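The two ingredients can be sketched as follows: a random-masking augmentation (used during the explanation-agnostic fine-tuning, so masked inputs stay in-distribution for the model) and a fidelity score that masks the top-attributed features and measures the prediction drop. The stub model, toy attributions, and exact masking-by-zeroing are illustrative assumptions, not F-Fidelity's full procedure.

```python
import numpy as np

def random_mask(x, frac, rng):
    """Mask a random fraction of features; fine-tuning on such inputs
    makes later masked evaluation inputs in-distribution."""
    x = x.copy()
    idx = rng.choice(x.size, int(frac * x.size), replace=False)
    x[idx] = 0.0
    return x

def fidelity(model, x, attribution, frac):
    """Remove the top-attributed fraction of features and measure the
    prediction change; larger drops indicate a more faithful explainer."""
    k = int(frac * x.size)
    top = np.argsort(attribution)[::-1][:k]
    x_masked = x.copy()
    x_masked[top] = 0.0
    return model(x) - model(x_masked)

model = lambda x: float(x[:3].sum())        # stub: only features 0-2 matter
x = np.ones(10)
good_attr = np.array([1.0] * 3 + [0.0] * 7) # faithful attribution
bad_attr = np.array([0.0] * 3 + [1.0] * 7)  # unfaithful attribution
rng = np.random.default_rng(0)
x_aug = random_mask(x, 0.3, rng)            # fine-tuning-style augmentation
drop_good = fidelity(model, x, good_attr, 0.3)  # large drop
drop_bad = fidelity(model, x, bad_attr, 0.3)    # no drop
```

On this stub, the faithful attribution produces the larger prediction drop, which is exactly the ranking behavior the evaluation framework is meant to recover.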
-
Unexplainable black-box models create scenarios where anomalies cause deleterious responses, thus creating unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI), which improves trust by evaluating local interpretability in black-box neural networks. Unfortunately, ground truth is unavailable for the model's decision, so evaluation is limited to qualitative assessment. Further, interpretability may lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of the user's trust by exploring a black-box model's latent feature space. We present an approach, ProtoShotXAI, that uses a prototypical few-shot network to explore the contrastive manifold between nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query sample and recording the response for a subset of exemplars from any class. Our approach is a locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks. We compare ProtoShotXAI to state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet to demonstrate, both quantitatively and qualitatively, that ProtoShotXAI offers more flexibility for model exploration. Finally, ProtoShotXAI also demonstrates novel explainability and detectability on adversarial samples.
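The perturb-and-record loop can be sketched in the style of prototypical networks: class scores come from distances to class prototypes, and the user probes how nudging one input feature changes the query's similarity to a class. The identity feature extractor, toy support set, and exact scoring rule here are simplifying assumptions, not the actual ProtoShotXAI architecture.

```python
import numpy as np

def prototype(features):
    """Class prototype: mean of the support set's feature vectors."""
    return np.mean(features, axis=0)

def class_score(query, proto):
    # Negative squared distance, as in prototypical networks.
    return -np.sum((query - proto) ** 2)

def feature_sensitivity(query, proto, i, delta=1.0):
    """Perturb one input feature and record the change in the query's
    similarity to a class prototype (positive = supports the class)."""
    q = query.copy()
    q[i] += delta
    return class_score(q, proto) - class_score(query, proto)

support = np.array([[1.0, 0.0], [1.2, 0.2]])  # toy support set, one class
proto = prototype(support)
query = np.array([0.0, 0.0])
s0 = feature_sensitivity(query, proto, 0)     # moves toward the prototype
s1 = feature_sensitivity(query, proto, 1)     # overshoots away from it
```

Recording such sensitivities across features and across class prototypes is one way to trace the "contrastive manifold" between classes that the abstract describes.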
-
Diffusion models have begun to overshadow GANs and other generative models in industrial applications due to their superior image-generation performance. The complex architecture of these models furnishes an extensive array of attack features. In light of this, we aim to design membership inference attacks (MIAs) catered to diffusion models. We first conduct an exhaustive analysis of existing MIAs on diffusion models, taking into account factors such as black-box/white-box access and the selection of attack features. We find that white-box attacks are highly applicable in real-world scenarios and are presently the most effective. Departing from earlier research, which employs model loss as the attack feature for white-box MIAs, we employ model gradients in our attack, leveraging the fact that these gradients provide a more profound understanding of model responses to different samples. We subject these models to rigorous testing across a range of parameters, including training steps, timestep sampling frequency, diffusion steps, and data variance. Across all experimental settings, our method consistently demonstrates near-flawless attack performance, with an attack success rate approaching 100% and an AUCROC near 1.0. We also evaluate our attack against common defense mechanisms and observe that it continues to perform well.
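The core idea of a gradient-based membership test can be sketched with a thresholding attack: training members tend to be well fitted, so their loss gradients are small. The linear stub model, squared-error loss, and fixed threshold below are illustrative assumptions standing in for a diffusion model's denoising loss and a calibrated threshold, not the paper's attack.

```python
import numpy as np

def grad_norm(w, x, y=1.0):
    """L2 norm of the loss gradient w.r.t. the weights for one sample;
    for squared error on a linear stub, grad = 2 * (w.x - y) * x."""
    pred = w @ x
    return float(np.linalg.norm(2.0 * (pred - y) * x))

def mia_predict(w, x, threshold):
    """Flag x as a training member when its gradient norm falls below
    the (assumed pre-calibrated) threshold."""
    return grad_norm(w, x) < threshold

w = np.array([1.0, 0.0])            # stub "trained" model
member = np.array([1.0, 0.0])       # fitted sample: near-zero gradient
non_member = np.array([0.0, 1.0])   # unseen sample: large gradient
is_member = mia_predict(w, member, threshold=0.5)
is_non_member = mia_predict(w, non_member, threshold=0.5)
```

For a real diffusion model, the same scheme would use the gradient of the denoising loss at selected timesteps as the attack feature, which is the signal the abstract argues is richer than the loss value alone.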