Title: XRand: Differentially Private Defense against Explanation-Guided Attacks
Abstract: Recent developments in the field of explainable artificial intelligence (XAI) have helped improve trust in Machine-Learning-as-a-Service (MLaaS) systems, in which an explanation is provided together with the model prediction in response to each query. However, XAI also opens the door for adversaries to gain insights into the black-box models in MLaaS, thereby making the models more vulnerable to several attacks. For example, feature-based explanations (e.g., SHAP) can expose the top important features that a black-box model focuses on; such disclosure has been exploited to craft effective backdoor triggers against malware classifiers. To address this trade-off, we introduce a new concept of achieving local differential privacy (LDP) in the explanations, and from that we establish a defense, called XRand, against such attacks. We show that our mechanism restricts the information that the adversary can learn about the top important features, while maintaining the faithfulness of the explanations.
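As a rough illustration of the underlying idea (randomizing which top features an explanation discloses), the following Python sketch applies generic randomized response to the top-k indices of a feature-importance vector. This is not the paper's XRand mechanism: the keep probability, the uniform replacement rule, and the per-slot privacy accounting are simplifying assumptions for illustration only.

```python
import numpy as np

def ldp_top_k(importances, k, epsilon, rng=None):
    """Report a differentially private set of top-k feature indices.

    Generic randomized response, NOT the paper's XRand mechanism:
    each reported slot keeps the true top feature with probability
    p = e^eps / (e^eps + 1) and is otherwise replaced by a uniformly
    random non-top feature. Composition across the k slots is glossed
    over here.
    """
    rng = rng or np.random.default_rng()
    d = len(importances)
    top = np.argsort(importances)[::-1][:k]        # true top-k indices
    rest = np.setdiff1d(np.arange(d), top)         # candidate replacements
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)  # keep probability
    return np.array([i if rng.random() < p else rng.choice(rest) for i in top])

# Example: privatize the top 3 of 10 |SHAP| values at epsilon = 1.
shap_vals = np.abs(np.random.default_rng(0).normal(size=10))
print(ldp_top_k(shap_vals, k=3, epsilon=1.0))
```

The adversary observing the reported set can no longer be sure any given index is truly among the top-k, which is the intuition behind limiting explanation-guided trigger crafting.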
Award ID(s):
1935928
NSF-PAR ID:
10426202
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
37
Issue:
10
ISSN:
2159-5399
Page Range / eLocation ID:
11873 to 11881
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Despite AI’s significant growth, its “black box” nature creates challenges in generating adequate trust. Thus, it is seldom utilized as a standalone unit in high-risk applications. Explainable AI (XAI) has emerged to help with this problem. Designing XAI that is both fast and accurate remains challenging, especially in numerical applications. We propose a novel XAI model named Transparency Relying Upon Statistical Theory (TRUST). TRUST XAI models the statistical behavior of the underlying AI’s outputs. Factor analysis is used to transform the input features into a new set of latent variables. We use mutual information to rank these latent variables and pick only the ones most influential on the AI’s outputs, calling them “representatives” of the classes. Then we use multi-modal Gaussian distributions to determine the likelihood of any new sample belonging to each class. The proposed technique is a surrogate model that does not depend on the type of the underlying AI and is suitable for any numerical application. Here, we use cybersecurity of the industrial internet of things (IIoT) as an example application. We analyze the performance of the model using three cybersecurity datasets: “WUSTL-IIoT”, “NSL-KDD”, and “UNSW”. We also show how TRUST is explained to the user. TRUST XAI provides explanations for new random samples with an average success rate of 98%. We also evaluate the advantages of our model over another popular XAI model, LIME, in terms of performance, speed, and method of explainability.
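A minimal sketch of this pipeline follows, using scikit-learn's FactorAnalysis, mutual_info_classif, and GaussianMixture as stand-ins. The numbers of factors, representatives, and mixture components are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.feature_selection import mutual_info_classif
from sklearn.mixture import GaussianMixture

def fit_trust_surrogate(X, y_bb, n_factors=8, n_repr=3, n_modes=2):
    """X: raw inputs; y_bb: class labels produced by the black-box AI."""
    fa = FactorAnalysis(n_components=n_factors).fit(X)
    Z = fa.transform(X)                       # latent factors
    mi = mutual_info_classif(Z, y_bb)         # rank factors by MI with output
    repr_idx = np.argsort(mi)[::-1][:n_repr]  # keep the "representatives"
    # One multi-modal Gaussian (mixture) per class over the representatives.
    gmms = {c: GaussianMixture(n_components=n_modes, random_state=0)
                 .fit(Z[y_bb == c][:, repr_idx])
            for c in np.unique(y_bb)}
    return fa, repr_idx, gmms

def class_likelihoods(x, fa, repr_idx, gmms):
    """Explain a new sample as its per-class log-likelihoods."""
    z = fa.transform(np.atleast_2d(x))[:, repr_idx]
    return {c: g.score_samples(z)[0] for c, g in gmms.items()}
```

Because the surrogate is fit only to the black box's outputs, it stays independent of the underlying model type, which is the model-agnostic property the abstract emphasizes.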
  2. As machine learning classifier models become more widely adopted, opaque “black-box” models remain largely inscrutable for a variety of reasons. Since their applications increasingly involve decisions impacting the lives of humans, there is growing demand that their predictions be understandable to humans. Of particular interest in eXplainable AI (XAI) is the interpretability of explanations, i.e., that a model’s prediction should be understandable in terms of the input features. One popular approach is LIME, which offers a model-agnostic framework for explaining any classifier. However, questions remain about the limitations and vulnerabilities of such post-hoc explainers. We have built a tool for generating synthetic tabular data sets that enables us to probe the explanation system opportunistically based on its architecture. In this paper, we report on our success in revealing a scenario where LIME’s explanation violates local faithfulness.
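The following hedged sketch shows the general shape of such a probe: a hand-built synthetic "black box" in which only feature f0 matters near the query, so a locally faithful LIME explanation should concentrate its weight there. The data generator and decision rule are invented for illustration and are not the paper's tool.

```python
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 3))

def black_box(X):
    """Piecewise classifier: f1 drives the global decision, but inside
    the band |f1| < 0.2 only f0 matters."""
    X = np.atleast_2d(X)
    local = X[:, 0] > 0
    p = np.where(np.abs(X[:, 1]) < 0.2, local, X[:, 1] > 0).astype(float)
    return np.column_stack([1 - p, p])  # class probabilities

explainer = LimeTabularExplainer(X, feature_names=["f0", "f1", "f2"],
                                 class_names=["neg", "pos"],
                                 mode="classification")
query = np.array([0.5, 0.05, 0.0])  # inside the region where f0 decides
exp = explainer.explain_instance(query, black_box, num_features=3)
print(exp.as_list())  # locally faithful => f0 should dominate the weights
```

If the explanation assigns most weight to f1 instead, the explainer has been fooled by the global behavior, which is the kind of local-faithfulness violation the paper reports.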
  3. Abstract: The purpose of this study is to identify additional clinical features for sepsis detection through the use of a novel mechanism for interpreting trained black-box machine learning models, and to provide a suitable evaluation of the mechanism. We use the publicly available dataset from the 2019 PhysioNet Challenge, which covers around 40,000 Intensive Care Unit (ICU) patients with 40 physiological variables. Using Long Short-Term Memory (LSTM) as the representative black-box machine learning model, we adapted the Multi-set Classifier to globally interpret the black-box model for the concepts it learned about sepsis. To identify relevant features, the result is compared against: (i) features used by a computational sepsis expert, (ii) clinical features from clinical collaborators, (iii) academic features from the literature, and (iv) significant features from statistical hypothesis testing. Random Forest was selected as the computational sepsis expert because it had high accuracy on both the detection and early-detection tasks and a high degree of overlap with clinical and literature features. Using the proposed interpretation mechanism and the dataset, we identified 17 features that the LSTM used for sepsis classification, 11 of which overlap with the top 20 features from the Random Forest model, 10 with academic features, and 5 with clinical features. Clinical opinion suggests that 3 LSTM features have a strong correlation with some clinical features that were not identified by the mechanism. We also found that age, chloride ion concentration, pH, and oxygen saturation should be investigated further for a connection with developing sepsis. Interpretation mechanisms can bolster the incorporation of state-of-the-art machine learning models into clinical decision support systems, and might help clinicians address the issue of early sepsis detection. The promising results from this study warrant further investigation into the creation of new, and improvement of existing, interpretation mechanisms for black-box models, and into clinical features that are currently not used in clinical assessment of sepsis.
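The feature-overlap comparison at the core of the evaluation can be sketched with simple set operations; the feature names below are placeholders, not the study's actual lists.

```python
# Hypothetical feature sets standing in for the study's real lists.
lstm_feats = {"HR", "Temp", "Resp", "O2Sat", "pH", "WBC", "Lactate"}
rf_top20   = {"HR", "Temp", "Resp", "MAP", "WBC", "Lactate", "Creatinine"}
clinical   = {"HR", "Temp", "Resp", "WBC", "SBP"}

# Overlap counts like "11 of 17 with RF, 5 with clinical" come from
# intersections of this kind.
print("LSTM overlap with RF top-20:", lstm_feats & rf_top20)
print("LSTM overlap with clinical:", lstm_feats & clinical)
print("LSTM-only candidates to investigate:", lstm_feats - rf_top20 - clinical)
```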
  4. Unexplainable black-box models create scenarios where anomalies cause deleterious responses, thus creating unacceptable risks. These risks have motivated the field of eXplainable Artificial Intelligence (XAI) which improves trust by evaluating local interpretability in black-box neural networks. Unfortunately, the ground truth is unavailable for the model's decision, so evaluation is limited to qualitative assessment. Further, interpretability may lead to inaccurate conclusions about the model or a false sense of trust. We propose to improve XAI from the vantage point of the user's trust by exploring a black-box model's latent feature space. We present an approach, ProtoShotXAI, that uses a Prototypical few-shot network to explore the contrastive manifold between nonlinear features of different classes. A user explores the manifold by perturbing the input features of a query sample and recording the response for a subset of exemplars from any class. Our approach is a locally interpretable XAI model that can be extended to, and demonstrated on, few-shot networks. We compare ProtoShotXAI to the state-of-the-art XAI approaches on MNIST, Omniglot, and ImageNet to demonstrate, both quantitatively and qualitatively, that ProtoShotXAI provides more flexibility for model exploration. Finally, ProtoShotXAI also demonstrates novel explainability and detectability on adversarial samples. 
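A minimal sketch of the probing loop described above: perturb each input feature of a query and record how its similarity to a class prototype (the mean embedding of a few exemplars) responds. The embed function is a placeholder for the trained few-shot network's feature extractor, not the actual ProtoShotXAI model.

```python
import numpy as np

def embed(x):
    # Placeholder embedding; in practice this is the trained
    # prototypical network's feature extractor.
    return np.tanh(x)

def prototype(exemplars):
    """Class prototype: mean embedding of a few exemplars."""
    return np.mean([embed(e) for e in exemplars], axis=0)

def feature_attribution(query, exemplars, delta=0.1):
    """Perturb each input feature of the query and record how the
    similarity (negative distance) to the class prototype changes."""
    proto = prototype(exemplars)
    base = -np.linalg.norm(embed(query) - proto)
    scores = np.zeros_like(query)
    for i in range(len(query)):
        q = query.copy()
        q[i] += delta
        scores[i] = -np.linalg.norm(embed(q) - proto) - base
    return scores  # large |score| => feature i moves the query on the manifold

query = np.array([0.2, -0.5, 1.0])
exemplars = [np.array([0.1, -0.4, 0.9]), np.array([0.3, -0.6, 1.1])]
print(feature_attribution(query, exemplars))
```

Repeating this against prototypes of different classes traces the contrastive manifold between classes that the abstract describes.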
  5. Multi-criteria ABC classification is a useful model for automatic inventory management and optimization. It enables rapid classification of inventory items into three groups that require different levels of managerial attention. Several methods, based on different criteria and principles, have been proposed to build the ABC classes. However, existing ABC classification methods operate as black-box AI processes that only assign items to the different ABC classes, without providing further managerial explanation. The multi-criteria nature of the inventory classification problem makes the utilization and interpretation of item classes difficult without further information. Decision makers usually need additional information about the characteristics that were crucial in determining an item’s managerial class, because such information can help managers better understand the inventory groups and make inventory management decisions more transparent. To address this issue, we propose a two-phased explainable approach based on eXplainable Artificial Intelligence (XAI) capabilities. The proposed approach provides both local and global explanations of the built ABC classes, at the item and class levels, respectively. Applying the approach to the inventory classification of a firm specializing in retail sales demonstrated its effectiveness in generating accurate and interpretable ABC classes. Assignments of items to the different ABC classes were well explained in terms of the items’ criteria. In this application, sales, profit, and customer priority emerged as the criteria with the most significant impact on determining the item classes.
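The two-phase idea can be sketched with scikit-learn: permutation importance for the global (class-level) phase and an occlusion-style probe for the local (item-level) phase. The criteria names, synthetic data, and the occlusion rule are illustrative assumptions, not the paper's exact XAI method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

criteria = ["sales", "profit", "customer_priority", "lead_time"]
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, len(criteria)))
# Synthetic ABC labels: 0 = C, 1 = B, 2 = A, driven by the first three criteria.
y = np.digitize(X[:, 0] * 0.5 + X[:, 1] * 0.3 + X[:, 2] * 0.2, [0.35, 0.6])

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Global phase: which criteria drive the ABC classes overall?
glob = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print(dict(zip(criteria, glob.importances_mean.round(3))))

# Local phase: why is *this* item in its class? Occlusion-style scores:
# replace one criterion with its dataset mean and watch the class
# probability drop.
item = X[0]
cls = clf.predict([item])[0]
base = clf.predict_proba([item])[0][cls]
for j, name in enumerate(criteria):
    x = item.copy(); x[j] = X[:, j].mean()
    print(name, round(base - clf.predict_proba([x])[0][cls], 3))
```

In this toy setup the global scores concentrate on sales, profit, and customer priority, mirroring the pattern the abstract reports for the retail application.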