We propose equi-explanation maps to study the variation in model logic across the input space. These global model-agnostic structures partition the hyper-space of explanation features into regions of similar model logic. Equi-explanation maps act as a concise summary of instance explanations and can provide laymen with an at-a-glance understanding of the basis on which the classifier makes its decisions. We thus propose the task of generating $\epsilon$-equi-explanation maps: a partitioning of the input space into subspaces such that the standard deviation of the explanation vectors in each subspace does not exceed $\epsilon$. We adapt existing local and subspace explainability techniques such as LIME and MUSE to generate equi-explanation maps on two binary classification datasets using four classification models, and evaluate the quality of their partitioning. We find that these techniques produce a sub-optimal number of subspaces (making the maps harder to interpret) and have considerable running time. We then propose E-map, a new divide-and-conquer-based algorithm for producing $\epsilon$-equi-explanation maps. For a fixed value of $\epsilon$, E-map decreases both the number of subspaces (and hence increases interpretability) and the running time compared to the previous systems. Finally, given a classifier decision boundary, we try to determine an optimal value for the parameter $\epsilon$. We believe good explanation representation methods can increase the trustworthiness and understanding of machine learning models for critical real-world tasks.
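As a rough illustration of the $\epsilon$ criterion, the sketch below (not the paper's E-map algorithm, whose details are not given here) recursively bisects a toy input space until the per-dimension standard deviation of the explanation vectors in each leaf subspace stays below $\epsilon$; the explanation vectors are assumed to be something like LIME coefficients, and all names are illustrative.

```python
# Illustrative sketch only: a generic divide-and-conquer partitioner under the
# stated epsilon criterion, not the paper's E-map algorithm.
import numpy as np

def is_equi_explanation(expl: np.ndarray, eps: float) -> bool:
    """A subspace satisfies the criterion if the standard deviation of its
    explanation vectors does not exceed eps (checked per dimension here)."""
    return bool(np.all(expl.std(axis=0) <= eps))

def split_subspace(points, expl, eps, depth=0, max_depth=8):
    """Recursively bisect along the widest input dimension until every leaf
    subspace meets the epsilon criterion (or a depth limit is reached)."""
    if len(points) <= 1 or is_equi_explanation(expl, eps) or depth >= max_depth:
        return [(points, expl)]                      # a leaf subspace of the map
    dim = np.argmax(points.max(axis=0) - points.min(axis=0))
    thresh = np.median(points[:, dim])
    left = points[:, dim] <= thresh
    return (split_subspace(points[left], expl[left], eps, depth + 1, max_depth) +
            split_subspace(points[~left], expl[~left], eps, depth + 1, max_depth))

# Toy usage: random inputs with explanation vectors that vary smoothly in space.
rng = np.random.default_rng(0)
points = rng.uniform(size=(500, 2))
explanations = np.stack([2.0 * points[:, 0], 1.0 - points[:, 1]], axis=1)
subspaces = split_subspace(points, explanations, eps=0.1)
print(f"{len(subspaces)} subspaces satisfy the epsilon criterion")
```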
Using Polygonal Data Clusters to Investigate LIME
As machine learning classifier models become more widely adopted, opaque “black-box” models remain largely inscrutable for a variety of reasons. Since their applications increasingly involve decisions impacting human lives, there is growing demand that their predictions be understandable to humans. Of particular interest in eXplainable AI (XAI) is the interpretability of explanations, i.e., that a model’s prediction should be understandable in terms of the input features. One popular approach is LIME, which offers a model-agnostic framework for explaining any classifier. However, questions remain about the limitations and vulnerabilities of such post-hoc explainers. We have built a tool for generating synthetic tabular data sets that enables us to probe the explanation system opportunistically based on its architecture. In this paper, we report on our success in revealing a scenario where LIME’s explanation violates local faithfulness.
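For context, a minimal sketch of this kind of probing setup might look as follows. It uses ordinary Gaussian clusters rather than the paper's polygonal data clusters, assumes the `lime` and `scikit-learn` packages are installed, and uses a crude local-faithfulness check (comparing the surrogate's local prediction to the black box at the explained point) rather than the paper's methodology.

```python
# Sketch of a generic LIME probing setup on synthetic tabular data
# (not the paper's polygonal-cluster generator or evaluation protocol).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Synthetic tabular data: two Gaussian clusters, one per class.
X, y = make_blobs(n_samples=1000, centers=2, n_features=2, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=["x0", "x1"],
                                 class_names=["A", "B"], discretize_continuous=True)
instance = X[0]
exp = explainer.explain_instance(instance, model.predict_proba, num_features=2)
print("LIME feature weights:", exp.as_list())

# Rough local-faithfulness probe: does the local surrogate track the black box
# at the explained point? (Attribute names may vary across lime versions.)
black_box_p = model.predict_proba(instance.reshape(1, -1))[0, 1]
print("black box P(class 1):", black_box_p, "| surrogate local_pred:", exp.local_pred)
```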
- Award ID(s): 1757945
- PAR ID: 10314070
- Date Published:
- Journal Name: International Conference on Information Society (i-Society 2021)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Reinforcement learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL, in which actions are chosen because of their likelihood of obtaining future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable information for “why” an action is chosen. We propose a technique, Experiential Explanations, to generate counterfactual explanations by training influence predictors along with the RL policy. Influence predictors are models that learn how different sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. Two human evaluation studies revealed that participants presented with Experiential Explanations were better able to correctly guess what an agent would do than those presented with other standard types of explanation. Participants also found Experiential Explanations more understandable, satisfying, complete, useful, and accurate. Qualitative analysis provides information on the factors of Experiential Explanations that are most useful and the desired characteristics that participants seek from the explanations.
- As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanation techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real-world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations that do not reflect the underlying biases. (A toy sketch of this scaffolding idea appears after this list.)
- While neural networks have generated increasing excitement for classification tasks such as natural language processing, their lack of interpretability remains a major obstacle to deploying them in certain high-stakes, human-centered applications. To address this issue, we propose a new approach for generating interpretable predictions by inferring a simple three-layer neural network with threshold activations, so that it can benefit from effective neural network training algorithms and, at the same time, produce human-understandable explanations for the results. In particular, the hidden-layer neurons in the proposed model are trained with floating-point weights and binary output activations. The output neuron is also trainable as a threshold logic function that implements a disjunctive operation, forming the logical OR of the first-level threshold logic functions. This neural network can be trained using state-of-the-art training methods to achieve high prediction accuracy. An important feature of the proposed architecture is that only a simple greedy algorithm is required to provide a human-understandable explanation with the prediction. In comparison with other explainable decision models, our proposed approach achieves more accurate predictions on a broad set of tabular classification datasets. (A schematic of this threshold architecture appears after this list.)
- Explanations can help users of Artificial Intelligence (AI) systems gain a better understanding of the reasoning behind a model’s decision, facilitate their trust in AI, and assist them in making informed decisions. Because of these benefits to how users interact and collaborate with AI, the AI/ML community has moved towards developing more understandable and interpretable models, while design researchers continue to study ways to present explanations of these models’ decisions in a coherent form. However, there is still a lack of intentional design effort from the HCI community around these explanation system designs. In this paper, we contribute a framework to support the design and validation of explainable AI systems, one that requires carefully thinking through design decisions at several important decision points. This framework captures key aspects of explanations ranging from target users, to the data, to the AI models in use. We also discuss how we applied our framework to design an explanation interface for trace link prediction of software artifacts.
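Below is a toy illustration of the scaffolding idea from the fooling-LIME/SHAP entry above, not the authors' implementation: an out-of-distribution detector is trained to separate real data points from LIME/SHAP-style perturbation samples, and the scaffolded classifier routes suspected perturbations to an innocuous model while real inputs still receive the biased model's predictions. All class and variable names are hypothetical.

```python
# Toy scaffolding sketch: in-distribution inputs go to the "biased" model,
# suspected perturbation samples go to an innocuous-looking model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

class ScaffoldedClassifier:
    def __init__(self, biased_predict, innocuous_predict, real_data):
        self.biased_predict = biased_predict        # callable: (n, d) -> (n,) labels
        self.innocuous_predict = innocuous_predict  # callable: (n, d) -> (n,) labels
        # Detector: real points vs. Gaussian perturbations, a stand-in for the
        # OOD detection used against perturbation-based explainers.
        rng = np.random.default_rng(0)
        perturbed = real_data + rng.normal(scale=1.0, size=real_data.shape)
        X = np.vstack([real_data, perturbed])
        y = np.r_[np.ones(len(real_data)), np.zeros(len(perturbed))]
        self.detector = RandomForestClassifier(random_state=0).fit(X, y)

    def predict(self, X):
        X = np.asarray(X)
        in_dist = self.detector.predict(X).astype(bool)
        # Biased behaviour on real data, innocuous behaviour on perturbations.
        return np.where(in_dist, self.biased_predict(X), self.innocuous_predict(X))

# Hypothetical usage with two hard-coded decision rules on 2-D data:
data = np.random.default_rng(1).normal(size=(200, 2))
biased = lambda X: (X[:, 0] > 0).astype(int)      # "biased" rule on feature 0
innocuous = lambda X: (X[:, 1] > 0).astype(int)   # harmless-looking rule on feature 1
clf = ScaffoldedClassifier(biased, innocuous, data)
print(clf.predict(data[:5]))
```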
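And here is a schematic of the three-layer threshold architecture described in the interpretable threshold-network entry above (inference only; the training procedure and the greedy explanation algorithm are not reproduced, and the weights are made up).

```python
# Schematic of a three-layer threshold network: hidden units have real-valued
# weights with binary threshold activations; the output unit is their logical OR.
import numpy as np

def threshold_net_predict(x, W, b):
    """x: (n_features,) input; W: (n_hidden, n_features) weights; b: (n_hidden,) thresholds."""
    hidden = (W @ x >= b).astype(int)   # binary activations of the hidden threshold units
    return int(hidden.any()), hidden    # output neuron (disjunction) and which units fired

# A positive prediction can then be explained by reporting a hidden unit that
# fired, i.e. a single satisfied threshold condition W[i] @ x >= b[i].
x = np.array([0.8, 0.2])
W = np.array([[1.0, -1.0], [0.5, 0.5]])
b = np.array([0.5, 0.9])
pred, fired = threshold_net_predict(x, W, b)
print(pred, fired)
```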