While machine learning classifier models become more widely adopted, opaque “black-box” models remain mostly inscrutable for a variety of reasons. Since their applications increasingly involve decisions impacting the lives of humans, there is increasing demand that their predictions be understandable to humans. Of particular interest in eXplainable AI (XAI) is the interpretability of explanations, i.e., that a model’s prediction should be understandable in terms of the input features. One popular approach is LIME, which offers a model-agnostic framework for explaining any classifier. However, questions remain about the limitations and vulnerabilities of such post-hoc explainers. We have built a tool for generating synthetic tabular data sets which enables us to probe the explanation system opportunistically based on its architecture. In this paper, we report on our success in revealing a scenario where LIME’s explanation violates local faithfulness.
more »
« less
A Reusable Model-agnostic Framework for Faithfully Explainable Recommendation and System Scrutability
State-of-the-art industrial-level recommender system applications mostly adopt complicated model structures such as deep neural networks. While this helps with the model performance, the lack of system explainability caused by these nearly blackbox models also raises concerns and potentially weakens the users’ trust in the system. Existing work on explainable recommendation mostly focuses on designing interpretable model structures to generate model-intrinsic explanations. However, most of them have complex structures, and it is difficult to directly apply these designs onto existing recommendation applications due to the effectiveness and efficiency concerns. However, while there have been some studies on explaining recommendation models without knowing their internal structures (i.e., model-agnostic explanations), these methods have been criticized for not reflecting the actual reasoning process of the recommendation model or, in other words, faithfulness . How to develop model-agnostic explanation methods and evaluate them in terms of faithfulness is mostly unknown. In this work, we propose a reusable evaluation pipeline for model-agnostic explainable recommendation. Our pipeline evaluates the quality of model-agnostic explanation from the perspectives of faithfulness and scrutability. We further propose a model-agnostic explanation framework for recommendation and verify it with the proposed evaluation pipeline. Extensive experiments on public datasets demonstrate that our model-agnostic framework is able to generate explanations that are faithful to the recommendation model. We additionally provide quantitative and qualitative study to show that our explanation framework could enhance the scrutability of blackbox recommendation model. With proper modification, our evaluation pipeline and model-agnostic explanation framework could be easily migrated to existing applications. Through this work, we hope to encourage the community to focus more on faithfulness evaluation of explainable recommender systems.
more »
« less
- PAR ID:
- 10445534
- Date Published:
- Journal Name:
- ACM Transactions on Information Systems
- Volume:
- 42
- Issue:
- 1
- ISSN:
- 1046-8188
- Page Range / eLocation ID:
- 1 to 29
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Product retrieval systems have served as the main entry for customers to discover and purchase products online. With increasing concerns on the transparency and accountability of AI systems, studies on explainable information retrieval has received more and more attention in the research community. Interestingly, in the domain of e-commerce, despite the extensive studies on explainable product recommendation, the studies of explainable product search is still in an early stage. In this paper, we study how to construct effective explainable product search by comparing model-agnostic explanation paradigms with model-intrinsic paradigms and analyzing the important factors that determine the performance of product search explanations. We propose an explainable product search model with model-intrinsic interpretability and conduct crowdsourcing to compare it with the state-of-the-art explainable product search model with model-agnostic interpretability. We observe that both paradigms have their own advantages and the effectiveness of search explanations on different properties are affected by different factors. For example, explanation fidelity is more important for user's overall satisfaction on the system while explanation novelty may be more useful in attracting user purchases. These findings could have important implications for the future studies and design of explainable product search engines.more » « less
-
Explaining to users why some items are recommended is critical, as it can help users to make better decisions, increase their satisfaction, and gain their trust in recommender systems (RS). However, existing explainable RS usually consider explanation as a side output of the recommendation model, which has two problems: (1) It is difficult to evaluate the produced explanations, because they are usually model-dependent, and (2) as a result, how the explanations impact the recommendation performance is less investigated. In this article, explaining recommendations is formulated as a ranking task and learned from data, similarly to item ranking for recommendation. This makes it possible for standard evaluation of explanations via ranking metrics (e.g., Normalized Discounted Cumulative Gain). Furthermore, this article extends traditional item ranking to an item–explanation joint-ranking formalization to study if purposely selecting explanations could reach certain learning goals, e.g., improving recommendation performance. A great challenge, however, is that the sparsity issue in the user-item-explanation data would be inevitably severer than that in traditional user–item interaction data, since not every user–item pair can be associated with all explanations. To mitigate this issue, this article proposes to perform two sets of matrix factorization by considering the ternary relationship as two groups of binary relationships. Experiments on three large datasets verify the solution’s effectiveness on both explanation ranking and item recommendation.more » « less
-
Transparency and accountability have become major concerns for black-box machine learning (ML) models. Proper explanations for the model behavior increase model transparency and help researchers develop more accountable models. Graph neural networks (GNN) have recently shown superior performance in many graph ML problems than traditional methods, and explaining them has attracted increased interest. However, GNN explanation for link prediction (LP) is lacking in the literature. LP is an essential GNN task and corresponds to web applications like recommendation and sponsored search on web. Given existing GNN explanation methods only address node/graph-level tasks, we propose Path-based GNN Explanation for heterogeneous Link prediction (PaGE-Link) that generates explanations with connection interpretability, enjoys model scalability, and handles graph heterogeneity. Qualitatively, PaGE-Link can generate explanations as paths connecting a node pair, which naturally captures connections between the two nodes and easily transfer to human-interpretable explanations. Quantitatively, explanations generated by PaGE-Link improve AUC for recommendation on citation and user-item graphs by 9 - 35% and are chosen as better by 78.79% of responses in human evaluation.more » « less
-
As recommendation is essentially a comparative (or ranking) process, a good explanation should illustrate to users why an item is believed to be better than another, i.e., comparative explanations about the recommended items. Ideally, after reading the explanations, a user should reach the same ranking of items as the system’s. Unfortunately, little research attention has yet been paid on such comparative explanations. In this work, we develop an extract-and-refine architecture to explain the relative comparisons among a set of ranked items from a recommender system. For each recommended item, we first extract one sentence from its associated reviews that best suits the desired comparison against a set of reference items. Then this extracted sentence is further articulated with respect to the target user through a generative model to better explain why the item is recommended. We design a new explanation quality metric based on BLEU to guide the end-to-end training of the extraction and refinement components, which avoids generation of generic content. Extensive offline evaluations on two large recommendation benchmark datasets and serious user studies against an array of state-of-the-art explainable recommendation algorithms demonstrate the necessity of comparative explanations and the effectiveness of our solution.more » « less