skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.

Title: Regularizing Black-box Models for Improved Interpretability
Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade-off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybridization of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. We verify that improving these metrics leads to significantly more useful explanations with a user study on a realistic task.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
NIPS'20: Proceedings of the 34th International Conference on Neural Information Processing Systems
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Traffic accident anticipation is a vital function of Automated Driving Systems (ADS) in providing a safety-guaranteed driving experience. An accident anticipation model aims to predict accidents promptly and accurately before they occur. Existing Artificial Intelligence (AI) models of accident anticipation lack a human-interpretable explanation of their decision making. Although these models perform well, they remain a black-box to the ADS users who find it to difficult to trust them. To this end, this paper presents a gated recurrent unit (GRU) network that learns spatio-temporal relational features for the early anticipation of traffic accidents from dashcam video data. A post-hoc attention mechanism named Grad-CAM (Gradient-weighted Class Activation Map) is integrated into the network to generate saliency maps as the visual explanation of the accident anticipation decision. An eye tracker captures human eye fixation points for generating human attention maps. The explainability of network-generated saliency maps is evaluated in comparison to human attention maps. Qualitative and quantitative results on a public crash data set confirm that the proposed explainable network can anticipate an accident on average 4.57 s before it occurs, with 94.02% average precision. Various post-hoc attention-based XAI methods are then evaluated and compared. This confirms that the Grad-CAM chosen by this study can generate high-quality, human-interpretable saliency maps (with 1.23 Normalized Scanpath Saliency) for explaining the crash anticipation decision. Importantly, results confirm that the proposed AI model, with a human-inspired design, can outperform humans in accident anticipation. 
    more » « less
  2. As machine learning black boxes are increasingly being deployed in domains such as healthcare and criminal justice, there is growing emphasis on building tools and techniques for explaining these black boxes in an interpretable manner. Such explanations are being leveraged by domain experts to diagnose systematic errors and underlying biases of black boxes. In this paper, we demonstrate that post hoc explanations techniques that rely on input perturbations, such as LIME and SHAP, are not reliable. Specifically, we propose a novel scaffolding technique that effectively hides the biases of any given classifier by allowing an adversarial entity to craft an arbitrary desired explanation. Our approach can be used to scaffold any biased classifier in such a way that its predictions on the input data distribution still remain biased, but the post hoc explanations of the scaffolded classifier look innocuous. Using extensive evaluation with multiple real world datasets (including COMPAS), we demonstrate how extremely biased (racist) classifiers crafted by our framework can easily fool popular explanation techniques such as LIME and SHAP into generating innocuous explanations which do not reflect the underlying biases. 
    more » « less
  3. Post-hoc explanations of machine learning models are crucial for people to understand and act on algorithmic predictions. An intriguing class of explanations is through counterfactuals, hypothetical examples that show people how to obtain a different prediction. We posit that effective counterfactual explanations should satisfy two properties: feasibility of the counterfactual actions given user context and constraints, and diversity among the counterfactuals presented. To this end, we propose a framework for generating and evaluating a diverse set of counterfactual explanations based on determinantal point processes. To evaluate the actionability of counterfactuals, we provide metrics that enable comparison of counterfactual-based methods to other local explanation methods. We further address necessary tradeoffs and point to causal implications in optimizing for counterfactuals. Our experiments on four real-world datasets show that our framework can generate a set of counterfactuals that are diverse and well approximate local decision boundaries, outperforming prior approaches to generating diverse counterfactuals. We provide an implementation of the framework at 
    more » « less
  4. State-of-the-art industrial-level recommender system applications mostly adopt complicated model structures such as deep neural networks. While this helps with the model performance, the lack of system explainability caused by these nearly blackbox models also raises concerns and potentially weakens the users’ trust in the system. Existing work on explainable recommendation mostly focuses on designing interpretable model structures to generate model-intrinsic explanations. However, most of them have complex structures, and it is difficult to directly apply these designs onto existing recommendation applications due to the effectiveness and efficiency concerns. However, while there have been some studies on explaining recommendation models without knowing their internal structures (i.e., model-agnostic explanations), these methods have been criticized for not reflecting the actual reasoning process of the recommendation model or, in other words, faithfulness . How to develop model-agnostic explanation methods and evaluate them in terms of faithfulness is mostly unknown. In this work, we propose a reusable evaluation pipeline for model-agnostic explainable recommendation. Our pipeline evaluates the quality of model-agnostic explanation from the perspectives of faithfulness and scrutability. We further propose a model-agnostic explanation framework for recommendation and verify it with the proposed evaluation pipeline. Extensive experiments on public datasets demonstrate that our model-agnostic framework is able to generate explanations that are faithful to the recommendation model. We additionally provide quantitative and qualitative study to show that our explanation framework could enhance the scrutability of blackbox recommendation model. With proper modification, our evaluation pipeline and model-agnostic explanation framework could be easily migrated to existing applications. Through this work, we hope to encourage the community to focus more on faithfulness evaluation of explainable recommender systems. 
    more » « less
  5. Solar flares are transient space weather events that pose a significant threat to space and ground-based technological systems, making their precise and reliable prediction crucial for mitigating potential impacts. This paper contributes to the growing body of research on deep learning methods for solar flare prediction, primarily focusing on highly overlooked near-limb flares and utilizing the attribution methods to provide a post hoc qualitative explanation of the model’s predictions. We present a solar flare prediction model, which is trained using hourly full-disk line-of-sight magnetogram images and employs a binary prediction mode to forecast ≥M-class flares that may occur within the following 24-hour period. To address the class imbalance, we employ a fusion of data augmentation and class weighting techniques; and evaluate the overall performance of our model using the true skill statistic (TSS) and Heidke skill score (HSS). Moreover, we applied three attribution methods, namely Guided Gradient-weighted Class Activation Mapping, Integrated Gradients, and Deep Shapley Additive Explanations, to interpret and cross-validate our model’s predictions with the explanations. Our analysis revealed that full-disk prediction of solar flares aligns with characteristics related to active regions (ARs). In particular, the key findings of this study are: (1) our deep learning models achieved an average TSS∼0.51 and HSS∼0.35, and the results further demonstrate a competent capability to predict near-limb solar flares and (2) the qualitative analysis of the model’s explanation indicates that our model identifies and uses features associated with ARs in central and near-limb locations from full-disk magnetograms to make corresponding predictions. In other words, our models learn the shape and texture-based characteristics of flaring ARs even when they are at near-limb areas, which is a novel and critical capability that has significant implications for operational forecasting. 
    more » « less