Title: On the Sensitivity and Stability of Model Interpretations in NLP
Recent years have witnessed the emergence of a variety of post-hoc interpretations that aim to uncover how natural language processing (NLP) models make predictions. Despite the surge of new interpretation methods, how to define and quantitatively measure the faithfulness of interpretations, i.e., to what extent interpretations reflect the reasoning process of a model, remains an open problem. We propose two new criteria, sensitivity and stability, that provide notions of faithfulness complementary to the existing removal-based criteria. Our results show that conclusions about how faithful interpretations are can vary substantially across these notions. Motivated by the desiderata of sensitivity and stability, we introduce a new class of interpretation methods that adopt techniques from adversarial robustness. Empirical results show that our proposed methods are effective under the new criteria and overcome limitations of gradient-based methods on removal-based criteria. Besides text classification, we also apply the interpretation methods and metrics to dependency parsing. Our results shed light on understanding the diverse set of interpretations.
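A minimal sketch of the flavor of check the stability criterion suggests, assuming a HuggingFace-style classifier that accepts inputs_embeds; the gradient-norm saliency and the random perturbation below are illustrative stand-ins (the paper works with worst-case, adversarially chosen perturbations), not the paper's exact metric.

```python
# Hedged illustration, not the paper's exact metric: compare a saliency map
# before and after a small perturbation of the input embeddings. A stable
# interpretation should change little under such perturbations.
import torch

def saliency(model, embeds, target):
    """Per-token saliency: L2 norm of the gradient of the target logit
    with respect to each token embedding."""
    embeds = embeds.clone().detach().requires_grad_(True)
    logit = model(inputs_embeds=embeds).logits[0, target]
    logit.backward()
    return embeds.grad.norm(dim=-1).squeeze(0)   # shape: (seq_len,)

def stability_gap(model, embeds, target, eps=1e-2):
    """Relative change in the saliency map under a random perturbation of
    norm eps (a crude stand-in for a worst-case perturbation)."""
    base = saliency(model, embeds, target)
    noise = torch.randn_like(embeds)
    noise = eps * noise / noise.norm()
    perturbed = saliency(model, embeds + noise, target)
    return ((base - perturbed).abs().sum() / base.abs().sum()).item()
```

Under this reading, an interpretation is considered stable when the gap stays small for perturbations that do not change the model's prediction.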
Award ID(s):
1927554
PAR ID:
10391935
Journal Name:
On the Sensitivity and Stability of Model Interpretations in NLP
Page Range / eLocation ID:
2631 to 2647
Sponsoring Org:
National Science Foundation
More Like this
  1. Ye, Yuting; Wang, Huamin (Ed.)
    We analyze a wide class of penalty energies used for contact response through the lens of a reduced frame. Applying our analysis to both spring-based and barrier-based energies, we show that we can obtain closed-form, analytic eigensystems that can be used to guarantee positive semidefiniteness in implicit solvers. Our approach is both faster than direct numerical methods and more robust than approximate methods such as Gauss-Newton. Over the course of our analysis, we investigate physical interpretations for two separate notions of length. Finally, we showcase the stability of our analysis on challenging strand, cloth, and volume scenarios with large timesteps on the order of 1/40 s. (A sketch of the generic numerical PSD projection this improves on appears after this list.)
  2. Given the widespread deployment of black box deep neural networks in computer vision applications, the interpretability aspect of these black box systems has recently gained traction. Various methods have been proposed to explain the results of such deep neural networks. However, some recent works have shown that such explanation methods are biased and do not produce consistent interpretations. Hence, rather than introducing a novel explanation method, we learn models that are encouraged to be interpretable given an explanation method. We use Grad-CAM as the explanation algorithm and encourage the network to learn consistent interpretations along with maximizing the log-likelihood of the correct class. We show that our method outperforms the baseline on the pointing game evaluation on the ImageNet and MS-COCO datasets. We also introduce new evaluation metrics that penalize the saliency map if it lies outside the ground truth bounding box or segmentation mask, and show that our method outperforms the baseline on these metrics as well. Moreover, our model trained with interpretation consistency generalizes to other explanation algorithms on all the evaluation metrics. (A sketch of training with a Grad-CAM consistency term appears after this list.)
  3. Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods. 
  4. Weighting methods are popular tools for estimating causal effects, and assessing their robustness under unobserved confounding is important in practice. Current approaches to sensitivity analysis rely on bounding a worst-case error from omitting a confounder. In this paper, we introduce a new sensitivity model called the variance-based sensitivity model, which instead bounds the distributional differences that arise in the weights from omitting a confounder. The variance-based sensitivity model can be parameterized by an R² parameter that is both standardized and bounded. We demonstrate, both empirically and theoretically, that the variance-based sensitivity model improves the stability of the sensitivity analysis procedure over existing methods. We show that by moving away from worst-case bounds, we are able to obtain more interpretable and informative bounds. We illustrate our proposed approach on a study examining blood mercury levels using the National Health and Nutrition Examination Survey. (A toy illustration of the generic R² idea appears after this list.)
  5. Wooldridge, Michael; Dy, Jennifer; Natarajan, Sriraam (Ed.)
    This work undertakes studies to evaluate interpretability methods for time-series deep learning. Sensitivity analysis assesses how input changes affect the output, constituting a key component of interpretation. Among post-hoc interpretation methods such as back-propagation, perturbation, and approximation, my work will investigate perturbation-based sensitivity analysis methods on modern Transformer models to benchmark their performance. Specifically, my work intends to answer three research questions: 1) Do different sensitivity analysis methods yield comparable outputs and attribute-importance rankings? 2) Using the same sensitivity analysis method, do different deep learning models impact the output of the sensitivity analysis? 3) How well do the results from sensitivity analysis methods align with the ground truth? (A sketch of an occlusion-style perturbation analysis appears after this list.)
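For item 1 above: the work derives closed-form eigensystems so that positive semidefiniteness can be enforced analytically; as a point of contrast, the generic numerical route it improves on is a PSD projection of each contact-energy Hessian block, sketched below (an illustration, not the authors' method).

```python
# Generic numerical PSD projection: clamp negative eigenvalues of a symmetric
# Hessian block to zero before handing it to an implicit solver. The cited work
# replaces this numerical eigendecomposition with closed-form eigensystems.
import numpy as np

def project_to_psd(H):
    """Nearest positive-semidefinite matrix to a symmetric matrix H."""
    eigvals, eigvecs = np.linalg.eigh(H)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T
```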
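For item 2 above: a hedged sketch of training with a cross-entropy term plus a Grad-CAM consistency term. The specific consistency objective (agreement of the map under horizontal flips), the resnet18 backbone, the layer4 hook point, and the weight lam are illustrative assumptions, not details taken from the cited work.

```python
# Hedged sketch: cross-entropy plus a Grad-CAM consistency penalty. The flip-based
# consistency below is an illustrative choice, not necessarily the cited objective.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(num_classes=10)          # placeholder backbone
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(act=o))

def grad_cam(images, labels):
    """Differentiable Grad-CAM maps for the target class of each image."""
    logits = model(images)
    score = logits.gather(1, labels[:, None]).sum()
    acts = feats["act"]                                    # (B, C, H, W)
    grads, = torch.autograd.grad(score, acts, create_graph=True)
    weights = grads.mean(dim=(2, 3), keepdim=True)         # GAP over space
    return logits, F.relu((weights * acts).sum(dim=1))     # (B, H, W)

def training_loss(images, labels, lam=0.1):
    logits, cam = grad_cam(images, labels)
    _, cam_flip = grad_cam(torch.flip(images, dims=[3]), labels)
    consistency = F.mse_loss(cam, torch.flip(cam_flip, dims=[2]))
    return F.cross_entropy(logits, labels) + lam * consistency
```

Keeping create_graph=True is what lets the consistency penalty backpropagate through the Grad-CAM computation itself.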
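For item 4 above: only the generic flavor of an R²-style parameter is illustrated below, i.e., the share of variance in an ideal set of weights that the estimable weights capture; the actual parameterization of the variance-based sensitivity model should be taken from the paper.

```python
# Toy illustration only: fraction of ideal-weight variance explained by the
# estimated weights. This conveys the generic "R^2" flavor, not the exact
# parameterization of the variance-based sensitivity model.
import numpy as np

def weight_r2(estimated_w, ideal_w):
    estimated_w, ideal_w = np.asarray(estimated_w), np.asarray(ideal_w)
    residual = ideal_w - estimated_w
    return 1.0 - residual.var() / ideal_w.var()
```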
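For item 5 above: one common perturbation-based sensitivity analysis is occlusion, sketched here for a time-series classifier; the model interface and the zero baseline are placeholder assumptions.

```python
# Hedged sketch of occlusion-style attribution for a time-series model:
# replace one time step at a time with a baseline value and record how much
# the target score drops. `model` maps a (batch, time, features) tensor to logits.
import torch

def occlusion_importance(model, x, target, baseline=0.0):
    """Per-time-step importance for one series x of shape (time, features)."""
    model.eval()
    with torch.no_grad():
        ref = model(x[None])[0, target]
        drops = []
        for t in range(x.shape[0]):
            x_pert = x.clone()
            x_pert[t] = baseline
            drops.append(ref - model(x_pert[None])[0, target])
    return torch.stack(drops)   # larger drop = more important time step
```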