This content will become publicly available on December 11, 2025

Title: Visualizing and Generalizing Integrated Attributions
Abstract: Explainability and attribution for deep neural networks remain an open area of study due to the importance of adequately interpreting the behavior of such ubiquitous learning models. The method of expected gradients [10] reduced the baseline dependence of integrated gradients [27] and allowed attributions to be interpreted as representative of the broader gradient landscape; however, both methods are visualized using an ambiguous transformation that obscures attribution information and fails to distinguish between color channels. While expected gradients takes an expectation over the entire dataset, this is only one possible domain in which an explanation can be contextualized. To generalize the larger family of attribution methods containing integrated gradients and expected gradients, we instead frame each attribution as a volume integral over a set of interest within the input space, allowing for new levels of specificity and revealing novel sources of attribution information. Additionally, we demonstrate these new sources of feature attribution information using a refined visualization method that makes both signed and unsigned attributions visually salient for each color channel. This new formulation provides a framework for developing and explaining a much broader family of attribution measures, and for computing attributions relevant to diverse contexts such as local and non-local neighborhoods. We evaluate our novel family of attribution measures and our improved visualization method using qualitative and quantitative approaches with the CIFAR-10 and ImageNet datasets and the Quantus XAI library.
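To make the contrast between the two methods concrete, the following is a minimal sketch (not the paper's code) of integrated gradients versus expected gradients in plain NumPy, where `model_grad(x)` is a hypothetical callable returning the gradient of the model output with respect to the input at x. The generalized, volume-integral view described in the abstract corresponds to replacing the dataset-wide reference set with samples drawn from an arbitrary set of interest, such as a local neighborhood of the input.

```python
# Minimal sketch (not the paper's implementation) contrasting integrated and
# expected gradients. `model_grad(x)` is a hypothetical callable returning the
# gradient of the model output with respect to the input at x.
import numpy as np

def integrated_gradients(x, baseline, model_grad, steps=50):
    """Path integral of input gradients from one fixed baseline to x."""
    alphas = np.linspace(0.0, 1.0, steps)
    avg_grad = np.mean([model_grad(baseline + a * (x - baseline)) for a in alphas],
                       axis=0)
    return (x - baseline) * avg_grad

def expected_gradients(x, reference_set, model_grad, samples=50, seed=0):
    """Same integrand, averaged over baselines drawn from a reference set:
    the whole dataset in expected gradients [10]; any 'set of interest'
    (e.g., a local neighborhood of x) in the generalized, volume-integral view."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(samples):
        baseline = reference_set[rng.integers(len(reference_set))]
        a = rng.uniform()
        total += (x - baseline) * model_grad(baseline + a * (x - baseline))
    return total / samples
```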
Award ID(s):
2134237
PAR ID:
10616452
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer Nature Switzerland
Date Published:
ISBN:
978-3-031-78188-9
Page Range / eLocation ID:
455 to 470
Subject(s) / Keyword(s):
interpretability; visualization; integrated gradients; explainability; artificial intelligence
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the increasing interest in explainable attribution for deep neural networks, it is important to consider not only the importance of individual inputs, but also the model parameters themselves. Existing methods, such as Neuron Integrated Gradients [18] and Conductance [6], attempt model attribution by applying attribution methods, such as Integrated Gradients, to the inputs of each model parameter. While these methods seem to map attributions to individual parameters, they are actually aggregated feature attributions which completely ignore the parameter space and also suffer from the same underlying limitations of Integrated Gradients. In this work, we compute parameter attributions by leveraging the recent family of measures proposed by Generalized Integrated Attributions, instead computing integrals over the product space of inputs and parameters. Using the product space allows us to explain individual neurons from varying perspectives and to interpret them with the same intuition as inputs. To the best of our knowledge, ours is the first method which actually utilizes the gradient landscape of the parameter space to explain each individual weight and bias. We confirm the utility of our parameter attributions by computing exploratory statistics for a wide variety of image classification datasets and by performing pruning analyses on a standard architecture, which demonstrate that our attribution measures are able to identify both important and unimportant neurons in a convolutional neural network.
    (A rough sketch of the product-space integration idea appears after this list.)
  2. Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
    Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image’s ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model’s input gradients around data points will more closely align with boundaries’ normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain a classification outcome. We show that by leveraging the key factors underpinning robust interpretability, boundary attributions produce sharper, more concentrated visual explanations, even on non-robust models.
    (A rough sketch of the gradient/boundary-normal alignment quantity appears after this list.)
  3. An emerging problem in trustworthy machine learning is to train models that produce robust interpretations for their predictions. We take a step towards solving this problem through the lens of axiomatic attribution of neural networks. Our theory is grounded in the recent work, Integrated Gradients (IG) [STY17], in axiomatically attributing a neural network’s output change to its input change. We propose training objectives in classic robust optimization models to achieve robust IG attributions. Our objectives give principled generalizations of previous objectives designed for robust predictions, and they naturally degenerate to classic soft-margin training for one-layer neural networks. We also generalize previous theory and prove that the objectives for different robust optimization models are closely related. Experiments demonstrate the effectiveness of our method, and also point to intriguing problems which hint at the need for better optimization techniques or better neural network architectures for robust attribution training.
    (A rough sketch of an attribution-robustness objective appears after this list.)
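For the first related abstract above, the product-space idea can be illustrated with a small sketch (my reading of the description, not the authors' code): integrate the gradient with respect to a single weight while jointly interpolating the input and that weight from reference values. The helper `grad_wrt_param` and the reference points are hypothetical stand-ins.

```python
# Sketch under stated assumptions: attribute one parameter by integrating its
# gradient along a joint path in (input, parameter) space.
import numpy as np

def parameter_attribution(x, x_ref, w, w_ref, grad_wrt_param, steps=50):
    """grad_wrt_param(x, w) is a hypothetical callable returning d(output)/dw
    with the network evaluated at input x and the chosen parameter set to w."""
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.mean(
        [grad_wrt_param(x_ref + a * (x - x_ref), w_ref + a * (w - w_ref))
         for a in alphas],
        axis=0,
    )
    # Attribution is the parameter displacement times the path-averaged gradient.
    return (w - w_ref) * grads
```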
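For the second related abstract, a loose sketch of the alignment quantity it discusses (not the paper's boundary-attribution algorithm): approximate a local boundary normal as the displacement from a point to a nearby point on the decision boundary, then measure its cosine similarity with the input gradient. Both `closest_boundary_point` and `input_grad` are hypothetical helpers.

```python
# Sketch: cosine alignment between an input gradient and an estimated
# decision-boundary normal; smoother boundaries should yield higher alignment.
import numpy as np

def boundary_alignment(x, input_grad, closest_boundary_point):
    """input_grad(x): hypothetical gradient of the class score w.r.t. x.
    closest_boundary_point(x): hypothetical helper returning a nearby point on
    the decision boundary (e.g., found via a minimal adversarial perturbation)."""
    normal = closest_boundary_point(x) - x          # approximate boundary normal
    g = input_grad(x)
    denom = np.linalg.norm(g) * np.linalg.norm(normal) + 1e-12
    return float(np.dot(g.ravel(), normal.ravel()) / denom)
```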
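For the third related abstract, here is a loose PyTorch sketch of a generic attribution-robustness objective (a stand-in for, not a reproduction of, the paper's robust-optimization formulations): the usual classification loss plus a penalty on how much integrated-gradients attributions move under a small random input perturbation. Inputs are assumed to be image batches of shape (B, C, H, W).

```python
# Sketch: cross-entropy plus a penalty on the change in integrated-gradients
# attributions under a small random perturbation (hypothetical regularizer).
import torch
import torch.nn.functional as F

def ig_attribution(model, x, baseline, target, steps=20):
    """Straight-line integrated gradients toward `baseline`, kept differentiable
    w.r.t. model parameters (create_graph=True) so it can appear in a loss."""
    alphas = torch.linspace(0.0, 1.0, steps, device=x.device).view(-1, 1, 1, 1, 1)
    path = (baseline + alphas * (x - baseline)).reshape(-1, *x.shape[1:])
    path = path.detach().requires_grad_(True)
    scores = model(path)[torch.arange(path.shape[0]), target.repeat(steps)]
    grads = torch.autograd.grad(scores.sum(), path, create_graph=True)[0]
    return (x - baseline) * grads.reshape(steps, *x.shape).mean(dim=0)

def robust_attribution_loss(model, x, y, eps=0.03, lam=1.0):
    """Classification loss plus an attribution-stability penalty."""
    baseline = torch.zeros_like(x)
    ce = F.cross_entropy(model(x), y)
    ig_clean = ig_attribution(model, x, baseline, y)
    ig_pert = ig_attribution(model, x + eps * torch.randn_like(x), baseline, y)
    return ce + lam * (ig_clean - ig_pert).abs().mean()
```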