NSF PAR Search | NSF Public Access Repository

Robust Models Are More Interpretable Because Attributions Look Normal

Wang, Zifan; Fredrikson, Matt; Datta, Anupam (July 2022, Proceedings of Machine Learning Research)
Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
Recent work has found that adversarially robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image’s ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model’s input gradients around data points will more closely align with boundaries’ normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain a classification outcome. We show that by leveraging the key factors underpinning robust interpretability, boundary attributions produce sharper, more concentrated visual explanations, even on non-robust models.
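The abstract's argument rests on two properties of gradient-based attribution: Integrated Gradients accumulates input gradients along a straight path from a baseline to the input, and near a smooth boundary those gradients point along the boundary's normal vector. A minimal sketch of both on a hypothetical logistic model follows. The weights W and B, the input, the baseline, and the step count are illustrative assumptions, and this is plain Integrated Gradients, not the paper's boundary attributions. For a single logistic unit the decision boundary is a hyperplane with normal w (the idealized smooth case), so the input gradient is exactly parallel to w.

```python
import math
from typing import Callable, List

Vector = List[float]

# Hypothetical toy model: a single logistic unit F(x) = sigmoid(w . x + b).
# Its decision boundary {x : w . x + b = 0} is a hyperplane with normal vector w.
W: Vector = [2.0, -1.0, 0.5]
B: float = -0.25


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: Vector) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(W, x)) + B)


def model_grad(x: Vector) -> Vector:
    # Analytic input gradient of a logistic unit: dF/dx = s * (1 - s) * w.
    s = model(x)
    return [s * (1.0 - s) * wi for wi in W]


def integrated_gradients(
    x: Vector,
    baseline: Vector,
    grad_fn: Callable[[Vector], Vector],
    steps: int = 200,
) -> Vector:
    # Midpoint Riemann-sum approximation of
    #   IG_i(x) = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a * (x - x')) da
    summed = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            summed[i] += g
    return [(xi - bi) * s / steps for xi, bi, s in zip(x, baseline, summed)]


def cosine(u: Vector, v: Vector) -> float:
    # Cosine similarity between two vectors (1.0 means same direction).
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


x = [1.0, 0.5, -2.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline, model_grad)

# Completeness: the attributions should sum to roughly F(x) - F(baseline).
print(sum(attributions), model(x) - model(baseline))
# For a hyperplane boundary, the input gradient is parallel to the normal w.
print(cosine(model_grad(x), W))
```

The completeness check is a standard sanity test for Integrated Gradients. A midpoint sum is used because its error shrinks quadratically in the number of steps, versus linearly for a left or right sum, for the same number of gradient evaluations.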
Full Text Available
Fairness Under Feature Exemptions: Counterfactual and Observational Measures

https://doi.org/10.1109/TIT.2021.3103206

Dutta, Sanghamitra; Venkatesh, Praveen; Mardziel, Piotr; Datta, Anupam; Grover, Pulkit (October 2021, IEEE Transactions on Information Theory)

Full Text Available
Selective Ensembles for Consistent Predictions

Black, Emily; Leino, Klas (May 2021, Ninth International Conference on Learning Representations)

Full Text Available
Consistent Counterfactuals for Deep Models

Black, Emily; Wang, Zifan; Fredrikson, Matt; Datta, Anupam (May 2021, Ninth International Conference on Learning Representations)

Counterfactual examples are one of the most commonly cited methods for explaining the predictions of machine learning models in key areas such as finance and medical diagnosis. Counterfactuals are often discussed under the assumption that the model on which they will be used is static, but in deployment, models may be periodically retrained or fine-tuned. This paper studies the consistency of model predictions on counterfactual examples in deep networks under small changes to initial training conditions, such as weight initialization and leave-one-out variations in data, as often occurs during model deployment. We demonstrate experimentally that counterfactual examples for deep models are often inconsistent across such small changes, and that increasing the cost of the counterfactual, a stability-enhancing mitigation suggested by prior work in the context of simpler models, is not a reliable heuristic in deep networks. Rather, our analysis shows that a model's local Lipschitz continuity around the counterfactual is key to its consistency across related models. To this end, we propose Stable Neighbor Search as a way to generate more consistent counterfactual explanations, and illustrate the effectiveness of this approach on several benchmark datasets.
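The abstract ties counterfactual consistency to the model's local Lipschitz continuity around the counterfactual. A minimal sketch of estimating that quantity empirically follows, assuming a scalar-valued model score and plain random sampling in an L2 ball. It is not the paper's Stable Neighbor Search, and the two scoring functions, the point, and the radius are hypothetical. Because it only compares sampled neighbors against the center point, the result is a lower bound on the true local Lipschitz constant.

```python
import math
import random
from typing import Callable, Sequence


def local_lipschitz_estimate(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    radius: float,
    samples: int = 2000,
    seed: int = 0,
) -> float:
    """Empirical lower bound on how fast f can change within `radius` of x.

    Samples random points x' in the L2 ball around x and returns
    max |f(x') - f(x)| / ||x' - x||. Larger values mean the model's
    output is less stable near x.
    """
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(samples):
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(d * d for d in direction))
        step = radius * rng.random()
        if norm == 0.0 or step == 0.0:
            continue  # skip degenerate samples to avoid dividing by zero
        # Unit direction scaled by `step`, so ||neighbor - x|| == step.
        neighbor = [xi + step * d / norm for xi, d in zip(x, direction)]
        best = max(best, abs(f(neighbor) - fx) / step)
    return best


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical scores sharing the decision boundary x0 + x1 = 0,
# differing only in how sharply they change around it.
def smooth_score(v: Sequence[float]) -> float:
    return sigmoid(v[0] + v[1])


def sharp_score(v: Sequence[float]) -> float:
    return sigmoid(20.0 * (v[0] + v[1]))


point = [0.05, -0.02]  # hypothetical counterfactual just past the boundary
smooth_est = local_lipschitz_estimate(smooth_score, point, radius=0.1)
sharp_est = local_lipschitz_estimate(sharp_score, point, radius=0.1)
print(smooth_est, sharp_est)
```

Both scores assign the same label at this point, but the sharper one changes far faster in its neighborhood, which is the kind of instability the abstract associates with counterfactuals that fail to transfer across retrained models.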
Full Text Available
Leave-one-out Unfairness

https://doi.org/10.1145/3442188.3445894

Black, Emily; Fredrikson, Matt (February 2021, 2021 ACM Conference on Fairness, Accountability, and Transparency)
Full Text Available
Influence Patterns for Explaining Information Flow in BERT

Liu, Kaiji; Wang, Zifan; Mardziel, Piotr (January 2021, Advances in Neural Information Processing Systems)

Full Text Available
Smoothed Geometry for Robust Attribution

Wang, Zifan; Wang, Haofan; Ramkumar, Shakul; Mardziel, Piotr; Fredrikson, Matt; Datta, Anupam (December 2020, Advances in Neural Information Processing Systems)
Full Text Available
Stolen Memories: Leveraging Model Memorization for Calibrated White-Box Membership Inference

Leino, Klas; Fredrikson, Matt (August 2020, 29th USENIX Security Symposium)
Full Text Available
Learning Fair Representations for Kernel Models

Tan, Zilong; Yeom, Samuel; Fredrikson, Matt; Talwalkar, Ameet (August 2020, Twenty-Third International Conference on Artificial Intelligence and Statistics)
Full Text Available
Individual Fairness Revisited: Transferring Techniques from Adversarial Robustness

https://doi.org/10.24963/ijcai.2020/61

Yeom, Samuel; Fredrikson, Matt (July 2020, Twenty-Ninth International Joint Conference on Artificial Intelligence)
We turn the definition of individual fairness on its head: rather than ascertaining the fairness of a model given a predetermined metric, we find a metric for a given model that satisfies individual fairness. This can facilitate discussion of a model's fairness, addressing the issue that it may be difficult to specify a suitable metric a priori. Our contributions are twofold. First, we introduce the definition of a minimal metric and characterize the behavior of models in terms of minimal metrics. Second, for more complicated models, we apply the mechanism of randomized smoothing from adversarial robustness to make them individually fair under a given weighted Lp metric. Our experiments show that adapting the minimal metrics of linear models to more complicated neural networks can lead to meaningful and interpretable fairness guarantees at little cost to utility.
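The abstract borrows randomized smoothing from adversarial robustness. A minimal sketch of the generic mechanism follows: predict the label a base classifier assigns most often under Gaussian input noise, and convert the winning vote fraction p into the standard Gaussian-smoothing L2 radius sigma * Phi^{-1}(p), within which nearby inputs receive the same smoothed label. It assumes isotropic noise (the unweighted L2 case) and does not reproduce the paper's weighted-metric construction; the base classifier, inputs, sigma, and sample counts are hypothetical.

```python
import random
from collections import Counter
from statistics import NormalDist
from typing import Callable, Hashable, Sequence, Tuple


def smoothed_predict(
    classifier: Callable[[Sequence[float]], Hashable],
    x: Sequence[float],
    sigma: float,
    samples: int = 2000,
    seed: int = 0,
) -> Tuple[Hashable, float]:
    """Monte Carlo estimate of g(x) = argmax_c P[classifier(x + e) = c], e ~ N(0, sigma^2 I).

    Returns the majority label and the fraction of noisy samples voting for it.
    """
    rng = random.Random(seed)
    votes = Counter(
        classifier([xi + rng.gauss(0.0, sigma) for xi in x]) for _ in range(samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / samples


def radius_estimate(p_top: float, sigma: float) -> float:
    """Gaussian-smoothing L2 radius sigma * Phi^{-1}(p_top).

    Uses the raw vote fraction, so this is only an estimate: a certified
    radius would plug in a lower confidence bound on p_top instead.
    """
    if p_top <= 0.5:
        return 0.0
    p = min(p_top, 1.0 - 1e-6)  # inv_cdf requires p strictly inside (0, 1)
    return sigma * NormalDist().inv_cdf(p)


# Hypothetical base classifier: a hard threshold on the feature sum.
def base_classifier(v: Sequence[float]) -> int:
    return int(sum(v) > 0.0)


label, p_top = smoothed_predict(base_classifier, [1.0, 0.5], sigma=0.5)
print(label, p_top, radius_estimate(p_top, sigma=0.5))
```

Larger sigma yields a smoother classifier and a larger radius per unit of vote margin, at the cost of more label flips near the boundary; this is the same robustness and utility trade-off the abstract reports for fairness.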
Full Text Available
