NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Robust Models Are More Interpretable Because Attributions Look Normal

Wang, Zifan; Fredrikson, Matt; Datta, Anupam (July 2022, Proceedings of Machine Learning Research)
Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image’s ground- truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model’s input gradients around data points will more closely align with boundaries’ normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient- based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision bound- aries to explain a classification outcome. We show that by leveraging the key fac- tors underpinning robust interpretability, boundary attributions produce sharper, more concentrated visual explanations{—}even on non-robust models.
more » « less
Full Text Available
Fairness Under Feature Exemptions: Counterfactual and Observational Measures

https://doi.org/10.1109/TIT.2021.3103206

Dutta, Sanghamitra; Venkatesh, Praveen; Mardziel, Piotr; Datta, Anupam; Grover, Pulkit (October 2021, IEEE Transactions on Information Theory)

Full Text Available
Smoothed Geometry for Robust Attribution

Wang, Zifan; Wang, Haofan; Ramkumar, Shakul; Mardziel, Piotr; Fredrikson, Matt; Datta, Anupam (December 2020, Advances in Neural Information Processing Systems)
null (Ed.)
Full Text Available
Interpreting Interpretations: Organizing Attribution Methods by Criteria

https://doi.org/10.1109/CVPRW50498.2020.00013

Wang, Zifan; Mardziel, Piotr; Datta, Anupam; Fredrikson, Matt (July 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops)
null (Ed.)
Full Text Available
SoK: Differential Privacy as a Causal Property

https://doi.org/10.1109/SP.2020.00012

Tschantz, Michael Carl; Sen, Shayak; Datta, Anupam (May 2020, Proceedings of the IEEE Symposium on Security and Privacy)

We present associative and causal views of differential privacy. Under the associative view, the possibility of dependencies between data points precludes a simple statement of differential privacy's guarantee as conditioning upon a single changed data point. However, we show that a simple characterization of differential privacy as limiting the effect of a single data point does exist under the causal view, without independence assumptions about data points. We believe this characterization resolves disagreement and confusion in prior work about the consequences of differential privacy. The associative view needing assumptions boils down to the contrapositive of the maxim that correlation doesn’t imply causation: differential privacy ensuring a lack of (strong) causation does not imply a lack of (strong) association. Our characterization also opens up the possibility of applying results from statistics, experimental design, and science about causation while studying differential privacy.
more » « less
Full Text Available
Influence Paths for Characterizing Subject-Verb Number Agreement in LSTM Language Models

https://doi.org/10.18653/v1/2020.acl-main.430

Lu, Kaiji; Mardziel, Piotr; Leino, Klas; Fredrikson, Matt; Datta, Anupam (July 2020, 58th Annual Meeting of the Association for Computational Linguistics)
null (Ed.)
Full Text Available
An Information-Theoretic Quantification of Discrimination with Exempt Features

https://doi.org/10.1609/aaai.v34i04.5794

Dutta, Sanghamitra; Venkatesh, Praveen; Mardziel, Piotr; Datta, Anupam; Grover, Pulkit (June 2020, Proceedings of the AAAI Conference on Artificial Intelligence)

The needs of a business (e.g., hiring) may require the use of certain features that are critical in a way that any discrimination arising due to them should be exempted. In this work, we propose a novel information-theoretic decomposition of the total discrimination (in a counterfactual sense) into a non-exempt component, which quantifies the part of the discrimination that cannot be accounted for by the critical features, and an exempt component, which quantifies the remaining discrimination. Our decomposition enables selective removal of the non-exempt component if desired. We arrive at this decomposition through examples and counterexamples that enable us to first obtain a set of desirable properties that any measure of non-exempt discrimination should satisfy. We then demonstrate that our proposed quantification of non-exempt discrimination satisfies all of them. This decomposition leverages a body of work from information theory called Partial Information Decomposition (PID). We also obtain an impossibility result showing that no observational measure of non-exempt discrimination can satisfy all of the desired properties, which leads us to relax our goals and examine alternative observational measures that satisfy only some of these properties. We then perform a case study using one observational measure to show how one might train a model allowing for exemption of discrimination due to critical features.
more » « less
Full Text Available
Feature-Wise Bias Amplification

Leino, Klas; Fredrikson, Matt; Black, Emily; Sen, Shayak; Datta, Anupam (January 2019, International Conference on Learning Representations (ICLR))

We study the phenomenon of bias amplification in classifiers, wherein a machine learning model learns to predict classes with a greater disparity than the underlying ground truth. We demonstrate that bias amplification can arise via inductive bias in gradient descent methods resulting in overestimation of importance of moderately-predictive weak'' features if insufficient training data is available. This overestimation gives rise to feature-wise bias amplification -- a previously unreported form of bias that can be traced back to the features of a trained model. Through analysis and experiments, we show that the while some bias cannot be mitigated without sacrificing accuracy, feature-wise bias amplification can be mitigated through targeted feature selection. We present two new feature selection algorithms for mitigating bias amplification in linear models, and show how they can be adapted to convolutional neural networks efficiently. Our experiments on synthetic and real data demonstrate that these algorithms consistently lead to reduced bias without harming accuracy, in some cases eliminating predictive bias altogether while providing modest gains in accuracy.
more » « less
Full Text Available
Influence-Directed Explanations for Deep Convolutional Networks

https://doi.org/10.1109/TEST.2018.8624792

Leino, Klas; Sen, Shayak; Datta, Anupam; Fredrikson, Matt; Li, Linyi (October 2018, IEEE International Test Conference (ITC))

We study the problem of explaining a rich class of behavioral properties of deep neural networks. Distinctively, our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on a quantity and distribution of interest, using an axiomatically-justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by demonstrating a number of its unique capabilities on convolutional neural networks trained on ImageNet. Our evaluation demonstrates that influence-directed explanations (1) identify influential concepts that generalize across instances, (2) can be used to extract the “essence” of what the network learned about a class, and (3) isolate individual features the network uses to make decisions and distinguish related classes.
more » « less
Full Text Available
Hunting for Discriminatory Proxies in Linear Regression Models

Yeom, Samuel; Datta, Anupam; Fredrikson, Matt (January 2018, Advances in neural information processing systems)

A machine learning model may exhibit discrimination when used to make decisions involving people. One potential cause for such outcomes is that the model uses a statistical proxy for a protected demographic attribute. In this paper we formulate a definition of proxy use for the setting of linear regression and present algorithms for detecting proxies. Our definition follows recent work on proxies in classification models, and characterizes a model's constituent behavior that: 1) correlates closely with a protected random variable, and 2) is causally influential in the overall behavior of the model. We show that proxies in linear regression models can be efficiently identified by solving a second-order cone program, and further extend this result to account for situations where the use of a certain input variable is justified as a business necessity''. Finally, we present empirical results on two law enforcement datasets that exhibit varying degrees of racial disparity in prediction outcomes, demonstrating that proxies shed useful light on the causes of discriminatory behavior in models.
more » « less
Full Text Available

« Prev Next »

Search for: All records