Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g. law enforcement, financial lending), it becomes important to ensure that we clearly understand the vulnerabilities of these methods and find ways to address them. However, there is little understanding of the vulnerabilities and shortcomings of counterfactual explanations. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show counterfactual explanations may converge to drastically different counterfactuals under a small perturbation indicating they are not robust. Leveraging this insight, we introduce a novel objective to train seemingly fair models where counterfactual explanations find much lower cost recourse under a slight perturbation. We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction data sets where certain subgroups achieve up to 20x lower cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations in robust counterfactual explanations.
more »
« less
Counterfactual Explanations Can Be Manipulated
Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g. law enforcement, financial lending), it becomes important to ensure that we clearly understand the vulnerabilties of these methods and find ways to address them. However, there is little understanding of the vulnerabilities and shortcomings of counterfactual explanations. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show counterfactual explanations may converge to drastically different counterfactuals under a small perturbation indicating they are not robust. Leveraging this insight, we introduce a novel objective to train seemingly fair models where counterfactual explanations find much lower cost recourse under a slight perturbation. We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction data sets where certain subgroups achieve up to 20x lower cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations in robust counterfactual explanations.
more »
« less
- PAR ID:
- 10462971
- Date Published:
- Journal Name:
- Advances in Neural Information Processing Systems (NeurIPS)
- Volume:
- 34
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
There is an emerging interest in generating robust algorithmic recourse that would remain valid if the model is updated or changed even slightly. Towards finding robust algorithmic recourse (or counterfactual explanations), existing literature often assumes that the original model m and the new model M are bounded in the parameter space, i.e., ||Params(M)−Params(m)||<Δ. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed naturally-occurring model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure – that we call Stability – to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with sufficiently high value of Stability as defined by our measure will remain valid after potential “naturally-occurring” model changes with high probability (leveraging concentration bounds for Lipschitz function of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point which is not always available, we also examine estimators of our proposed measure and derive a fundamental lower bound on the sample size required to have a precise estimate. We explore methods of using stability measures to generate robust counterfactuals that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as the Rashomon effect.more » « less
-
Automated decision-making systems are increasingly deployed in domains such as hiring and credit approval where negative outcomes can have substantial ramifications for decision subjects. Thus, recent research has focused on providing explanations that help decision subjects understand the decision system and enable them to take actionable recourse to change their outcome. Popular counterfactual explanation techniques aim to achieve this by describing alterations to an instance that would transform a negative outcome to a positive one. Unfortunately, little user evaluation has been performed to assess which of the many counterfactual approaches best achieve this goal. In this work, we conduct a crowd-sourced between-subjects user study (N = 252) to examine the effects of counterfactual explanation type and presentation on lay decision subjects’ understandings of automated decision systems. We find that the region-based counterfactual type significantly increases objective understanding, subjective understanding, and response confidence as compared to the point-based type. We also find that counterfactual presentation significantly effects response time and moderates the effect of counterfactual type for response confidence, but not understanding. A qualitative analysis reveals how decision subjects interact with different explanation configurations and highlights unmet needs for explanation justification. Our results provide valuable insights and recommendations for the development of counterfactual explanation techniques towards achieving practical actionable recourse and empowering lay users to seek justice and opportunity in automated decision workflows.more » « less
-
Automated decision-making systems are increasingly deployed in domains such as hiring and credit approval where negative outcomes can have substantial ramifications for decision subjects. Thus, recent research has focused on providing explanations that help decision subjects understand the decision system and enable them to take actionable recourse to change their outcome. Popular counterfactual explanation techniques aim to achieve this by describing alterations to an instance that would transform a negative outcome to a positive one. Unfortunately, little user evaluation has been performed to assess which of the many counterfactual approaches best achieve this goal. In this work, we conduct a crowd-sourced between-subjects user study (N = 252) to examine the effects of counterfactual explanation type and presentation on lay decision subjects’ understandings of automated decision systems. We find that the region-based counterfactual type significantly increases objective understanding, subjective understanding, and response confidence as compared to the point-based type. We also find that counterfactual presentation significantly effects response time and moderates the effect of counterfactual type for response confidence, but not understanding. A qualitative analysis reveals how decision subjects interact with different explanation configurations and highlights unmet needs for explanation justification. Our results provide valuable insights and recommendations for the development of counterfactual explanation techniques towards achieving practical actionable recourse and empowering lay users to seek justice and opportunity in automated decision workflows.more » « less
-
Counterfactual explanations promote explainability in machine learning models by answering the question “how should the input instance be altered to obtain a desired predicted label?". The comparison of this instance before and after perturbation can enhance human interpretation. Most existing studies on counterfactual explanations are limited in tabular data or image data. In this paper, we study the problem of counterfactual explanation generation on graphs. A few studies have explored to generate counterfactual explanations on graphs, but many challenges of this problem are still not well-addressed: 1) optimizing in the discrete and disorganized space of graphs; 2) generalizing on unseen graphs; 3) maintaining the causality in the generated counterfactuals without prior knowledge of the causal model. To tackle these challenges, we propose a novel framework CLEAR which aims to generate counterfactual explanations on graphs for graph-level prediction models. Specifically, CLEAR leverages a graph variational autoencoder based mechanism to facilitate its optimization and generalization, and promotes causality by leveraging an auxiliary variable to better identify the causal model. Extensive experiments on both synthetic and real-world graphs validate the superiority of CLEAR over state-of-the-art counterfactual explanation methods on graphs in different aspects.more » « less
An official website of the United States government

