skip to main content


Title: Disentangling Influence: Using disentangled representations to audit model predictions
Motivated by the need to audit complex and black box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) and indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data or locally with respect to a single point. Current research has typically focused on one of each of these dimensions. In this paper, we develop disentangled influence audits, a procedure to audit the indirect influence of features. Specifically, we show that disentangled representations provide a mechanism to identify proxy features in the dataset, while allowing an explicit computation of feature influence on either individual outcomes or aggregate-level outcomes. We show through both theory and experiments that disentangled influence audits can both detect proxy features and show, for each individual or in aggregate, which of these proxy features affects the classifier being audited the most. In this respect, our method is more powerful than existing methods for ascertaining feature influence.  more » « less
Award ID(s):
1633387
NSF-PAR ID:
10183991
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of Neural Information Processing Systems (NeurIPS)
Page Range / eLocation ID:
4496--4506
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Motivated by the need to audit complex and black box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) and indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data or locally with respect to a single point. Current research has typically focused on one of each of these dimensions. In this paper, we develop disentangled influence audits, a procedure to audit the indirect influence of features. Specifically, we show that disentangled representations provide a mechanism to identify proxy features in the dataset, while allowing an explicit computation of feature influence on either individual outcomes or aggregate-level outcomes. We show through both theory and experiments that disentangled influence audits can both detect proxy features and show, for each individual or in aggregate, which of these proxy features affects the classifier being audited the most. In this respect, our method is more powerful than existing methods for ascertaining feature influence. 
    more » « less
  2. Motivated by the need to audit complex and black box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) and indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data or locally with respect to a single point. Current research has typically focused on one of each of these dimensions. In this paper, we develop disentangled influence audits, a procedure to audit the indirect influence of features. Specifically, we show that disentangled representations provide a mechanism to identify proxy features in the dataset, while allowing an explicit computation of feature influence on either individual outcomes or aggregate-level outcomes. We show through both theory and experiments that disentangled influence audits can both detect proxy features and show, for each individual or in aggregate, which of these proxy features affects the classifier being audited the most. In this respect, our method is more powerful than existing methods for ascertaining feature influence. 
    more » « less
  3. AI systems have been known to amplify biases in real-world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias, but when biases are realized through proxy features, the relationship between this proxy feature and the protected one may be less clear to a human. In this work, we study the effect of the presence of protected and proxy features on participants’ perception of model fairness and their ability to improve demographic parity over an AI alone. Further, we examine how different treatments—explanations, model bias disclosure and proxy correlation disclosure—affect fairness perception and parity. We find that explanations help people detect direct but not indirect biases. Additionally, regardless of bias type, explanations tend to increase agreement with model biases. Disclosures can help mitigate this effect for indirect biases, improving both unfairness recognition and decision-making fairness. We hope that our findings can help guide further research into advancing explanations in support of fair human-AI decision-making. 
    more » « less
  4. The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits. 
    more » « less
  5. The increasing automation of high-stakes decisions with direct impact on the lives and well-being of individuals raises a number of important considerations. Prominent among these is strategic behavior by individuals hoping to achieve a more desirable outcome. Two forms of such behavior are commonly studied: 1) misreporting of individual attributes, and 2) recourse, or actions that truly change such attributes. The former involves deception, and is inherently undesirable, whereas the latter may well be a desirable goal insofar as it changes true individual qualification. We study misreporting and recourse as strategic choices by individuals within a unified framework. In particular, we propose auditing as a means to incentivize recourse actions over attribute manipulation, and characterize optimal audit policies for two types of principals, utility-maximizing and recourse-maximizing. Additionally, we consider subsidies as an incentive for recourse over manipulation, and show that even a utility-maximizing principal would be willing to devote a considerable amount of audit budget to providing such subsidies. Finally, we consider the problem of optimizing fines for failed audits, and bound the total cost incurred by the population as a result of audits. 
    more » « less