Title: Learning From Strategic Agents: Accuracy, Improvement, and Causality, ICML
In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model that accounts for agents' incentives to "game" their features in order to receive better decisions. Whereas the strategic classification literature generally assumes that agents' outcomes are not causally dependent on their features (and thus strategic behavior is a form of lying), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. Our formulation is the first to incorporate a crucial phenomenon: when agents act to change observable features, they may as a side effect perturb unobserved features that causally affect their true outcomes. We consider three distinct desiderata for a decision-maker's model: accurately predicting agents' post-gaming outcomes (accuracy), incentivizing agents to improve these outcomes (improvement), and, in the linear setting, estimating the visible coefficients of the true causal model (causal precision). As our main contribution, we provide the first algorithms for learning accuracy-optimizing, improvement-optimizing, and causal-precision-optimizing linear regression models directly from data, without prior knowledge of agents' possible actions. These algorithms circumvent the hardness result of Miller et al. (2019) by allowing the decision-maker to observe agents' responses to a sequence of decision rules, in effect inducing agents to perform causal interventions for free.
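The idea of learning from agents' responses to a sequence of decision rules can be illustrated with a toy simulation. The sketch below is not the paper's algorithm; it assumes a hypothetical linear model in which agents pay a quadratic cost for actions, actions shift both observable and hidden features (the matrices `A` and `B`), and the true outcome is linear in both (`theta_obs`, `theta_hid`). All of these names and the cost model are assumptions made for illustration. Under these assumptions the mean outcome is affine in the published rule, so deploying several rules and regressing the observed mean outcomes on them recovers the direction that most improves outcomes, without the decision-maker knowing `A`, `B`, or the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: observable features, available actions, hidden features.
d, k, h = 3, 4, 2
A = rng.normal(size=(d, k))       # how actions move observable features
B = rng.normal(size=(h, k))       # side effects of actions on hidden features
theta_obs = rng.normal(size=d)    # causal coefficients of observable features
theta_hid = rng.normal(size=h)    # causal coefficients of hidden features


def deploy(w, n=5000):
    """Publish linear rule w, let n agents best-respond, return the mean true outcome."""
    x0 = rng.normal(size=(n, d))  # pre-gaming observable features
    z0 = rng.normal(size=(n, h))  # pre-gaming hidden features
    # Assumed agent model: maximize w.(x0 + A a) - 0.5 * ||a||^2, which gives a = A^T w.
    a = A.T @ w
    x = x0 + A @ a                # post-gaming observable features
    z = z0 + B @ a                # hidden features perturbed as a side effect
    y = x @ theta_obs + z @ theta_hid + rng.normal(scale=0.1, size=n)
    return y.mean()


# Deploy a sequence of random rules and record the mean outcome each one induces.
m = 20
W = rng.normal(size=(m, d))
mean_outcomes = np.array([deploy(w) for w in W])

# In this toy model the mean outcome is affine in w, so least squares recovers its slope.
design = np.column_stack([np.ones(m), W])
coef, *_ = np.linalg.lstsq(design, mean_outcomes, rcond=None)
g_hat = coef[1:]

# Ground-truth slope, known only to the simulator, used here purely as a check.
g_true = A @ (A.T @ theta_obs + B.T @ theta_hid)
cosine = g_hat @ g_true / (np.linalg.norm(g_hat) * np.linalg.norm(g_true))

# Among unit-norm rules, the estimated slope direction maximizes predicted improvement.
w_star = g_hat / np.linalg.norm(g_hat)
print("cosine(g_hat, g_true):", round(float(cosine), 3))
print("mean outcome under the zero rule:", deploy(np.zeros(d)))
print("mean outcome under the learned rule:", deploy(w_star))
```

The regression only sees published rules and resulting mean outcomes, which is the sense in which each deployed rule acts as an intervention performed by the agents themselves. A richer sketch would also fit per-agent predictors for the accuracy and causal-precision objectives, which this example does not attempt.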
Award ID(s):
1908774
NSF-PAR ID:
10190171
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
ISSN:
2640-3498
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the design of a class of incentive mechanisms that can effectively prevent cheating in a strategic classification and regression problem. A conventional strategic classification or regression problem is modeled as a Stackelberg game, or a principal-agent problem between the designer of a classifier (the principal) and individuals subject to the classifier's decisions (the agents), potentially from different demographic groups. The former benefits from the accuracy of its decisions, whereas the latter may have an incentive to game the algorithm into making favorable but erroneous decisions. While prior works tend to focus on how to design an algorithm to be more robust to such strategic maneuvering, this study focuses on an alternative, which is to design incentive mechanisms to shape the utilities of the agents and induce effort that genuinely improves their skills, which in turn benefits both parties in the Stackelberg game. Specifically, the principal and the mechanism provider (which could also be the principal itself) move together in the first stage, publishing and committing to a classifier and an incentive mechanism. The agents are (simultaneous) second movers and best respond to the published classifier and incentive mechanism. When an agent's strategic action merely changes its observable features, it hurts the performance of the algorithm. However, if the action leads to improvement in the agent's true label, it not only helps the agent achieve better decision outcomes, but also preserves the performance of the algorithm. We study how a subsidy mechanism can induce improvement actions and positively impact a number of social well-being metrics, such as the overall skill levels of the agents (efficiency) and positive or true positive rate differences between different demographic groups (fairness).
  2. Celis, L. Elisa (Ed.)
    In this work, we consider classification of agents who can both game and improve. For example, people wishing to get a loan may be able to take some actions that increase their perceived credit-worthiness and others that also increase their true credit-worthiness. A decision-maker would like to define a classification rule with few false positives (does not give out many bad loans) while yielding many true positives (giving out many good loans), which includes encouraging agents to improve to become true positives if possible. We consider two models for this problem, a general discrete model and a linear model, and prove algorithmic, learning, and hardness results for each. For the general discrete model, we give an efficient algorithm for the problem of maximizing the number of true positives subject to no false positives, and show how to extend this to a partial-information learning setting. We also show hardness for the problem of maximizing the number of true positives subject to a nonzero bound on the number of false positives, and that this hardness holds even for a finite-point version of our linear model. We also show that maximizing the number of true positives subject to no false positives is NP-hard in our full linear model. We additionally provide an algorithm that determines whether there exists a linear classifier that classifies all agents accurately and causes all improvable agents to become qualified, and give additional results for low-dimensional data.
  3. Machine learning systems are often used in settings where individuals adapt their features to obtain a desired outcome. In such settings, strategic behavior leads to a sharp loss in model performance in deployment. In this work, we aim to address this problem by learning classifiers that encourage decision subjects to change their features in a way that leads to improvement in both predicted and true outcome. We frame the dynamics of prediction and adaptation as a two-stage game, and characterize optimal strategies for the model designer and its decision subjects. In benchmarks on simulated and real-world datasets, we find that classifiers trained using our method maintain the accuracy of existing approaches while inducing higher levels of improvement and less manipulation.
  4. As predictive models are deployed into the real world, they must increasingly contend with strategic behavior. A growing body of work on strategic classification treats this problem as a Stackelberg game: the decision-maker "leads" in the game by deploying a model, and the strategic agents "follow" by playing their best response to the deployed model. Importantly, in this framing, the burden of learning is placed solely on the decision-maker, while the agents' best responses are implicitly treated as instantaneous. In this work, we argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions. In particular, by generalizing the standard model to allow both players to learn over time, we show that a decision-maker that makes updates faster than the agents can reverse the order of play, meaning that the agents lead and the decision-maker follows. We observe in standard learning settings that such a role reversal can be desirable for both the decision-maker and the strategic agents. Finally, we show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
  5. In this thesis we propose novel estimation techniques for localization and planning problems, which are key challenges in long-term autonomy. We take inspiration in our methods from non-parametric estimation and use tools such as kernel density estimation, non-linear least-squares optimization, binary masking, and random sampling. We show that these methods, by avoiding explicit parametric models, outperform existing methods that use them. Despite the seeming differences between localization and planning, we demonstrate in this thesis that the problems share core structural similarities. When real or simulation-sampled measurements are expensive, noisy, or high variance, non-parametric estimation techniques give higher-quality results in less time. We first address two localization problems. In order to permit localization with a set of ad hoc-placed radios, we propose an ultra-wideband (UWB) graph realization system to localize the radios. Our system achieves high accuracy and robustness by using kernel density estimation for measurement probability densities, by explicitly modeling antenna delays, and by optimizing this combination with a non-linear least squares formulation. Next, in order to then support robotic navigation, we present a flexible system for simultaneous localization and mapping (SLAM) that combines elements from both traditional dense metric SLAM and topological SLAM, using a binary "masking function" to focus attention. This masking function controls which lidar scans are available for loop closures. We provide several masking functions based on approximate topological class detectors. We then examine planning problems in the final chapter and in the appendix. In order to plan with uncertainty around multiple dynamic agents, we describe Monte-Carlo Policy-Tree Decision Making (MCPTDM), a framework for efficiently computing policies in partially-observable, stochastic, continuous problems. 
MCPTDM composes a sequence of simpler closed-loop policies and uses marginal action costs and particle repetition to improve cost estimates and sample efficiency by reducing variance. Finally, in the appendix we explore Learned Similarity Monte-Carlo Planning (LSMCP), where we seek to enhance the sample efficiency of partially observable Monte Carlo tree search-based planning by taking advantage of similarities in the final outcomes of similar states and actions. We train a multilayer perceptron to learn a similarity function which we then use to enhance value estimates in the planning. Collectively, we show in this thesis that non-parametric methods promote long-term autonomy by reducing error and increasing robustness across multiple domains. 