
Title: Multiplayer Performative Prediction: Learning in Decision-Dependent Games
Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers’ actions. This paper formulates a new game-theoretic framework for this phenomenon, called multi-player performative prediction. We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but they are generally computationally difficult to find because they are solutions of nonmonotone games. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative-free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and taking gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.
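To make the difference between the two solution concepts concrete, here is a minimal sketch on a hypothetical two-player toy model that is not taken from the paper. Player i observes data z_i = mu_i + sum_j A_ij x_j + SIGMA * noise, whose distribution shifts with the joint decision x, and incurs the loss 0.5 (x_i - z_i)^2 + 0.5 LAM x_i^2. The repeated stochastic gradient step differentiates only through the player's own decision with the data distribution frozen at the deployed point, so it targets a performatively stable equilibrium. The two-point zeroth-order step differences the sampled loss after actually moving the player's own decision, so it also sees how that decision shifts the data and targets a Nash equilibrium. The linear model, the constants LAM and SIGMA, and the names `repeated_sgd`, `derivative_free_nash`, `solve2`, and `closed_forms` are illustrative assumptions. The two-point estimate also assumes both probes share one noise draw (common random numbers), a convenience the paper's derivative-free methods may not rely on. The entries of A are kept small relative to the loss curvature, the kind of regime in which such iterations are expected to converge.

```python
import random

# Hypothetical toy game (illustration only): player i sees
# z_i = mu[i] + sum_j A[i][j] * x[j] + SIGMA * eps and has loss
# 0.5 * (x_i - z_i) ** 2 + 0.5 * LAM * x_i ** 2.
LAM, SIGMA = 0.5, 0.5


def sample_z(i, x, mu, A, eps):
    """Player i's data at joint decision x, given a standard normal draw eps."""
    return mu[i] + sum(a * xj for a, xj in zip(A[i], x)) + SIGMA * eps


def loss(i, x, mu, A, eps):
    """Player i's sampled loss when the data are drawn at joint decision x."""
    z = sample_z(i, x, mu, A, eps)
    return 0.5 * (x[i] - z) ** 2 + 0.5 * LAM * x[i] ** 2


def repeated_sgd(mu, A, steps=20000, seed=0):
    """Repeated stochastic gradient: the gradient ignores how x moves the data,
    so the iterates approach a performatively stable equilibrium."""
    rng, n = random.Random(seed), len(mu)
    x = [0.0] * n
    for t in range(steps):
        eta = 1.0 / (t + 1)  # decaying step size
        z = [sample_z(i, x, mu, A, rng.gauss(0.0, 1.0)) for i in range(n)]
        x = [x[i] - eta * ((x[i] - z[i]) + LAM * x[i]) for i in range(n)]
    return x


def derivative_free_nash(mu, A, delta=0.1, steps=20000, seed=0):
    """Two-point zeroth-order play: each player probes its own coordinate, so the
    estimate includes the effect of x_i on its data (Nash target)."""
    rng, n = random.Random(seed), len(mu)
    x = [0.0] * n
    for t in range(steps):
        eta = 1.0 / (t + 1)
        grads = []
        for i in range(n):
            eps = rng.gauss(0.0, 1.0)  # shared by both probes (common random numbers)
            up, down = list(x), list(x)
            up[i] += delta
            down[i] -= delta
            diff = loss(i, up, mu, A, eps) - loss(i, down, mu, A, eps)
            grads.append(diff / (2 * delta))
        x = [x[i] - eta * grads[i] for i in range(n)]
    return x


def solve2(m, b):
    """Solve the 2x2 linear system m @ x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]


def closed_forms(mu, A):
    """Exact stable and Nash points of the two-player toy game."""
    (a00, a01), (a10, a11) = A
    # Stable: (1 + LAM) x_i = mu_i + sum_j A_ij x_j with the distribution frozen.
    stable = solve2([[1 + LAM - a00, -a01], [-a10, 1 + LAM - a11]], mu)
    # Nash: first-order condition that also differentiates z_i through x_i.
    nash = solve2([[(1 - a00) ** 2 + LAM, -(1 - a00) * a01],
                   [-(1 - a11) * a10, (1 - a11) ** 2 + LAM]],
                  [(1 - a00) * mu[0], (1 - a11) * mu[1]])
    return stable, nash


if __name__ == "__main__":
    mu, A = [1.0, -0.5], [[0.4, 0.3], [0.2, 0.4]]
    print("closed forms (stable, Nash):", closed_forms(mu, A))
    print("repeated SGD:", repeated_sgd(mu, A))
    print("derivative-free:", derivative_free_nash(mu, A))
```

With mu = [1.0, -0.5] and A = [[0.4, 0.3], [0.2, 0.4]], the closed-form stable and Nash points differ noticeably in the first coordinate (about 0.83 versus 0.64), and each stochastic method should land near its own target rather than the other's.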
Publisher / Repository:
Journal of Machine Learning Research
Journal Name:
Journal of Machine Learning Research
Sponsoring Org:
National Science Foundation
More Like this
  1. We introduce a new model of repeated games in large populations with random matching, overlapping generations, and limited records of past play. We prove that steady-state equilibria exist under general conditions on records. When the updating of a player’s record can depend on the actions of both players in a match, any strictly individually rational action can be supported in a steady-state equilibrium. When record updates can depend only on a player’s own actions, fewer actions can be supported. Here, we focus on the prisoner’s dilemma and restrict attention to strict equilibria that are coordination-proof, meaning that matched partners never play a Pareto-dominated Nash equilibrium in the one-shot game induced by their records and expected continuation payoffs. Such equilibria can support full cooperation if the stage game is either “strictly supermodular and mild” or “strongly supermodular,” and otherwise permit no cooperation at all. The presence of “supercooperator” records, where a player cooperates against any opponent, is crucial for supporting any cooperation when the stage game is “severe.”
  2. Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium in a number of iterations that is linear in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at 
  3. Finding Nash equilibria in two-player zero-sum continuous games is a central problem in machine learning, e.g., for training both GANs and robust models. The existence of pure Nash equilibria requires strong conditions which are not typically met in practice. Mixed Nash equilibria exist in greater generality and may be found using mirror descent. Yet this approach does not scale to high dimensions. To address this limitation, we parametrize mixed strategies as mixtures of particles, whose positions and weights are updated using gradient descent-ascent. We study this dynamics as an interacting gradient flow over measure spaces endowed with the Wasserstein-Fisher-Rao metric. We establish global convergence to an approximate equilibrium for the related Langevin gradient-ascent dynamic. We prove a law of large numbers that relates particle dynamics to mean-field dynamics. Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs.
  4. We study the fair division problem of allocating a mixed manna under additively separable piecewise linear concave (SPLC) utilities. A mixed manna contains goods that everyone likes and bads (chores) that everyone dislikes, as well as items that some like and others dislike. The seminal work of Bogomolnaia et al. argues why allocating a mixed manna is genuinely more complicated than allocating a good or a bad manna and why competitive equilibrium is the best mechanism. It also proves the existence of an equilibrium and establishes its distinctive properties (e.g., a nonconvex and disconnected set of equilibria even under linear utilities) but leaves the problem of computing an equilibrium open. Our main results are a linear complementarity problem formulation that captures all competitive equilibria of a mixed manna under SPLC utilities (a strict generalization of linear utilities) and a complementary pivot algorithm based on Lemke’s scheme for finding one. Experimental results on randomly generated instances suggest that our algorithm is fast in practice. Given the [Formula: see text]-hardness of the problem, designing such an algorithm is the only non–brute force (nonenumerative) option known; for example, the classic Lemke–Howson algorithm for computing a Nash equilibrium in a two-player game is still one of the most widely used algorithms in practice. Our algorithm also yields several new structural properties as simple corollaries. We obtain a (constructive) proof of existence for a far more general setting, membership of the problem in [Formula: see text], a rational-valued solution, and an odd number of solutions property. The last property also settles the conjecture of Bogomolnaia et al. in the affirmative. Furthermore, we show that, if the number of either agents or items is a constant, then the number of pivots in our algorithm is strongly polynomial when the mixed manna contains all bads.
  5. Online reviews provide product evaluations for customers to make decisions. Unfortunately, the evaluations can be manipulated using fake reviews (“spams”) by professional spammers, who have learned increasingly insidious and powerful spamming strategies by adapting to the deployed detectors. Spamming strategies are hard to capture, as they can vary quickly over time, differ across spammers and target products, and, more critically, remain unknown in most cases. Furthermore, most existing detectors focus on detection accuracy, which is not well aligned with the goal of maintaining the trustworthiness of product evaluations. To address these challenges, we formulate a minimax game where the spammers and spam detectors compete with each other on their practical goals, which are not solely based on detection accuracy. Nash equilibria of the game lead to stable detectors that are agnostic to any mixed detection strategies. However, the game has no closed-form solution and is not differentiable, so it does not admit the typical gradient-based algorithms. We turn the game into two dependent Markov Decision Processes (MDPs) to allow efficient stochastic optimization based on multi-armed bandits and policy gradient. We experiment on three large review datasets using various state-of-the-art spamming and detection strategies and show that the optimization algorithm can reliably find an equilibrial detector that can robustly and effectively prevent spammers with any mixed spamming strategies from attaining their practical goal. Our code is available at 
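The third related abstract notes that mixed Nash equilibria of continuous zero-sum games can be found with mirror descent, but that this does not scale to high dimensions. As a hedged one-dimensional illustration of that baseline (not the particle method the abstract proposes), the sketch below freezes particle positions on a grid and runs only the multiplicative-weight (entropic mirror descent) update in self-play, which corresponds to the weight part of a particle update with positions held fixed. The game f(x, y) = (x - y)^2 on [-1, 1] is an assumption chosen because it has no pure equilibrium: its mixed equilibrium puts x at 0 and splits y evenly between -1 and 1, with value 1. The function names, grid, step size, and iteration count are illustrative choices.

```python
import math


def softmax(v):
    """Numerically stable softmax of a list of floats."""
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def hedge_self_play(grid, f, eta=0.015, steps=10000):
    """Entropic mirror descent (multiplicative weights) for both players.

    Particle positions stay fixed on `grid`; only their weights change.
    The x-player minimizes f(x, y), the y-player maximizes it, and the
    time-averaged mixed strategies are returned.
    """
    n = len(grid)
    payoff = [[f(x, y) for y in grid] for x in grid]  # payoff[i][j] = f(grid[i], grid[j])
    cum_loss_x = [0.0] * n  # cumulative expected loss of each x-particle
    cum_gain_y = [0.0] * n  # cumulative expected gain of each y-particle
    avg_x, avg_y = [0.0] * n, [0.0] * n
    for _ in range(steps):
        px = softmax([-eta * c for c in cum_loss_x])
        py = softmax([eta * c for c in cum_gain_y])
        for k in range(n):
            avg_x[k] += px[k] / steps
            avg_y[k] += py[k] / steps
        # Each player scores every particle against the opponent's current mixture.
        for i in range(n):
            cum_loss_x[i] += sum(payoff[i][j] * py[j] for j in range(n))
        for j in range(n):
            cum_gain_y[j] += sum(payoff[i][j] * px[i] for i in range(n))
    return avg_x, avg_y


def duality_gap(grid, f, px, py):
    """max_y E_px f(x, y) - min_x E_py f(x, y); zero only at an equilibrium on the grid."""
    best_y = max(sum(w * f(x, y) for w, x in zip(px, grid)) for y in grid)
    best_x = min(sum(w * f(x, y) for w, y in zip(py, grid)) for x in grid)
    return best_y - best_x


def f(x, y):
    """Hypothetical zero-sum payoff with only a mixed equilibrium on [-1, 1]."""
    return (x - y) ** 2


grid = [k / 5 - 1 for k in range(11)]  # -1.0, -0.8, ..., 1.0
avg_x, avg_y = hedge_self_play(grid, f)
print(round(duality_gap(grid, f, avg_x, avg_y), 3))
```

The duality gap of the averaged strategies is bounded by the sum of the two players' average regrets, which for exponential weights shrinks as the step size decreases and the number of iterations grows. The grid is what fails to scale: its size grows exponentially with dimension, which is the limitation the abstract addresses by letting particle positions move as well.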