 Award ID(s):
 1955696
 NSFPAR ID:
 10295553
 Date Published:
 Journal Name:
 Proceedings of Machine Learning Research
 ISSN:
 26403498
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

null (Ed.)Mean Field Games (MFG) are the class of games with a very large number of agents and the standard equilibrium concept is a Mean Field Equilibrium (MFE). Algorithms for learning MFE in dynamic MFGs are un known in general. Our focus is on an important subclass that possess a monotonicity property called Strategic Complementarities (MFGSC). We introduce a natural refinement to the equilibrium concept that we call TremblingHandPerfect MFE (TMFE), which allows agents to employ a measure of randomization while accounting for the impact of such randomization on their payoffs. We propose a simple algorithm for computing TMFE under a known model. We also introduce a modelfree and a modelbased approach to learning TMFE and provide sample complexities of both algorithms. We also develop a fully online learning scheme that obviates the need for a simulator. Finally, we empirically evaluate the performance of the proposed algorithms via examples motivated by realworld applications.more » « less

Mean field games (MFG) and mean field control (MFC) are critical classes of multiagent models for the efficient analysis of massive populations of interacting agents. Their areas of application span topics in economics, finance, game theory, industrial engineering, crowd motion, and more. In this paper, we provide a flexible machine learning framework for the numerical solution of potential MFG and MFC models. Stateoftheart numerical methods for solving such problems utilize spatial discretization that leads to a curse of dimensionality. We approximately solve highdimensional problems by combining Lagrangian and Eulerian viewpoints and leveraging recent advances from machine learning. More precisely, we work with a Lagrangian formulation of the problem and enforce the underlying Hamilton–Jacobi–Bellman (HJB) equation that is derived from the Eulerian formulation. Finally, a tailored neural network parameterization of the MFG/MFC solution helps us avoid any spatial discretization. Our numerical results include the approximate solution of 100dimensional instances of optimal transport and crowd motion problems on a standard work station and a validation using a Eulerian solver in two dimensions. These results open the door to muchanticipated applications of MFG and MFC models that are beyond reach with existing numerical methods.

null (Ed.)Regret minimization has proved to be a versatile tool for tree form sequential decision making and extensiveform games. In large twoplayer zerosum imperfectinformation games, mod ern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regretminimization algorithms for treeform sequential decision making, including CFR, require (i) an exact model of the player’s decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs—even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restric tions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used—for example, those in which only blackbox access to the environment is available. We give the first, to our knowl edge, regretminimization algorithm that guarantees sublinear regret with high probability even when requirement (i)—and thus also (ii)—is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and gets revealed incrementally whenever the agent encoun ters new decision points. We give an efficient algorithm that achieves O(T 3/4) regret with high probability for that setting, even when the agent faces an adversarial environment. Our experiments show it significantly outperforms the prior algo rithms for the problem, which do not have such guarantees. It can be used in any application for which regret minimization is useful: approximating Nash equilibrium or quantal response equilibrium, approximating coarse correlated equilibrium in multiplayer games, learning a best response, learning safe opponent exploitation, and online play against an unknown opponent/environment.more » « less

Network games are commonly used to capture the strategic interactions among interconnected agents in simultaneous moves. The agents’ actions in a Nash equilibrium must take into account the mutual dependencies connecting them, which is typically obtained by solving a set of fixed point equations. Stackelberg games, on the other hand, model the sequential moves between agents that are categorized as leaders and followers. The corresponding solution concept, the subgame perfect equilibrium, is typically obtained using backward induction. Both game forms enjoy very wide use in the (cyber)security literature, the network game often as a template to study security investment and externality – also referred to as the Interdependent Security (IDS) games – and the Stackelberg game as a formalism to model a variety of attackerdefender scenarios. In this study we examine a model that combines both types of strategic reasoning: the interdependency as well as sequential moves. Specifically, we consider a scenario with a network of interconnected first movers (firms or defenders, whose security efforts and practices collectively determine the security posture of the ecosystem) and one or more second movers, the attacker(s), who determine how much effort to exert on attacking the many potential targets. This gives rise to an equilibrium concept that embodies both types of equilibria mentioned above. We will examine how its existence and uniqueness conditions differ from that for a standard network game. Of particular interest are comparisons between the two game forms in terms of effort exerted by the defender(s) and the attacker(s), respectively, and the freeriding behavior among the defenders.more » « less

null (Ed.)Driven by recent successes in twoplayer, zerosum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibriumbased strategies. However, this approach has been less effective at producing competent players in generalsum games or those with more than two players than in twoplayer, zerosum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a gametheoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decisionmaking settings. To this end, we reexamine mediated equilibrium and deviation types in extensiveform games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.more » « less