Title: Strategic Exploration: Pre-emption and Prioritization
This paper analyses a model of strategic exploration in which competing players independently explore a set of alternatives. The model features a multiple-player multiple-armed bandit problem and captures a strategic trade-off between pre-emption (covert exploration of alternatives that the opponent will explore in the future) and prioritization (exploration of the most promising alternatives). Our results explain how the strategic trade-off shapes equilibrium behaviours and outcomes, for example, in technology races between superpowers and R&D competitions between firms. We show that players compete on the same set of alternatives, leading to duplicated exploration from start to finish, and they explore alternatives that are a priori less promising before more promising ones are exhausted. The model also predicts that competition induces players to implement unreliable technologies too early, even though they should wait for the technologies to mature. Coordinated exploration is impossible even if the alternatives are equally promising, but it can emerge in equilibrium following a phase of pre-emptive competition if there is a short deadline. With asymmetric capacities of exploration, the weak player conducts extensive instead of intensive exploration: it explores as many alternatives as the strong player does but never fully explores any.
Award ID(s):
1824328
PAR ID:
10482395
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Review of Economic Studies
ISSN:
0034-6527
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the past few decades, numerous experiments have shown that humans do not always behave so as to maximize their material payoff. Cooperative behavior when noncooperation is a dominant strategy (with respect to the material payoffs) is particularly puzzling. Here we propose a novel approach to explain cooperation, assuming what Halpern and Pass call translucent players. Typically, players are assumed to be opaque, in the sense that a deviation by one player in a normal-form game does not affect the strategies used by other players. However, a player may believe that if he switches from one strategy to another, the fact that he chooses to switch may be visible to the other players. For example, if he chooses to defect in Prisoner’s Dilemma, the other player may sense his guilt. We show that by assuming translucent players, we can recover many of the regularities observed in human behavior in well-studied games such as Prisoner’s Dilemma, Traveler’s Dilemma, Bertrand Competition, and the Public Goods game. The approach can also be extended to take into account a player’s concerns that his social group (or God) may observe his actions. This extension helps explain prosocial behavior in situations in which previous models of social behavior fail to make correct predictions (e.g. conflict situations and situations where there is a trade-off between equity and efficiency). 
  2. We investigate a linear–quadratic stochastic zero-sum game where two players lobby a political representative to invest in a wind farm. Players are time-inconsistent because they discount the utility with a non-constant rate. Our objective is to identify a consistent planning equilibrium in which the players are aware of their inconsistency and cannot commit to a lobbying policy. We analyse equilibrium behaviour in both single-player and two-player cases and compare the behaviours of the game under constant and variable discount rates. The equilibrium behaviour is provided in closed-loop form, either analytically or via numerical approximation. Our numerical analysis of the equilibrium reveals that strategic behaviour leads to more intense lobbying without resulting in overshooting. 
  3. Yllka Velaj and Ulrich Berger (Ed.)
    This paper considers a two-player game where each player chooses a resource from a finite collection of options. Each resource brings a random reward. Both players have statistical information regarding the rewards of each resource. Additionally, there exists an information asymmetry where each player has knowledge of the reward realizations of different subsets of the resources. If both players choose the same resource, the reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource. We first implement the iterative best response algorithm to find an ϵ-approximate Nash equilibrium for this game. This method of finding a Nash equilibrium may not be desirable when players do not trust each other and place no assumptions on the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.
  4. Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players’ actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players’ reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude. 
  5. When learning in strategic environments, a key question is whether agents can overcome uncertainty about their preferences to achieve outcomes they could have achieved absent any uncertainty. Can they do this solely through interactions with each other? We focus this question on the ability of agents to attain the value of their Stackelberg optimal strategy and study the impact of information asymmetry. We study repeated interactions in fully strategic environments where players' actions are decided based on learning algorithms that take into account their observed histories and knowledge of the game. We study the pure Nash equilibria (PNE) of a meta-game where players choose these algorithms as their actions. We demonstrate that if one player has perfect knowledge about the game, then any initial informational gap persists. That is, while there is always a PNE in which the informed agent achieves her Stackelberg value, there is a game where no PNE of the meta-game allows the partially informed player to achieve her Stackelberg value. On the other hand, if both players start with some uncertainty about the game, the quality of information alone does not determine which agent can achieve her Stackelberg value. In this case, the concept of information asymmetry becomes nuanced and depends on the game's structure. Overall, our findings suggest that repeated strategic interactions alone cannot facilitate learning effectively enough to earn an uninformed player her Stackelberg value. 
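The resource-selection game in item 3 rests on one payoff rule: a shared resource's reward is split equally, otherwise each player keeps the full reward. The following is a minimal sketch, assuming each player acts only on the expected rewards `mu` (the information asymmetry and the worst-case formulation are ignored) and restricting attention to pure strategies. It shows what iterative best response means in that simplified setting. The function names and the strict-improvement rule are hypothetical illustrative choices, not the authors' algorithm. Because equal sharing makes this simplified game a congestion game with a Rosenthal potential, strict-improvement dynamics must terminate at a pure Nash equilibrium.

```python
from typing import List, Tuple


def payoff(mu: List[float], mine: int, theirs: int) -> float:
    """Expected reward for choosing `mine` when the opponent chooses `theirs`.

    Assumption: only expected rewards matter. A shared resource splits its
    reward equally; otherwise the chooser keeps the full mean.
    """
    return mu[mine] / 2 if mine == theirs else mu[mine]


def improve(mu: List[float], current: int, theirs: int) -> int:
    """Switch to a best response only if it strictly beats the current choice."""
    best = max(range(len(mu)), key=lambda r: payoff(mu, r, theirs))
    if payoff(mu, best, theirs) > payoff(mu, current, theirs):
        return best
    return current


def iterative_best_response(
    mu: List[float], start: Tuple[int, int] = (0, 0), max_rounds: int = 1000
) -> Tuple[int, int]:
    """Alternate best responses until neither player wants to switch.

    Hypothetical helper for the expected-reward, pure-strategy version only.
    """
    a, b = start
    for _ in range(max_rounds):
        new_a = improve(mu, a, b)      # player 1 responds to player 2
        new_b = improve(mu, b, new_a)  # player 2 responds to player 1's update
        if (new_a, new_b) == (a, b):
            return a, b                # no profitable deviation: pure Nash equilibrium
        a, b = new_a, new_b
    raise RuntimeError("best-response dynamics did not converge")


print(iterative_best_response([10.0, 6.0, 3.0]))  # (1, 0)
print(iterative_best_response([10.0, 4.0]))       # (0, 0)
```

With means 10, 6 and 3, player 1 moves to the second resource (6 beats half of 10) while player 2 keeps the first. With means 10 and 4, sharing the top resource (5 each) beats the runner-up (4), so both players choose it.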
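Item 4's soft-Bellman policy replaces the hard maximum in the Bellman equation with a log-sum-exp, so a boundedly rational player picks actions with probability proportional to exp(Q) instead of always picking the best one. The sketch below shows that building block for a single player whose rewards are held fixed. It assumes unit temperature and the common form pi(a|s) = exp(Q(s,a) - V(s)) with V(s) = log sum_a exp(Q(s,a)). It is not the paper's nonlinear least-squares solver for the coupled affine game, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value_iteration(
    R: List[List[float]],
    P: List[List[List[float]]],
    gamma: float = 0.9,
    tol: float = 1e-12,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[List[float]]]:
    """Iterate the soft Bellman equation for one player with fixed rewards.

    R[s][a] is the reward and P[s][a][t] the probability of moving from s to t.
    Returns the soft value V and the soft-Bellman policy pi[s][a]
    (assumption: unit temperature).
    """
    n = len(R)
    V = [0.0] * n
    Q: List[List[float]] = []
    for _ in range(max_iter):
        Q = [
            [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
             for a in range(len(R[s]))]
            for s in range(n)
        ]
        V_new = [logsumexp(q) for q in Q]  # soft maximum instead of a hard max
        done = max(abs(x - y) for x, y in zip(V, V_new)) < tol
        V = V_new
        if done:
            break
    # pi(a|s) = exp(Q(s,a) - V(s)); each row sums to 1 because V = logsumexp(Q).
    policy = [[math.exp(q - V[s]) for q in Q[s]] for s in range(n)]
    return V, policy


V, pi = soft_value_iteration([[1.0, 0.0]], [[[1.0], [1.0]]])
print(pi[0])
```

In this one-state example with rewards 1 and 0, the policy puts probability e/(1+e), about 0.73, on the better action rather than 1. This is the kind of bounded rationality the abstract contrasts with the purely rational Nash policy.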