Title: Bounded-Rational Pursuit-Evasion Games
We present a framework that incorporates the principle of bounded rationality into dynamic stochastic pursuit-evasion games. The solution of a stochastic game is generally characterized by its (Nash) equilibria in feedback form, whose calculation may require extensive computational resources. In this paper, the agents are modeled as bounded-rational entities with limited computational capabilities. We illustrate the proposed framework by applying it to a pursuit-evasion game between two aerial vehicles in a stochastic wind field. We show how such a game may be discretized and properly analyzed by casting it as an iterative sequence of finite-state Markov Decision Processes (MDPs). Leveraging tools and algorithms from cognitive hierarchy theory (“level-k thinking”), we compute the solution of the ensuing discrete game while taking into consideration the rationality level of each agent. We also present an online algorithm for each agent to infer its opponent's rationality level.
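The level-k recursion described in the abstract can be illustrated with a minimal sketch. This is not the paper's aerial-vehicle model or its algorithm: the one-dimensional grid, the wind model, the capture reward, and every name and constant below (`N`, `WIND_P`, `GAMMA`, `step_dist`, `best_response`, `level_k_policies`) are hypothetical choices made only for illustration. The sketch assumes a level-0 agent acts uniformly at random and a level-k agent best-responds to a level-(k-1) opponent, so each level reduces to solving a finite-state MDP by value iteration.

```python
import itertools
import numpy as np

N = 5                  # cells on a 1-D line (hypothetical toy world)
ACTIONS = (-1, 0, 1)   # move left, stay, move right
WIND_P = 0.2           # assumed chance that wind pushes an agent one cell right
GAMMA = 0.9            # assumed discount factor

STATES = list(itertools.product(range(N), range(N)))  # (pursuer, evader)

def step_dist(pos, action):
    """Distribution over one agent's next position under stochastic wind."""
    calm = min(max(pos + action, 0), N - 1)
    windy = min(calm + 1, N - 1)
    dist = {}
    for nxt, prob in ((calm, 1 - WIND_P), (windy, WIND_P)):
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist

def uniform_policy():
    """Level-0: every action is equally likely in every state."""
    return {s: np.full(len(ACTIONS), 1 / len(ACTIONS)) for s in STATES}

def best_response(opponent_policy, is_pursuer, sweeps=100):
    """Solve the single-agent MDP induced by fixing the opponent's policy."""
    sign = 1.0 if is_pursuer else -1.0  # pursuer gains on capture, evader loses
    V = {s: 0.0 for s in STATES}

    def q_values(s):
        p, e = s
        q = np.zeros(len(ACTIONS))
        for i, own in enumerate(ACTIONS):
            for j, opp in enumerate(ACTIONS):
                a_p, a_e = (own, opp) if is_pursuer else (opp, own)
                for p2, pp in step_dist(p, a_p).items():
                    for e2, pe in step_dist(e, a_e).items():
                        prob = opponent_policy[s][j] * pp * pe
                        # Capture ends the game; otherwise discount the future.
                        q[i] += prob * (sign if p2 == e2 else GAMMA * V[(p2, e2)])
        return q

    for _ in range(sweeps):              # in-place value iteration sweeps
        for s in STATES:
            if s[0] != s[1]:             # capture states are terminal
                V[s] = q_values(s).max()

    policy = {}
    for s in STATES:                     # greedy (deterministic) policy
        greedy = np.zeros(len(ACTIONS))
        greedy[int(np.argmax(q_values(s)))] = 1.0
        policy[s] = greedy
    return policy

def level_k_policies(k_max):
    """Level k of each agent best-responds to level k-1 of its opponent."""
    pursuer, evader = [uniform_policy()], [uniform_policy()]
    for k in range(1, k_max + 1):
        pursuer.append(best_response(evader[k - 1], is_pursuer=True))
        evader.append(best_response(pursuer[k - 1], is_pursuer=False))
    return pursuer, evader
```

Calling `level_k_policies(2)` returns two lists indexed by level, so `pursuer[2][(0, 4)]` is the action distribution of a level-2 pursuer when it sits in cell 0 and the evader in cell 4. Because each level only requires one MDP solve against a fixed opponent, the cost grows linearly with the rationality level rather than requiring a joint equilibrium computation.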
Award ID(s):
1849130
NSF-PAR ID:
10315945
Journal Name:
American Control Conference
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, pursuit-evasion scenarios in a stochastic flow field involving one pursuer and one evader are analyzed. Using a forward reachability set-based approach and the associated level set equations, nominal solutions of the players are generated. The dynamical system is linearized along the nominal solution to formulate a chance-constrained, linear-quadratic stochastic dynamic game. Assuming an affine disturbance feedback structure, the proposed game is solved using the standard Gauss-Seidel iterative scheme. Numerical simulations demonstrate the proposed approach for realistic flow fields.
  2. We consider the high-dimensional linear regression problem, where the algorithmic goal is to efficiently infer an unknown feature vector $\beta^*\in\mathbb{R}^p$ from its linear measurements, using a small number $n$ of samples. Unlike most of the literature, we make no sparsity assumption on $\beta^*$, but instead adopt a different regularization: In the noiseless setting, we assume $\beta^*$ consists of entries that are either rational numbers with a common denominator $Q\in\mathbb{Z}^+$ (referred to as $Q$-rationality) or irrational numbers taking values in a rationally independent set of bounded cardinality, known to the learner; together these are called the mixed-range assumption. Using a novel combination of the Partial Sum of Least Squares (PSLQ) integer relation detection, and the Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithms, we propose a polynomial-time algorithm which provably recovers a $\beta^*\in\mathbb{R}^p$ enjoying the mixed-range assumption, from its linear measurements $Y=X\beta^*\in\mathbb{R}^n$ for a large class of distributions for the random entries of $X$, even with one measurement ($n=1$). In the noisy setting, we propose a polynomial-time, lattice-based algorithm, which recovers a $\beta^*\in\mathbb{R}^p$ enjoying the $Q$-rationality property, from its noisy measurements $Y=X\beta^*+W\in\mathbb{R}^n$, even from a single sample ($n=1$). We further establish that for large $Q$, and normal noise, this algorithm tolerates information-theoretically optimal level of noise. We then apply these ideas to develop a polynomial-time, single-sample algorithm for the phase retrieval problem. Our methods address the single-sample ($n=1$) regime, where the sparsity-based methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) and the Basis Pursuit are known to fail. Furthermore, our results also reveal algorithmic connections between the high-dimensional linear regression problem, and the integer relation detection, randomized subset-sum, and shortest vector problems.
  3. The human ability to deceive others and detect deception has long been tied to theory of mind. We make a stronger argument: in order to be adept liars – to balance gain (i.e. maximizing their own reward) and plausibility (i.e. maintaining a realistic lie) – humans calibrate their lies under the assumption that their partner is a rational, utility-maximizing agent. We develop an adversarial recursive Bayesian model that aims to formalize the behaviors of liars and lie detectors. We compare this model to (1) a model that does not perform theory of mind computations and (2) a model that has perfect knowledge of the opponent’s behavior. To test these models, we introduce a novel dyadic, stochastic game, allowing for quantitative measures of lies and lie detection. In a second experiment, we vary the ground truth probability. We find that our rational models qualitatively predict human lying and lie detecting behavior better than the non-rational model. Our findings suggest that humans control for the extremeness of their lies in a manner reflective of rational social inference. These findings provide a new paradigm and formal framework for nuanced quantitative analysis of the role of rationality and theory of mind in lying and lie detecting behavior.
  4. In computational approaches to bounded rationality, metareasoning enables intelligent agents to optimize their own decision-making process in order to produce effective action in a timely manner. While there have been substantial efforts to develop effective meta-level control for anytime algorithms, existing techniques rely on extensive offline work, imposing several critical assumptions that diminish their effectiveness and limit their practical utility in the real world. In order to eliminate these assumptions, adaptive metareasoning enables intelligent agents to adapt to each individual instance of the problem at hand without the need for significant offline preprocessing. Building on our recent work, we first introduce a model-free approach to meta-level control based on reinforcement learning. We then present a meta-level control technique that uses temporal difference learning. Finally, we show empirically that our approach is effective on a common benchmark in meta-level control.
  5. Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.