<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Deception in Nash Equilibrium Seeking</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>12/01/2025</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10651782</idno>
					<idno type="doi">10.1109/TAC.2025.3582524</idno>
					<title level='j'>IEEE Transactions on Automatic Control</title>
<idno>0018-9286</idno>
<biblScope unit="volume">70</biblScope>
<biblScope unit="issue">12</biblScope>					

					<author>Michael Tang</author><author>Umar Javed</author><author>Xudong Chen</author><author>Miroslav Krstić</author><author>Jorge I Poveda</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Not Available]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>I N MULTIAGENT systems with competitive decision- makers, the act of deception is typically defined as the systematic exploitation of privileged information to induce false beliefs in other agents, leading them to outcomes that are detrimental to their performance or advantageous to the deceiving entity <ref type="bibr">[1]</ref>. In game theory, a player who employs deceptive strategies is termed a deceiver. Their aim is to enhance their own outcomes covertly, often without the other players' awareness, who are typically oblivious to such deceitful behavior. Deception in games has gained significant research attention in recent years, particularly due to concerns about safety and security in engineering systems <ref type="bibr">[2]</ref>. Deception has been rigorously studied across various domains, including robotics and aerospace control <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>, and it has also been recognized as a key evolutionary feature observed in biological systems <ref type="bibr">[4]</ref> and human societies <ref type="bibr">[5]</ref>.</p><p>In this article, we study model-free deception in N-player games, where each player aims to unilaterally minimize its own cost function. For such games, the traditional solution concept studied in the literature corresponds to the notion of a Nash equilibrium (NE) <ref type="bibr">[6]</ref>. When their collective actions correspond to an NE, no player has any incentive to deviate from their current action, leading to an equilibrium state for the system that is of interest in many coordination and control problems <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>. When the model of the cost function is unknown, and payoff-based strategies are required, various NE-seeking algorithms have been studied in the literature. For example, in N -person noncooperative games with individual real-valued cost functions J i (x), i &#8712; {1, 2, . . . , N } and vector of actions x = [x 1 , . . . , x N ] , the work <ref type="bibr">[11]</ref> introduced a class of model-free and adaptive NE seeking dynamics that implements exploration and exploitation actions in each player i via the following dynamics:</p><p>where a, k, and &#969; i are positive tunable parameters, u i is an auxiliary state implemented by every player i, and &#956;(&#969; i t) is a local continuous periodic probing signal characterizing the exploration policy implemented by the agent. As shown in <ref type="bibr">[11]</ref>, for a general class of games, the payoff based algorithm (1) can attain convergence to a neighborhood of the NE of the underlying game characterized by the cost functions J i . Such results opened the door to the development of a variety of NE-seeking algorithms in the context of systems with delays <ref type="bibr">[12]</ref>, games with constraints <ref type="bibr">[13]</ref>, dynamics with momentum <ref type="bibr">[14]</ref>, games over networks with mild coupling <ref type="bibr">[15]</ref>, nonsmooth algorithms <ref type="bibr">[16]</ref>, etc. Since system (1) can be seen as a continuous-time version of a discrete-time zeroth-order algorithm based on simultaneous perturbations <ref type="bibr">[17]</ref>, the results of <ref type="bibr">[11]</ref> have also been connected and extended to discrete-time and stochastic settings using tools, such as discrete-time averaging <ref type="bibr">[18]</ref>, stochastic calculus <ref type="bibr">[19]</ref>, and stochastic approximations <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref>, to name just a few. Regardless of the specific NE-seeking dynamics implemented by the players, the majority of works in the literature have primarily focused on studying decision-making problems with symmetric information, where all agents have access to the same type of signals to implement their algorithms. However, in many practical settings involving competitive and/or adversarial entities, a subset of agents has access to private information regarding the other players' algorithms. Such scenarios break the classic assumption of symmetric information, leading to games where players with privileged information can systematically use it to their advantage, sometimes called signaling games <ref type="bibr">[22]</ref>. How to utilize such information in NE-seeking systems without causing instabilities, while consistently improving the outcomes of players with privileged information, is a question that has not been thoroughly explored in the literature. This question motivates this work. In particular, we study a class of stable deceptive NE-seeking algorithms that generalize <ref type="bibr">(1)</ref> to games with asymmetric information, where two different types of players emerge: oblivious players, who implement the classic dynamics <ref type="bibr">(1)</ref>, and deceptive players who implement the following policies.Zeroth-order exploitation policy:</p><p>Dynamic state-dependent exploration policy:</p><p>&#951;i (t) = &#949;F i (&#951; i (t), J i (x(t)), u i (t)) , &#949; &gt; 0 (2c)</p><p>where F i and h i are suitable smooth functions and &#949; &gt; 0 is a small parameter designed to induce a time scale separation between the NES dynamics and the deception mechanism.</p><p>The key difference between the nominal and deceptive NEseeking dynamics lies in the dynamic, state-dependent exploration policy implemented by the deceivers. In general, the deceiver can tune &#948; i using only measurements of their action and their payoff. We show that this new policy, which leverages privileged knowledge of the probing signals &#956;(&#969; i j ) of a subset of oblivious players {i j } n j=1 , can systematically achieve closed-loop stable behaviors in the multiagent system, while simultaneously inducing false beliefs in the oblivious players, leading them to implement actions that converge to the NE of a different game than the one being played. This "deceptive Nash equilibrium (DNE)" may result in better outcomes for the privileged players. As seen in (2b), the proposed deception mechanism takes the form of an additive modification to the deceiver's action update that incorporates the victims' exploration signals and with amplitudes updated via dynamic feedback.</p><p>The following are the main contributions of this article. 1) We introduce the concept of "deception" within the context of the NE seeking schemes proposed in <ref type="bibr">[11]</ref>, which are based on algorithms that simultaneously implement exploration and exploitation policies, as in <ref type="bibr">(1)</ref>. We show that if a subset of the players has access to private information related to the exploration policies of other players, then they can use this information to their advantage by implementing the dynamics (2) to influence the behavior of the oblivious players that implement the nominal NE seeking dynamics <ref type="bibr">(1)</ref>. As a result, the actions of oblivious players converge to the reaction curves of a different game, which is parameterized by the deceiving player. Specifically, we show that if a nominal, model-based, pseudogradient flow dynamic exponentially stabilizes a "classic" NE in a noncooperative game, then the proposed model-free dynamics with deception will retain the exponential stability properties, but now with respect to a new DNE. We characterize the geometric properties of this DNE by studying how deception affects the reaction curves of the game learned by the oblivious players. We show that for quadratic games, deception can effectively rotate or translate the reaction curves of the oblivious players, while for aggregative games deception induces nonlinear transformations that can increase the number of equilibria. By using the duopoly game as an example, we discuss various microeconomic interpretations of deception in Nash-seeking dynamics within competitive markets.</p><p>2) To align DNEs with desired individual outcomes, we show that deceptive players can modify in real-time the underlying game played by the oblivious players until the value of the cost function of the deceiver aligns with a desired reference value. This is achieved through dynamic deceptive exploration signals, managed by an auxiliary payoff-based mechanism implemented by the deceiving agents. We rigorously characterize the conditions under which the deceptive player can improve their own objective function and provide a conservative estimate of the potential improvement while maintaining the stability of the game. We also study mutual deception when multiple deceivers have privileged information about each other's exploration policies.</p><p>3) By leveraging the geometric characterization of the DNE, we provide sufficient conditions on the game parameters that ensure "immunity" against deceptive players. These conditions offer valuable insights into designing counter-deception mechanisms for games and adaptive NE seeking algorithms. We also study deceptive dynamics with integral and approximate proportional action, which are able to improve transient performance to minimize potential deception detection via transient analysis. All our results are validated through simulations and numerical analysis. To the best of the authors' knowledge, this work is the first to study stable deception mechanisms in model-free NE seeking dynamics.</p><p>No portion of this article was published at a conference. The rest of this article is organized as follows. Section II presents the preliminaries. Section III introduces the problem of deceptive NE seeking. Section IV presents the main results for N-player quadratic games. Section V focuses on deception applied to N -player (strongly monotone) aggregative games. Finally, Section VI concludes this article.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PRELIMINARIES</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Notations:</head><p>We use a ij to denote the matrix A whose (i, j) entry is given by a ij . Given a matrix Q i and vector b i , we use (Q i ) jk to denote the (j, k)th entry of Q i , (Q i ) j: to denote the jth row of Q i , (Q i ) :j to denote the jth column of Q i , and (b i ) j to denote jth entry of b i . Furthermore, we let [Q] i,j denote the matrix formed by taking Q and removing the ith row and jth column. Having "&#8764;" in the i (or j) entry means no row (or column) is removed. For instance, [Q] &#8764;,3 is the matrix formed by only removing the third column of Q (no row is removed). Given N &#8712; N, we use [N ] to denote the set of positive integers no greater than N , i.e., <ref type="bibr">[N ]</ref>  </p><p>2) Game Theory: We consider N -player non-cooperative games, where the action of player i is a scalar x i &#8712; R, and each player aims to unilaterally minimize its own cost J i , which also depends on the actions of the other players, and which is assumed to be continuously differentiable. We use [N ] = {1, 2, . . . , N } to denote the set of players, and x = [x 1 , . . ., x N ] to denote the vector of actions. Similarly, we let x -i &#8712; R N -1 be the vector containing the actions of all players other than the ith player. Given real-valued cost functions J i (x i , x -i ) : R N &#8594; R, for all i, a NE <ref type="bibr">[6]</ref> is a vector x * &#8712; R N that satisfies</p><p>that is, if the other players implement x * -i , then no player has incentive to unilaterally deviate from x * i &#8712; R to improve their cost. For a broad class of noncooperative games, NE can be characterized using the pseudogradient of the game, which is a mapping</p><p>In particular, an NE x * satisfies G(x * ) = 0, but the converse is in general not true without additional assumptions. For each x -i &#8712; R N -1 , the reaction curve of player i is defined as the set-valued mapping RC i (x -i ) : R N -1 &#8658; R that satisfies &#8711; i J i (RC i (x -i ), x -i ) = {0}. Since reaction curves are characterized by their graph, in this article, a "reaction curve" will refer to the set N (&#8711; i J i (x)). While we assume scalar actions to avoid cumbersome notation, our models and results extend to games with vector actions via Kronecker products.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. STABLE DECEPTION IN GAMES WITH ASYMMETRIC INFORMATION</head><p>For a general class of games with symmetric information, the model-free dynamics <ref type="bibr">(1)</ref> have been show to attain local or semiglobal practical NE seeking <ref type="bibr">[11]</ref>, <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>. The key enabler for the "learning" capabilities of system (1) are the probing signals &#956; that use frequencies &#969; i &#8712; R &gt;0 , which are selected to satisfy the following assumption.</p><p>Assumption 1: The frequencies satisfy &#969; i = &#969; j for i = j, and</p><p>Assumption 1 is instrumental for the analysis and generalizing the set of permissible exploration frequencies beyond Q &gt;0 . Under Assumption 1, for sufficiently small values of a and sufficiently large values of &#969;, the trajectories of system (1) can be approximated (on compact sets) by a perturbed pseudogradient flow of the form u = -kG(&#361;) + O(a).</p><p>Since nominal pseudogradient flows of the form u = -kG(&#361;) converge to the NE in a variety of games <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>, suitable (practical) stability properties can be established for (1) under an appropriate tuning of its parameters. Note that, in system (4), the ith component of &#361; is given by ui</p><p>which shows that, when the O(a) perturbation is neglected, the equilibria of (4) are precisely characterized by the intersection of the reaction curves for all i &#8712; [N ]. Hence, to study deception in a general setting, we start by making the following assumption that pertains to the behavior of the unperturbed system (4) in a nominal game with symmetric information. Assumption 2: J i &#8712; C 2 (R N , R), there exists a unique NE x * &#8712; R N for the game {J i } N i=1 , and the dynamics &#7819; = -kG(x) render x * uniformly globally exponentially stable.</p><p>Remark 1: The stability properties of Assumption 2 are satisfied in a variety of games, including strongly monotone, which are common in the literature of NE seeking <ref type="bibr">[16]</ref>. However, we note that the assumption can be relaxed to local exponential stability, which holds for a larger class of games <ref type="bibr">[11]</ref>, and which can be studied via linearization methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Connections to Encryption/Decryption Schemes</head><p>The principles behind systems of the form (1) or (2), which simultaneously implement exploration and exploitation mechanisms, apply to other algorithms, including discrete-time and stochastic algorithms where players use random perturbations with zero expectation and identity correlation instead of periodic deterministic dither signals (see <ref type="bibr">[20]</ref>, <ref type="bibr">[25]</ref>, and <ref type="bibr">[26]</ref>). In particular, many multiagent model-free equilibrium-seeking algorithms rely on two key principles: 1) modulation: each agent uses an exploration signal with a unique key feature to perturb their nominal action u i ; and 2) demodulation: each agent uses the same exploration signal to extract information from their cost function about their cost derivatives. This information is then used by a preselected exploitation policy to update u i to "seek" for the NE. and decryption (D) using private exploration policies. Top: Scheme proposed in <ref type="bibr">[11]</ref> for games with symmetric information. Bottom: Schemes studied in this paper for games with asymmetric information and deceptive players. For this figure, we took a = 1 for illustration purposes.</p><p>The two principles mentioned above happen to be identical to those employed in symmetric-key algorithms used in cryptography for encryption and decryption of messages transmitted via communication channels in adversarial environments <ref type="bibr">[27,</ref><ref type="bibr">Ch. 2]</ref>. Indeed, the stability properties of a large class of algorithms based on simultaneous perturbations usually rely on conditions similar to those in Assumption 1, where each player uses a different "key" to encode and decode information about their own current assessment of the game. The left plot of Fig. <ref type="figure">1</ref> illustrates this connection. This approach is reasonable in symmetric games. However, as shown in the right plot of Fig. <ref type="figure">1</ref>, in asymmetric games where some players access others' "keys," the privileged player's new information opens the door to novel strategic adversarial behaviors and, specially, feedback schemes that can exploit this information and which have not been studied in NE seeking problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Nash Seeking With Deceptive and Oblivious Players</head><p>We consider noncooperative games with asymmetric information, where some of the players have access to private information about the exploration policy of other players' algorithms, and use this information to their advantage. We call such players "deceptive."</p><p>if its actions are updated via the following rule:</p><p>for all t &#8712; dom(u d ), where &#948; is updated via the deceptive dynamics</p><p>where &#951; d &#8712; R m is an auxiliary state, &#949; &gt; 0 is a parameter, and F d , h are suitable smooth functions.</p><p>To simplify notation, we remove the time dependency from the states of the system. When a player d is being deceptive toward some player d j , we say that player d j is oblivious under player d. The set of all oblivious players under player d is denoted as</p><p>and the set of all oblivious players in the game is denoted as</p><p>Similarly, the set of all deceptive players is denoted as</p><p>In the deceptive strategy ( <ref type="formula">6</ref>), the parameter &#948; plays a crucial role. It will be used by the deceptive player to control the game that is learned in real time by the oblivious player.</p><p>Definition 2: Given a noncooperative game</p><p>In other words, Definition 2 introduces families of games { Ji } i&#8712;[N ] for which only the oblivious players, characterized by the index set D, have cost functions of the form (8) instead of the nominal cost J i . Such functions depend also on the derivatives of the nominal costs J i with respect to the actions of the players k &#8712; K i , i.e., the "externalities" that those players have on player i. The &#963; i (x -i ) term represents any C 0 function that depends on the actions of any set of players excluding that of player i, thus it has no effect on the pseudogradient. Note that, in general, both oblivious and nonoblivious players could be part of the sets K i for some i &#8712; [N ]. However, Definition 2 rules out players who are "self-deceiving," although it leaves open the door to "mutual deception," which we will study in Section IV-C.</p><p>The following proposition computes the average dynamics of ( <ref type="formula">1</ref>) and ( <ref type="formula">6</ref>). The proof follows directly from the proof of Theorem 1 in Section III-D.</p><p>Proposition 1: Suppose that Assumption 1 holds, and consider an N-player noncooperative game {J i } i&#8712;[N ] , where a nonempty set D * &#8834; [N ] of deceptive players implements <ref type="bibr">(6)</ref>, and the rest of the players implement <ref type="bibr">(1)</ref>. Then, the average dynamics of the players are given by</p><p>where</p><p>The role of Proposition 1 is to characterize the effect of the exploratory policy (6a) on the average dynamics of the Nashseeking system. That is, deceptive players modify the average dynamics of the oblivious players by injecting externalities into the vector fields, each externality being weighted by the controlled parameter &#948; j . By effectively inducing false beliefs (i.e., &#8711; i J instead of &#8711; i J i ) in the oblivious players' dynamics, deceptive players create a new pseudogradient flow (9) defined over the deceptive game Ji instead of the nominal game J i . If a deceptive player is not oblivious (i.e., no other player has access to his exploration policy), then his average dynamics remain unchanged and still approximate <ref type="bibr">(5)</ref>. In other words, deceptive nonoblivious players still learn their correct reaction curves, but they are able to control the reaction curves learned by the other players.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Deceptive Nash Equilibria: An Illustrative Example</head><p>Given a deceptive N-player game</p><p>In general, a DNE might not be a true NE unless it also satisfies second order conditions <ref type="bibr">[28]</ref> with respect to the deceptive game, i.e., with Ji instead of J i . Nevertheless, whether x &#948; is a true NE of the deceptive game is unimportant in the context of stable deception. Yet, in many cases, the DNE turns out to be an NE of the deceptive game. The following example illustrates these ideas using the well-known duopoly game, extensively studied in symmetric-information games <ref type="bibr">[11]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[29]</ref>.</p><p>Example 1 (The Duopoly Game With Deceptive Entities): Consider a duopoly market where the two players represent companies that set the prices of their products. Let x 1 (t) denote the price of company 1's product at time t, and let x 2 (t) denote the price of company 2's product at the same time t. As shown in <ref type="bibr">[11]</ref>, the negative of the profits of the companies in the market is given by</p><p>where, for each company i, s i is the number of sales, m i is the marginal cost, and x im i is the profit per unit. Using the model in <ref type="bibr">[11]</ref>, and assuming that the consumers have a preference for the product of the company 1, as long as its price is not too large compared to the price of the product of company 2, we can model the sales using the functions</p><p>, where p &gt; 0 quantifies the preference for company 1, and S d is the total consumer demand, assumed to be constant for simplicity. As shown in [11, Sect. II], the unique NE of this game is</p><p>In <ref type="bibr">[11,</ref><ref type="bibr">Thm. 1]</ref>, the above game was studied under a symmetric-information assumption, and it was shown that if both players implement the NE seeking dynamics (1), then the prices x i will converge to a neighborhood of x * , provided that the parameters a and &#969; are appropriately tuned. Now, suppose that company 2 has obtained knowledge of the exploration policy &#956;(&#969; 1 t) used by player 1, thus breaking the symmetric-information assumption. Using this knowledge, company 2 revises its strategy and implements ( <ref type="formula">6</ref>) with a sinusoidal probing function</p><p>Company 1 is unaware of this change, and continues implementing the vanilla NE-seeking dynamics (1). The resulting average dynamics of the companies are given by u1</p><p>Neglecting the O(a) perturbation, this dynamics correspond to the pseudogradient flow of a game with costs ( J1 , J 2 ), where J 2 is still given by ( <ref type="formula">10</ref>), but J1 is now given by ( <ref type="formula">8</ref>), i.e.,</p><p>Since &#8706;J 1 (y,x 2 )</p><p>, we have</p><p>. Thus, using <ref type="bibr">(10)</ref>, the structure of s 1 , and choosing</p><p>, we have that one possible expression for J1 is</p><p>which, compared to <ref type="bibr">(10)</ref>, now has a &#948; 2 -inflated sales function s1 (x). Thus, whenever the price x 1 is above the marginal cost m 1 , player 1 now has an incentive to increase his price further in order to increase his payoff (i.e., decrease his cost).</p><p>To compare the trajectories generated by both algorithms (with and without deception), we simulate the system using S d = 100, p = 0.2, and m 1 = m 2 = 30, and the exploration policies &#956;(&#969; i t) = sin(&#969; i t), for i &#8712; {1, 2}. To control &#948; 2 , company 2 makes use of the following payoff-based deceptive dynamics <ref type="bibr">(7)</ref> based on a simple integrator with state</p><p>where J ref 2 is a desired profit value for the company. The left plot in Fig. <ref type="figure">2</ref> shows the trajectories x i with (solid) and without (dashed) deception. As expected, when all companies implement the nominal NE-seeking dynamics (1), the prices x converge to a neighborhood of the NE x * . The resulting profit functions are also shown in the center plot, converging to J i (x * ), for i &#8712; 1, 2. However, note that when company 2 is deceptive, the system's trajectories converge to a neighborhood of a different action x &#948; &#8712; R 2 , namely, the DNE of the deceptive game ( J1 , J 2 ), which, additionally, satisfies J 2 (x &#948; ) = J ref , and which is given by</p><p>Authorized licensed use limited to: Univ of Calif San Diego. Downloaded on December 05,2025 at 23:04:58 UTC from IEEE Xplore. Restrictions apply. The profit J ref is attained by company 2 using &#948; 2 = &#948; * = 0.7929, shown in the right plot of Fig. <ref type="figure">2</ref>. In this way, by controlling &#948; 2 , company 2 is able to induce a different NE to attain its desired profits. In particular, company 2 deceives company 1 into overpricing relative to his Nash price, resulting in extra profit for the deceiver and reduced profit for company 1. The deception consists in making the oblivious player (i.e., company 1) believe that his sales are higher than they actually are, a belief that is reflected in <ref type="bibr">(14)</ref>. As we will show in the next sections (see Remark 4), this type of stable deception is possible for any J ref 2 &gt; 0. The oscillatory transient behavior in the figure can be avoided with phase-lead compensation, as shown in Section IV-D. Note that the dynamics (1), <ref type="bibr">(6)</ref>, and ( <ref type="formula">15</ref>) are all payoff-based and do not require knowledge of the mathematical form of J i . Moreover, in this case, it can be verified that x &#948; is also a true NE of the deceptive game ( J1 , J 2 ) when &#948; 2 = 0.7929.</p><p>Remark 2: An alternative interpretation of the deceptive mechanism can be formulated based on adaptive incentives and tolls <ref type="bibr">[30]</ref>, <ref type="bibr">[31]</ref>. In this interpretation, we write the deceptive profit as J1 (x(t)) = J 1 (x(t)) + &#952;(t), where &#952;(t) = &#948; 2 (t)</p><p>dy is a state-dependent incentive (or tax). Company 2 uses the deceptive NE-seeking dynamics to "indirectly incentivize" company 1 to choose a different price that results in higher profits for company 2. While adaptive incentives have been studied in economics <ref type="bibr">[29]</ref> and congestion problems <ref type="bibr">[30]</ref>, <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref>, <ref type="bibr">[34]</ref>, we are not aware of model-free algorithms able to induce incentives/tolls via deception.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Main Result for General Games: Stable Deception</head><p>We now generalize the previous discussions to N -player games with multiple deceptive and oblivious players. Consider an N -player noncooperative game with n &#8804; N deceptive players. Since we can always assign any reordering to the players, without loss of generality, we assume that the set of deceptive players is given by D * = {1, . . ., n} = [n], where each player i &#8712; [n] deceives a subset of players D i := {d i,1 , . . ., d i,n i }, where n i &#8712; [N -1]. To simplify our presentation, we focus on exploration policies that use sinusoidal functions, i.e., &#956;(&#8226;) = sin(&#8226;). However, other continuous periodic functions with suitable 0-average and orthogonality properties can be used in the exploration policies.</p><p>The NES dynamics can be written in a compact form as</p><p>where, for simplicity, we omit writing the explicit time dependence of x i and u i . To control the parameters &#948; i , we first focus on deceptive dynamics based on integral action &#948;i = &#949;&#949; i J i (x) -</p><p>To study the stability properties of ( <ref type="formula">17</ref>) and ( <ref type="formula">18</ref>), we define the deceptive game operator</p><p>where &#948; = [&#948; 1 , . . ., &#948; n ] and</p><p>We define the set</p><p>where D &#361;&#947; is the Jacobian of &#947; with respect to its first argument. In words, the set &#916; characterizes the values of &#948; for which the negative pseudogradient of a deceptive game has a unique DNE. The next lemma follows directly from the implicit function theorem [35, Thm. 9.28], and it states an important property on how the DNE u * depends on &#948;. Lemma 1: Under Assumption 2, the set &#916; is nonempty and open, and there exists g &#8712; C 1 (&#916;, R N ) such that &#947;(g(&#948;), &#948;) = 0, for all &#948; &#8712; &#916;.</p><p>Not every possible value of J ref i may be attainable by a deceptive player under the dynamics <ref type="bibr">(18)</ref>. To characterize the set of values that can be attained, we introduce the following definition, which makes use of the set &#916; in <ref type="bibr">(21)</ref> and the function g from Lemma 1. Definition 3:</p><p>is said to be attainable if there exists &#948; * &#8712; &#916; such that the following holds.</p><p>1</p><p>2) The matrix &#8711; j &#958; i (&#948; * ) &#8712; R n&#215;n is Hurwitz, where &#958; i :</p><p>R n &#8594; R is given by &#958; i (&#948;) := &#949; i J i (g(&#948;)). We let &#937; &#8834; R n denote the set of all attainable vectors</p><p>. Remark 3: Attainability is in general a difficult condition to verify a priori, which is why we typically employ numerical methods to verify it. Fortunately, as we will see in Section IV-B, the conditions are relatively simple to check for special cases of quadratic games.</p><p>The following general theorem is the first main result of this article. It characterizes the stability properties of the NE seeking dynamics with deception.</p><p>Theorem 1: Consider the NE seeking dynamics ( <ref type="formula">17</ref>) and ( <ref type="formula">18</ref>) with J ref &#8712; &#937;, namely, with J ref attainable, as defined in Definition 3. Suppose that Assumptions 1 and 2 hold. Then, there exists &#949; * such that for all &#949; &#8712; (0, &#949; * ) there exists a * such that for all a &#8712; (0, a * ) there exists &#969; * such that for all &#969; &gt; &#969; * the state</p><p>is sufficiently small, where u * is the DNE.</p><p>Proof: To analyze the system, let &#956;(t) = 1 a (xu), where x and u are given in <ref type="bibr">(17)</ref>. Consider the time scale transformation &#964; = &#969;t, and denote &#956;(&#964; ) = &#956;(&#964; /&#969;). With standard averaging theory for Lipschitz ODEs <ref type="bibr">[36]</ref>, we can compute the average dynamics of system <ref type="bibr">(17)</ref>, whose state we denote as</p><p>= -2k &#969;T</p><p>where &#8467; is a point on the line segment connecting the points &#361; and &#361; + a&#956;(&#964; ). The summation in the second equality uses multi-index notation, and we used Assumption 1 to evaluate the integrals. Let &#948; := [&#948; 1 , . . ., &#948; n ] , and note that the average dynamics in vector form in the original time scale are given by</p><p>with &#923;(&#361;) given by <ref type="bibr">(19)</ref>. Averaging also the dynamics of &#948;, we obtain the average system</p><p>and using a Taylor series approximation in <ref type="bibr">(25)</ref>, we get</p><p>where l is a point on the line segment connecting the points &#361; and &#361; + a&#956;(&#964; ). Thus, the overall average dynamics of ( <ref type="formula">17</ref>) and <ref type="bibr">(18)</ref> are</p><p>Denoting J * (&#361;) = [J 1 (&#361;), . . ., J n (&#361;)] , we have</p><p>Using &#964; * = &#964; &#949; in (28), we get</p><p>) If we disregard the O(a) perturbation, the resulting system is in standard singular perturbation form, which, by Lemma 1, has a quasi-steady state &#361; * = g( &#948;). The reduced system is given by &#8706; &#948;</p><p>Since J ref &#8712; &#937;, (30) has an exponentially stable equilibrium &#948; * &#8712; &#916;. Furthermore, we can also let y := &#361;g( &#948;) and obtain the boundary layer system from ( <ref type="formula">29</ref>)</p><p>where the origin is exponentially stable uniformly in &#948; &#8712; B r * (&#948; * ) for some r * &gt; 0. Thus, using a standard singular perturbation argument <ref type="bibr">[36,</ref><ref type="bibr">Ch.11.4]</ref> (see <ref type="bibr">[42]</ref> for details) and letting u * = g(&#948; * ), we can find &#949; * &gt; 0 such that for &#949; &#8712; (0, &#949; * ), [u * &#948; * ] := &#950; * is an exponentially stable equilibrium of the unperturbed system <ref type="bibr">(28)</ref>. By standard robustness results for systems with small additive perturbations, we can find a * &gt; 0 such that for a &#8712; (0, a * ), &#950; converges exponentially to a O(a)-neighborhood of &#950; * provided that |&#950;(0)&#950; * | is sufficiently small. By averaging <ref type="bibr">[36,</ref><ref type="bibr">Thm. 10</ref>.4], we prove the claim for &#969; sufficiently large. Remark 4: As in standard NE-seeking algorithms <ref type="bibr">[11]</ref>, the DNE-seeking dynamics can be enhanced by incorporating a phase &#966; i into the exploration dither of every player, such that sin(&#969; i t) in <ref type="bibr">(17)</ref> becomes sin(&#969; i t + &#966; i ). In this case, if player d is deceiving player i, we let &#966; d,i denote player d's "estimate" of &#966; i (i.e., replace sin(d i,j t) in (17a) with sin(d i,j t + &#966; i,d i,j )). Thus, &#8711; j J i (&#361;) in ( <ref type="formula">20</ref>) can be replaced with cos(&#966; j,i&#966; i )&#8711; j J i (&#361;), and with this modification, all our results hold, i.e., as long as player d's knowledge of &#966; i is not off by an odd multiple of &#960; 2 , the effect of deception persist.</p><p>While Theorem 1 is quite general and relies only on Assumptions 1 and 2, it does not characterize the set of attainable costs for deceptive players, or how this set relates to the stabilitypreserving set &#916; in <ref type="bibr">(21)</ref>. We address these questions in the next sections by leveraging additional structures on some classes of common games studied in NE seeking problems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. DNE SEEKING IN QUADRATIC GAMES</head><p>In this section, we focus on games with costs of the form</p><p>with Q i &#8712; R N &#215;N being symmetric &#8704;i, b i &#8712; R N , and p i &#8712; R. In this case, the pseudogradient of the game is</p><p>where Q &#8712; R N &#215;N and B &#8712; R N &#215;1 have the form</p><p>. . .</p><p>When -kQ is Hurwitz with a negative diagonal, Assumption 2 holds and the NE of the game can be directly computed as x * := -Q -1 B. In this case, we can obtain an exact characterization of the average dynamics using the expansion</p><p>In particular, note that under Assumption 1, we have 1</p><p>is the sum of terms that are odd and periodic in &#964; , which implies that O(a) = 0 in (27a).</p><p>To study deception in quadratic games, we introduce the matrices Q(i, j) &#8712; R N &#215;N , B(i, j) &#8712; R N given by</p><p>which allow us to write the average dynamics <ref type="bibr">(24)</ref> as</p><p>where</p><p>Similarly, the average dynamics of &#948; can be computed as</p><p>where P i is now a quadratic function. Therefore, the set &#916; in (21) can be equivalently written as &#916; = {&#948; &#8712; R n : -kQ &#948; is Hurwitz} and the &#948;-dependent equilibrium point of ( <ref type="formula">35</ref>) is given by</p><p>and where g comes from Lemma 1. Note that, while the set &#916; is nontrivial to compute, Theorem 1 guarantees that &#916; contains a neighborhood of 0, and this neighborhood can be used to obtain a conservative estimate of &#937;.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Deception as &#948;-Rotations and &#948;-Translations of Reaction Curves</head><p>To better understand the effect of deception on quadratic games, we can study the expression in <ref type="bibr">(35)</ref>. Since for Nplayer quadratic games, each player's reaction curve is an N -dimensional affine hyperplane, and it can be seen that the deception mechanism in ( <ref type="formula">17</ref>) and ( <ref type="formula">18</ref>) effectively "adds" additional hyperplanes to the reaction curve of the deceived player. These hyperplanes are precisely the externalities that other players' actions have on the cost of the deceived player.</p><p>In particular, recall that the nominal reaction curve for player k is given by N</p><p>However, if player k is being deceived by player j, the reaction curve of player k in the deceptive game satisfies</p><p>However, since the set</p><p>does not depend on &#948; j whenever &#948; j = 0, we can deduce that ( <ref type="formula">36</ref>) is a rotation of player k's reaction curve around the N -1 dimensional hyperplane</p><p>then the added hyperplane is parallel to the reaction curve of player k, and hence, (36) is a translation of player k's reaction curve. The following examples illustrate these ideas.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 2 (Rotation of reaction curves via deception):</head><p>In the duopoly game of Example 1, the profits of the companies have the form <ref type="bibr">(32)</ref> with</p><p>When company 2 implements the deceptive NE-seeking dynamics <ref type="bibr">(17)</ref>, we can use the preceding observations to verify that companies 1's perceived reaction curve is a rotated version of the original reaction curve around the point <ref type="bibr">[30,</ref><ref type="bibr">10]</ref> . This rotation, for different values of &#948; 2 , is illustrated in the left plot of Fig. <ref type="figure">3</ref>, where we also show the reaction curve of company 2 (in color black) and its isoprofit functions (in color gray).</p><p>Example 3 (Translation of reaction curves via deception): Consider now a quadratic game with matrices Q 1 =  where player 2 is still deceiving and player 1 is still oblivious. In this game, Q 1 is singular and N (&#8711; 1 J 1 (x)) &#8745; N (&#8711; 2 J 2 (x)) = &#8709;. Therefore, we can conclude that the "transformation" induced on player 1's reaction curve via deception by player 2 is in fact a translation. This is visualized in the right plot of Fig. <ref type="figure">3</ref>.</p><p>We have observed cases where deception can rotate or translate the reaction curve of the oblivious player, but it is important to note that these two phenomena cannot occur simultaneously. An important implication of the previous discussion is that the reaction curve of player d i is unaffected by deception if</p><p>). This property, detailed in the following lemma, characterizes a class of games that are intrinsically "deception-immune" under <ref type="bibr">(17)</ref> and Lemma 2 (Deceptive-immune games): Consider an N -player quadratic game, and suppose that only players in K i are deceptive to player i. If &#948; &#8712; &#916;, then the following condition:</p><p>implies</p><p>Proof: Following the computations in <ref type="bibr">(36)</ref>, condition (38) implies that the reaction curve for player i satisfies</p><p>where the last equality follows from the fact that &#948; &#8712; &#916; implies</p><p>Hence, we prove the result. Although condition <ref type="bibr">(38)</ref> "immunizes" player i's reaction curve to deception, player i can still be indirectly affected if other players are deceived. However, if all deceived players' objective functions satisfy <ref type="bibr">(38)</ref>, the game is immune to deception. Indeed, a sufficient condition for a fully deception-immune game is that rank Therefore, this game is immune to deception under the DNEseeking dynamics <ref type="bibr">(6)</ref>. Note that for the duopoly game studied in Example 1, the matrices Q 1 and Q 2 are invertible for any p, and there are no choice of parameters that can immunize the companies to deception.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Single-Deceiver Single-Oblivious (SDSO) Player Case</head><p>For games with one deceiver and one oblivious player, it is possible to provide a more detailed characterization of the deceiving properties of the DNE-seeking dynamics <ref type="bibr">(17)</ref> and <ref type="bibr">(18)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Characterization of Attainable Costs:</head><p>In N -player games with one deceiver and one oblivious player, we can precisely determine how J i (x &#948; ) changes with &#948;, and whether the deceptive dynamics <ref type="bibr">(18)</ref> allow deceivers to achieve any desired value J ref i . The following lemma is a first step in this direction. Lemma 3: Consider an N -player game with quadratic costs of the form <ref type="bibr">(32)</ref>, and let</p><p>where J i is a quadratic polynomial and f : R \ {-</p><p>with q 1 , q 2 , q 3 &#8712; R. If q 3 = 0, then f is a bijection. Proof: We will assume without loss of generality that player 1 is a deceiving player d, so </p><p>Solving for e &#948;,2: and plugging into e &#948; yield the following:</p><p>This formulation, of course, only makes sense if [Q] d,1 is invertible, which is another assumption we will make. Indeed, even if [Q] d,1 is singular, given that -kQ is Hurwitz (which implies invertibility), we can always find some i such that [Q] d,i is invertible. Then, we can select e &#948;,1 to be the ith entry of e &#948; and partition e &#948; appropriately. Without loss of generality, we will make the assumption that the entry i = 1 satisfies this property.</p><p>Inspecting the d row of <ref type="bibr">(40)</ref> to solve for e &#948;,1 , we obtain</p><p>where</p><p>Using (43), we directly obtain <ref type="formula">42</ref>) into this expression leads to</p><p>where</p><p>which establishes the result. By leveraging Lemma 3, we can now characterize the structure of the set of attainable costs &#937; for SDSO quadratic games.</p><p>Theorem 2: Consider an N -player quadratic SDSO game with cost functions <ref type="bibr">(32)</ref>, D * = {1}, D 1 = {d}, and let J 1 be given by (44) and f be given by <ref type="bibr">(39)</ref>.</p><p>1) If &#949; 1 r 1,2 q 1 q 3 &gt; 0, then</p><p>2) If &#949; 1 r 1,2 q 1 q 3 &lt; 0, then</p><p>Proof: Since there is only one deceptive player, we have &#948; = &#948; 1 . Using the definition of &#958; in Definition 3, and the result from Lemma 3, we obtain that</p><p>where r i,j is defined in (45). The result follows now directly from Definition 3. Remark 5: Using these results, we obtain that for the duopoly of Example 1, &#916; = (-&#8734;, 1.5) and &#937; = (0, &#8734;). However, as J ref 2 &#8594; &#8734;, we have &#948; * &#8594; 1.5. So, as the deceiver gets greedier, the basin of attraction of the DNE shrinks.</p><p>Example 5 (On the geometry of attainable costs): Consider a two-player quadratic game with cost parameters We can use our previous theoretical results to compute the key properties of the emerging DNE. Specifically, using <ref type="bibr">(34)</ref>, we obtain Q = [1, 5; 0, 0] and B = [2; 0]. Similarly, &#916; = (-7, 5  3 ), and using <ref type="bibr">(42)</ref>, we obtain e &#948; = e &#948;,1 &#934;, where &#934; = [1 -0.5] and e &#948;,1 = 4&#948;(-1.5&#948; + 2.5) -1 . It follows that player 2 can rotate the reaction curve of player 1 around the point -Q -1</p><p>1 b 1 = [-1.286; -0.143]. Lastly, we have J 2 (e &#948;,1 ) = 3e 2 &#948;,1 -8e &#948;,1 + 0.5, which achieves a minimum of -4.83 at e * &#948;,1 = 4 3 &#8594; &#948; = 5 9 . Using Theorem 2, we can compute the range of attainable J 2 values, which is &#937; = (-4.83, 31.64).</p><p>In words, the result of Theorem 2 characterizes the level of profits that players can achieve via deception in quadratic games when there is only one deceiver and one oblivious player. Since, in practice, the parameters of the game are unknown, this characterization is only relevant to establish viability for deception. Note that if J ref is not moderate, one might have</p><p>&#8712; &#937;, and then instability can emerge, i.e., greedy deception might lead to instability.</p><p>2) Benevolent Deception: In the duopoly example, it can be verified that setting J ref 2 = 1000 shifted the NE to a position more favorable for company 2 but less favorable for company 1. However, in some cases, deception can also benefit oblivious players. This setting is known in the literature as benevolent deception <ref type="bibr">[37]</ref>, and it can also emerge under the DNE-seeking dynamics ( <ref type="formula">17</ref>) and ( <ref type="formula">18</ref>) with an appropriate choice of J ref i in <ref type="bibr">(18)</ref>.</p><p>To study benevolent deception, note that the cost functions in the duopoly game, evaluated at the DNE, can be written as follows using (43)</p><p>&#948;,1 + 33.3e &#948;,1 + J 2 (x * ). To improve both payoff functions, &#948; must satisfy &#948; &#8712; {&#948; : -2.5e 2 &#948;,1 + 33.3e &#948;,1 &gt; 0, -1.25e 2 &#948;,1 + 33.3e &#948;,1 &gt; 0} = 0, 0.75) which is consistent with the fact that in Example 1, the desired value J ref 2 = 1000 was achieved with &#948; = 0.7929 &#8712; (0, 0.75). To characterize the values of J ref 2 that lead to benevolent deception, we can compute the set J 2 (g((0, 0.75))) = (222.2, 888.8), where g(&#948;) is the DNE (from Lemma 1). It follows that for any J ref 2 &#8712; (222.2, 888.8), the seeking dynamics ( <ref type="formula">17</ref>) and ( <ref type="formula">18</ref>) will stabilize a DNE that leads to a better payoff for both players. These ideas can be generalized to an N -player setting. Using (44), let F i (e &#948;,1 ) := J i (e &#948;,1 ) -J i (x * ) = r i,2 e 2 &#948;,1 + r i,1 e &#948;,1 and note that F i (0) = r i,1 . The following theorem establishes benevolent DNE seeking whenever player 1 is the deceptive player and player d is the oblivious player.</p><p>Theorem 3: Consider an N-person noncooperative game with cost functions <ref type="bibr">(32)</ref> and players implementing the DNE-seeking dynamics <ref type="bibr">(17)</ref> and <ref type="bibr">(18)</ref>, where n = 1, n 1 = 1, and d 1,1 = d. Let M denote a subset of players with cost functions satisfying sgn (r 1,1 ) = sgn (r i,1 ) &#8704;i &#8712; M.</p><p>Suppose &#949; 1 r 1,1 q 1 q 3 &lt; 0. Then, &#937; is nonempty and there is a nonempty subset &#937; * &#8834; &#937; such that for each </p><p>Proof: First, we assume that r 1,1 &gt; 0, and the r 1,1 &lt; 0 case is nearly identical. This means that F i (0) &gt; 0 for each i &#8712; M &#8746; {1}, so we can find some R &gt; 0 such that F i (e &#948;,1 ) &lt; 0 whenever e &#948;,1 &#8712; (-R, 0), &#8704;i &#8712; M &#8746; {1}. We also have</p><p>which is negative by our assumptions, so we can find R * &gt; 0 such that &#8706;&#958; 1 &#8706;&#948; &lt; 0 for all &#948; &#8712; B R * (0) &#8834; &#916;. We can define the following set:</p><p>which is nonempty since f is continuous and strictly monotone in a neighborhood of 0. We can then let &#937; * = J 1 (E), which completes the proof.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 6 (Benevolent deception in multiplayer games):</head><p>To provide more insight on our results and show their application in larger multiplayer games, we consider a three-player quadratic game with cost functions having parameters defined in the caption of Fig. <ref type="figure">4</ref>. The pseudogradient parameters are given by</p><p>If player 1 is deceptive to player 3, we have</p><p>An application of the Routh-Hurwitz criteria yields &#916; = (-3.315, &#8734;), and from the proof of Lemma 3, we also have &#934; = [1; 1.55; 10.86] and f (&#948;) = -3.6&#948;(1.21&#948; + 4) -1 . We set &#949; 1 &gt; 0, and with Proposition 2, we obtain &#937; = J 1 ((-0.629, &#8734;)) = (1.04, &#8734;). Furthermore, we also have r 1,1 = 68.3, r 2,1 = 42.6 and r 3,1 = 15.8. When &#949; 1 &gt; 0, the conditions of Theorem 3 are satisfied, which guarantees the possibility of benevolent deception. We also have J 1 (x * ) = 22.5, so if player 1 chooses J ref 1 = 5, which is achieved at &#948; = 0.454, it can be verified that each player's cost function improves when the NE is moved to the stable DNE. Moreover, it can also be verified that the DNE is a true NE of the deceptive game. The results are verified and illustrated in Fig. <ref type="figure">4</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Mutual Deception</head><p>In this section, we study an important question that can emerge in systems with more than one deceiver and oblivious player: What happens when two players are deceptive to each other? This situation is refer to in the literature as mutual deception <ref type="bibr">[38]</ref>. This is a special case of the generalized deception considered in Theorem 1, but it raises some interesting questions.</p><p>To study mutual deception in noncooperative games, we focus on two-player games with exploration policies given by</p><p>In this case, the average dynamics of the system are given by ( <ref type="formula">35</ref>) with &#948; = [&#948; 1 &#948; 2 ] and</p><p>We let each player i update &#948; i via the first-order integrator dynamics &#948;i = &#949;&#949; i J i (x) -J ref i where &#949; &gt; 0. In this case, by direct computation, we can observe that the vector</p><p>for which the following three properties hold:</p><p>1) -kQ &#948; * is Hurwitz; 2)</p><p>3) The following matrix is Hurwitz:</p><p>To illustrate these ideas, we revisit the duopoly example when company 1 is also deceptive to company 2.</p><p>Example 7 (Duopoly Game With Mutual Deception): For the duopoly game studied in Example 1, we show that the effect of deception of company 2 on company 1 was reflected on a modified cost J1 with an inflated sales function. Using similar computations, it can be verified that if company 1 is also deceptive to company 2, the perceived cost of company 2</p><p>Authorized licensed use limited to: Univ of Calif San Diego. Downloaded on December 05,2025 at 23:04:58 UTC from IEEE Xplore. Restrictions apply. Fig. <ref type="figure">5</ref>. Left: Profit convergence in a duopoly game under mutual deception using first-order deceptive dynamics. Center: Profit convergence in a duopoly game under (mutual) deception using second-order deceptive dynamics. Right: trajectories of &#948; in a mutually deceptive duopoly with first and second order tuning dynamics on &#948;.</p><p>becomes</p><p>where</p><p>2p . In (49), company 2's perceived cost has now a distorted sales value s 2 that depends on deceiving prices x1 and x2 . Note that the resulting quadratic game can be written as <ref type="bibr">(32)</ref> with</p><p>Suppose the companies implement <ref type="bibr">(18)</ref> with J ref 1 = 1200 and J ref 2 = 1800, leading to &#948; * 1 = 0.459 and &#948; * 2 = 0.848. With &#949; = 0.001, &#949; 1 = 1, and &#949; 2 = 1 2 , the second condition of Definition 3 leads to</p><p>which is Hurwitz. It can also be verified that -kQ &#948; * is Hurwitz. Thus, the solutions of the closed-loop system converge to the DNE. This can be visualized in the left plot of Fig. <ref type="figure">5</ref>, where the DNE-seeking dynamics used a = 0.05, k = -0.03, &#969; 1 = 7877.75, and &#969; 2 = 7436.5. This example also showcases the possibility of benevolent deception and mutual deception occurring simultaneously.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Adding Approximate Proportional to the Integral Action to Dampen Convergence</head><p>The first-order deceptive dynamics <ref type="bibr">(18)</ref> are not the only dynamics of the form (7) that can be studied for the purpose of systematic deception. In particular, we now consider the following deceptive dynamics:</p><p>where G i,2 &gt; G i,1 &gt; 0 are tunable gains. Note that the choice G i,1 = G i,2 reduces (51) to the first-order dynamics <ref type="bibr">(18)</ref>. In essence, the second-order dynamics (51) incorporate a phaselead compensation mechanism that incorporates integral action plus an approximate proportional term to improve transient performance. This improvement can be observed in the center plot of Fig. <ref type="figure">5</ref>, where we simulated the duopoly game with deception using (51) instead of the integral deception mechanism <ref type="bibr">(18)</ref>. The following theorem formalizes the stability properties of the deceptive NE-seeking dynamics <ref type="bibr">(17)</ref> with second-order deceptive dynamics (51). Theorem 4: Suppose that Assumption 1 holds, and consider the DNE-seeking dynamics ( <ref type="formula">17</ref>) and (51) with J ref &#8712; &#937; and J i satisfying Assumption 2 for all i &#8712; [N ]. Then, there exists &#949; * such that for all &#949; &#8712; (0, &#949; * ) there exists a * such that for all a &#8712; (0, a * ) there exists &#969; * such that for all &#969; &gt; &#969; Proof: After obtaining the average system with &#964; = &#969;t, we can apply the time scale transformation &#964; * = &#949;&#964; to obtain </p><p>If J ref &#8712; &#937;, then (53) has an exponentially stable equilibrium point * &#8712; &#916;. Now, setting y = zz * , we obtain the boundary layer system</p><p>where &#923; * is a n &#215; n diagonal matrix with the (i, i) entry being 1 -G i,2/ G i,1 . We see that (54) is a cascade, and y 2 = 0 is an exponentially stable equilibrium for (54b). Substituting into (54a) gives</p><p>which is exponentially stable at y 1 = 0 uniformly in &#732; &#8712; B R * ( * ) for some R * &gt; 0. At this stage, the system reduces to the system in Theorem 1 but with &#950; * = [g( * ), * , * ] .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. DNE SEEKING IN AGGREGATIVE GAMES</head><p>We now study deception in games that admit non-quadratic cost functions J i . Specifically, we now consider games where each player's objective function is of the following form:</p><p>where c i : R &#8594; R is twice differentiable and strongly convex, and l i : R N -1 &#8594; R is linear. These types of games are often referred to as aggregative games, and they are also common in the literature of NE seeking problems in noncooperative games, e.g., <ref type="bibr">[39]</ref>, <ref type="bibr">[40]</ref>, and <ref type="bibr">[41]</ref>.</p><p>In aggregative games, the ith entry of the pseudogradient is given by</p><p>where l i can be written as</p><p>For convention, we will use &#945; k,k = 0 for k &#8712; [N ]. Under suitable conditions on l i , aggregative games are strongly monotone (i.e, G is a strongly monotone operator). To see this, let &#954; i &gt; 0 denote the strong convexity parameter of c i . Setting z := xy, we obtain</p><p>, it is easy to see that the aggregative game is strongly monotone with &#954; &#8712; (0, min j K j ) provided K j &gt; 0&#8704;j. Therefore, we make the following assumption on the game.</p><p>Assumption 3: The N-player aggregative game with cost functions (55) is &#954;-strongly monotone.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Averaging Analysis and &#948;-Transformations of Reaction Curves</head><p>Let each player i &#8712; [N ] choose its action according to the strategy (1), except for player d, who we assume is deceptive and therefore implements (6a) to deceive n d players in the set D = {d 1 , . . ., d n d }. Using the same steps as in Section IV, under Assumption 1, the average dynamics of u are given by u = -kG(&#361;) -k&#948;&#936;(&#361;) + O(a)</p><p>where &#936;(u) &#8712; R N and where the d i th entry of &#936;(u) is now given by &#8711;</p><p>while all the other entries are equal to 0. As before, we first disregard the O(a) perturbation term in (58) to proceed with our analysis. We obtain the nominal dynamics u = -k&#947;(&#361;, &#948;), where &#947;(&#361;, &#948;) := G(&#361;) + &#948;&#923;&#361;, with &#923; being a diagonal matrix with diagonal entries equal to zero, except for the (d i , d i ) entries, which are equal to &#945; d i ,d . Note that the change that results from deception in the average dynamics corresponds to the added term -k&#948;&#923;&#361;.</p><p>In particular, unlike the quadratic case, the effect of deception in the reaction curves of nonlinear aggregative games is not any more a rotation or translation, but rather a complex nonlinear transformation that can even induce multiple critical points for the pseudogradient of the deceptive game. This behavior is illustrated in Fig. <ref type="figure">6</ref> for a two-player aggregative game with cost functions</p><p>where player 1 is deceptive to player 2.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Preserving Monotonicity Under Deception:</head><p>To study the monotonicity properties of the induced deceptive aggregative game, we first consider the case where &#948; is fixed. Using again z := xy, the mapping &#947; satisfies</p><p>Since, by Assumption 3, the nominal game is strongly monotone, to guarantee strong monotonicity of the deceptive game, we just need to guarantee the condition K i + &#948;&#945; i,d &gt; 0 &#8704;i &#8712; D. To study the set of points &#948; that satisfy this condition, which we now denote &#916;, without loss of generality, we can assume that &#945; i,d = 0&#8704; i &#8712; D, since otherwise, the 0 cases will have no effect on such set. By direct computation, we obtain the following. Lemma 4: For an N -player strongly monotone aggregative game where D * = {d} and D = {d 1 , . . ., d n d }, the following estimate of &#916; holds:</p><p>where &#948; -and &#948; + are given by</p><p>The conditions of Lemma 4 provide some (conservative) estimates on the values &#948; the deceiver can use to influence the game without loosing the strong monotonicity property.</p><p>2) When Can the Deceptive Player Benefit?: Since the main goal of deception is to obtain better profits (when possible), we now address the following question: When can a player in an aggregative game use deception to improve their own cost while maintaining closed-loop stability? To address this question, fix &#948; * &#8712; &#916; and pick x * such that &#947;(x * , &#948; * ) = 0. Computing the Jacobian of &#947; with respect to x, we obtain</p><p>where the matrix &#926; takes the form</p><p>Since &#947;(&#8226;, &#948;) is strongly monotone for all &#948; &#8712; &#916;, it follows that D x &#947;(x * , &#948; * ) is invertible. Then, using Lemma 1, we can obtain a function g &#8712; C 1 (&#916;, R N ) which, by the implicit function theorem, also satisfies</p><p>where g is understood to be componentwise differentiation with respect to its argument. Let x * denote the nominal NE of the averaged game dynamics (disregarding the O(a)-perturbations) without deception, i.e., when &#948; = 0, and let &#926; * := &#926;(x * ). By inspecting (62), we can derive the following result, which guarantees DNE seeking with the deceptive player d always improving its payoff.  then there is a nonempty subset &#937; * &#8834; &#937; such that for each J ref d &#8712; &#937; * , it follows that J d (u * ) &lt; J d (x * ), where &#950; * = [u * &#948; * ] is the point generated by Theorem 1. Proof: We know that x * satisfies G d (x * ) = c d (x * d ) + l d (x * -d ) = 0 &#8594; l d (x * -d ) = -c d (x * d ). Then, using (55), we obtain J d (x * ) = c d (x * d ) + l d (x * -d )x * d = c d (x * d )c d (x * d )x * d =: Jd (x * d )</p><p>where Jd : R &#8594; R. We can differentiate to extract some information about the behavior of (63) In particular, this tells us that no matter the value of x * d , there is a "direction" such that J d decreases when x * d is moved in that "direction." Since player d seeks to minimize J d , it is most desirable to find &#948; &#8712; &#916; such that |x * d | is maximized. As we have demonstrated above, we can use the implicit function theorem to obtain g &#8712; C 1 (&#916;, R N ) such that &#361; = g(&#948;) is the DNE of the unperturbed (58) for &#948; &#8712; &#916; and g (0) = -(&#926; * ) -1 &#923;x * . Since &#949; d ((&#926; * ) -1 &#923;x * ) d x * d &gt; 0, we know that g d (0) = 0, so by observing (63) and (64), we know that there is some open interval E 1 &#8834; &#916; such that J d (g(&#948;)) &lt; J d (x * )&#8704;&#948; &#8712; E 1 , where E 1 is either of the form (-R, 0) or (0, R) for some R &gt; 0. Furthermore, by continuity we can also find an open neighborhood E 2 of 0 such that g d (&#948;)g d (&#948;) &gt; 0&#8704;&#948; &#8712; E 2 . Let E := E 1 &#8745; E 2 , which is also an open interval. Then, we set &#937; * := (inf &#948;&#8712;E J d (g(&#948;)), J d (x * )), which completes the proof.</p><p>From the proof of Theorem 5, the expression in (64) can be used to establish the following Corollary for two player games.</p><p>Corollary 1: Consider a two-player strongly monotone aggregative game where player 1 is deceptive to player 2. Let (&#948;, &#948;) &#8834; &#916; such that 0 &#8712; (&#948;, &#948;), where &#948;, &#948; are given by (61). Furthermore, assume x * 2 = 0. Then, at least one of the following holds:</p><p>1) J 1 (g(&#948;)) is strictly decreasing on (0, &#948;);</p><p>2) J 1 (g(&#948;)) is strictly increasing on (&#948;, 0); where g(&#948;) = x &#948; is obtained from Lemma 1. Proof: In this setting, we have &#947;(x, &#948;) = c 1 (x 1 ) + &#945; 1,2 x 2 c 2 (x 2 ) + &#945; 2,1 x 1 + &#948;&#945; 2,1 x 2 .</p><p>(65)</p><p>It follows that g 2 (&#948;) is injective. Indeed, suppose there exists &#948; a , &#948; b &#8712; &#916; with &#948; a = &#948; b and g 2 (&#948; a ) = g 2 (&#948; b ). Then, the top entry of (65) implies g(&#948; a ) = g(&#948; b ), and the bottom entry gives (&#948; a&#948; b )&#945; 2,1 g 2 (&#948; a ) = 0, which implies g 2 (&#948; a ) = 0. But this means g(&#948; a ) = x * is also an NE of the nondeceptive game, which contradicts the assumption that the NE is unique. Then, g 2 (&#948;) is strictly monotone on &#916;, and since the top entry of ( <ref type="formula">65</ref>) is a strictly monotone curve in the R 2 plane, we also conclude that g 1 (&#948;) is strictly monotone on &#916;. By combining this with the observation from (64) (but replace x * with we the result. This result suggests that one viable strategy for the deceiver in a two-player aggregative game is to tune &#948; monotonically. In particular, if the conditions of Theorem 5 are satisfied and the deceiver is able to improve their cost by perturbing &#948; in some "direction" away from 0, then they can continue tuning &#948; &#8712; &#916; in that same "direction."</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Numerical Example</head><p>Consider a two-player aggregative game with cost given by (59).It is straightforward to verify that &#954; 1 = &#954; 2 = 2, &#945; 1,2 = 2, and &#945; 2,1 = 1.1. It can also be verified that this game is strongly monotone since &#954; j -|&#945; 2,1 +&#945; 1,2 | 2 = 2 -|3.1| 2 &gt; 0. Suppose player 2 is deceptive to player 1. Then, we can compute G and &#947; as follows:</p><p>3 1 + 2x 1 + 2x 2 e x 2 + 2x 2 + 1.1x 1 , &#947; = 4x 3 1 + 2x 1 + 2x 2 + 2&#948;x 1 e x 2 + 2x 2 + 1.1x 1</p><p>and we also have that &#916; = (-0.225, &#8734;). Note that setting the second component of &#947; equal to 0 results in a monotonic curve in the R 2 plane. That is, as x 2 increases, we must have that x 1 decreases. This relationship would of course be reversed if &#945; 2,1 were negative. Fig. <ref type="figure">6</ref> presents a visualization for how deception affects the reaction curve for player 1in this setting. An interesting observation is that, similar to quadratic games, all the reaction curves of the oblivious player intersect at a single point. However, in the aggregative game considered in this example, the "transformation" induced by dynamic deception on the reaction curves is not anymore a rotation. The shared point in this example is the point on player 1's reaction curve where x 1 = 0, which is 0. In this case, player 2 sets J ref 2 = 0.605, achieved at &#948; = -0.22 &#8712; &#916;. The convergence of i and J 2 are visualized in Fig. <ref type="figure">7</ref>. Additional analytical and numerical examples can be found in <ref type="bibr">[42]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION</head><p>We introduced the problem of model-free NE-seeking with deception for noncooperative games with finitely many players. The setting considered incorporates oblivious and deceiving players, and studies the stable behaviors that emerge under a class of model-free (or payoff-based) algorithms that rely on simultaneous exploration and exploitation policies implemented in games with asymmetric information. The geometrical and structural properties of the induced deceptive games and deceptive Nash equilibria were studied, as well as the stability properties of the dynamics under benevolent deception, mutual deception, and using high-order deceptive dynamics. Our results open the door to several new research questions at the intersection of adaptive and learning systems, and deception in game theory.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>Authorized licensed use limited to: Univ of Calif San Diego. Downloaded on December 05,2025 at 23:04:58 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
