<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Follower Agnostic Learning in Stackelberg Games</title></titleStmt>
			<publicationStmt>
				<publisher>IEEE</publisher>
				<date>12/16/2024</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10575537</idno>
					<idno type="doi">10.1109/CDC56724.2024.10885984</idno>
					
					<author>Chinmay Maheshwari</author><author>James Cheng</author><author>Shankar Sastry</author><author>Lillian Ratliff</author><author>Eric Mazumdar</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In this paper, we present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner. Unlike previous works, our approach works even when leader has no knowledge about the followers' utility functions, strategy space or learning algorithm. Our algorithm introduces a unique gradient estimator, leveraging specially designed strategies to probe followers. In a departure from traditional assumptions of optimal play, we model followers' responses using a convergent adaptation rule, allowing for realistic and dynamic interactions. The leader constructs the gradient estimator solely based on observations of followers' actions. We provide both non-asymptotic convergence rates to stationary points of the leader's objective and demonstrate asymptotic convergence to a local Stackelberg equilibrium.To validate the effectiveness of our algorithm, we use this algorithm to solve the problem of incentive design on a largescale transportation network, showcasing its robustness even when the leader lacks access to followers' demand information.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Stackelberg games encompass a wide range of practical problems including incentive design, Bayesian persuasion, inverse optimization, bilevel optimization, cybersecurity, adversarial learning, to name a few. Stackelberg games are comprised of two type of players -leader and followers 1 . Mathematically, they are represented as follows: where X is the leader's strategy set, Y &#8838; R d is the followers' (joint) strategy set, f : X &#215; Y -&#8594; R is the utility of the leader, G : X &#215; Y -&#8594; R d is the game Jacobian of followers and SOL(Y, G(x, &#8226;)) is a variational inequality problem that denotes the equilibrium response of followers, given the strategy of leader be x &#8712; X. Assuming that the set S(x) is singleton for every x &#8712; X (commonly referred as lower-level singleton assumption), (I.1) is equivalent to optimizing the following hyper-objective:</p><p>Note that in general (I.2) is non-convex optimization problem. Thus, the goal in Stackelberg games is to find a stationary point / local optima of (I.2) ( <ref type="bibr">[2]</ref>).</p><p>In numerous practical scenarios, it is unrealistic to presume that the leader possesses any information regarding the variational inequality problem at the lower-level, including the mapping G(x, &#8226;) and even their strategy set Y -information traditionally assumed in prior research on solving Stackelberg games. Thus, the key question we ask in this work is:</p><p>Q: Can we design efficient algorithms for Stackelberg games where the leader does not require any explicit knowledge of the game played between followers? In this work, we affirmatively answer the above question in the setting where the leader can only probe the followers with different strategies and receive estimates of their (approximated) equilibrium responses. This is in contrast to the common assumption in the literature on Stackelberg games, where it is assumed that the leader has access to an equilibrium or best-response of followers either by knowledge of the utility function of followers or through an oracle. In particular, we consider that followers are rational in the sense that they employ an adaptation/learning algorithm, which asymptotically converges to the equilibrium <ref type="bibr">[3]</ref>.</p><p>We propose a two-loop algorithm where, in the outer loop, the leader fixes its strategy (i.e., the value of x) and announces it to the followers. Between two updates of the leader's strategy, the followers employ an adaptation algorithm, for a finite number of steps, so that they converge to an approximate equilibrium (or best-response). Upon observing the followers' behavior, the leader constructs an approximate estimator of the gradient of the hyper-objective (I.2) and updates its strategy via gradient descent using the estimator.</p><p>We show that the proposed algorithm converges to a stationary point of (I.2) at a rate O(T -1/2 ). Moreover, we show that if the hyper-objective satisfies the strict-saddle property, i.e. the minimum eigenvalue at any saddle point is strictly negative, then the iterates asymptotically avoid saddle points (which include local maxima) and converge to a local minima of the hyper-objective (aka local Stackelberg equilibrium <ref type="bibr">[2]</ref>).</p><p>We corroborate the theoretical results by conducting a simulated study of the proposed algorithm to design tolls over the Sioux Falls (South Dakota, US) transportation network. In this setup, we assume that the leader does not know the origin-destination (o-d) demand of travelers moving between different o-d pairs, which is sensitive information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Related works</head><p>Learning in Stackelberg games: Learning in Stackelberg games with finite actions is an active area of research ( [4]- <ref type="bibr">[8]</ref>), where the leader has access to either a noisy or exact best response oracle. Furthermore, a dominant paradigm in this literature is to consider two-player games with finite strategy sets or linearly parametrized utility functions, with the exception of <ref type="bibr">[2]</ref>, <ref type="bibr">[9]</ref>- <ref type="bibr">[11]</ref>. In <ref type="bibr">[2]</ref>, the authors study the convergence of a two-timescale algorithm to the Stackelberg equilibrium, requiring knowledge of the Hessian of followers' utility functions for leader updates. In <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>, the authors require the followers to follow a specific (i.e. gradient type) learning algorithm in order to ensure convergence. Finally, in <ref type="bibr">[11]</ref>, the authors impose strong convexity assumption on the hyper-objective which is restrictive (as shown in <ref type="bibr">[12]</ref>). In this work, we aim to design followeragnostic learning in a general-sum Stackelberg game in continuous spaces with no knowledge of the followers' utility functions or learning algorithms and not imposing restrictive assumptions about convexity of hyper-objective.</p><p>Bilevel optimzation. Bilevel optimization, a subset of problem (I.1), is extensively studied in literature, resembling a Stackelberg game with a single leader and follower. Existing research on bilevel optimization pursues three main approaches. The first utilizes a value function-based approach, converting the problem into a constrained singlelevel optimization problem with convergence guarantees to approximate Karush-Kuhn-Tucker (KKT) points <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>. However, such points may not capture locally optimal solutions <ref type="bibr">[15]</ref>. Another line of research focuses on asymptotic convergence of solutions of simpler bilevel problems than (I.1) under various assumptions on the lower-level objective function structure <ref type="bibr">[16]</ref>- <ref type="bibr">[18]</ref>. The third line explores solving the non-convex optimization problem (I.2) using gradient descent, requiring the computation of the gradient of the solution mapping, denoted as &#8711;S(x). While many methods exist for approximating &#8711;S(x), including Automatic Implicit Differentiation (AID) ( <ref type="bibr">[19]</ref>- <ref type="bibr">[23]</ref>), or Iterative Differentiation ( <ref type="bibr">[20]</ref>, <ref type="bibr">[24]</ref>- <ref type="bibr">[26]</ref>), our work is closely related to zerothorder methods, specifically avoiding the computation of the Hessian ( <ref type="bibr">[15]</ref>). Our proposed algorithm shares similarities with <ref type="bibr">[15]</ref>, but we eliminate the need for oracle access to a lower-level optimal solution, leveraging two-timescale stochastic approximation to analyze accumulated errors <ref type="bibr">[15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. PROBLEM FORMULATION</head><p>Consider the following Stackelberg game</p><p>where</p><p>Under mild conditions on the monotonicity of G(x, &#8226;), it is ensured that S(x) is non-empty and convex ( <ref type="bibr">[27]</ref>).</p><p>In what follows, we call a continuously differentiable function f :</p><p>Next, we introduce the main assumptions on the parameters of (SG) made throughtout this paper Assumption 1. (1) For every y &#8712; Y , the function f (&#8226;, y) is L 1 -Lipschitz. Additionally, for every x &#8712; X, the function f (x, &#8226;) is L 2 -Lipschitz and &#8467; 2 -smooth.</p><p>(2) For every x &#8712; X, the set S(x) is singleton and function</p><p>Assumption 1-( <ref type="formula">1</ref>) is a common assumption employed in literature to derive rates of convergence <ref type="bibr">[10]</ref>, <ref type="bibr">[11]</ref>. Assumption 1-( <ref type="formula">2</ref>), which requires that the set S(x) exists, is singleton and Lipscthiz continuous for every x, holds for strongly monotone games at lower level <ref type="bibr">[28]</ref>. Furthermore, it also applies to the incentive design problem in routing games, as discussed in Section III. Assumption 1-( <ref type="formula">3</ref>) is a technical condition we impose on the hyper-objective to use Taylor's series expansion in the proof of convergence. Notably, this assumption is less restrictive than those imposed on the hyper-objective in <ref type="bibr">[11]</ref>. We believe this assumption can be further relaxed, but we leave this as an interesting direction for future work. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>(III.3)</head><p>Under the setting presented in this section, it can be verified that the set S(x) exists, is singleton, and is Lipschitz continuous mapping <ref type="bibr">[28]</ref>.</p><p>The goal of social planner is to minimize the overall congestion on the network while also minimizing the tolls levied on travelers. More formally, the planner's objective function is given by f (x, y) = e&#8712;E w e (y)&#8467; e (w e (y))+&#955;&#8741;x&#8741; 2 , where the first term corresponds to the average congestion on the network and second term is a regularization term with parameter &#955; &gt; 0, which ensures low values of tolls <ref type="foot">3</ref> . Thus, the problem of toll design is as follows</p><p>which is an instantiation of (SG).</p><p>Remark 1. In order to compute S(x) in (III.3) the planner needs to know the set Y that requires knowledge of the demand of travelers between various o-d pairs, which is a sensitive information. In Section V, we use the proposed approach to solve (III.4) where the designer does not know the demand of travelers and can only observe the congestion levels (w e ) e&#8712;E on the network in response to the set tolls.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. ALGORITHM AND ANALYSIS</head><p>In this section, we present a follower agnostic algorithm for solving (SG). Following which, we present the convergence guarantees of the proposed algorithm to a stationary point. Additionally, we show that the algorithm will eventually converge to a local optima by avoiding the saddle points and local maximum.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Algorithmic structure</head><p>The algorithm is based on alternatively moving towards solution to the variational inequality at lower level and descending along the upper-level objective function. Specifically, between two updates of leader (upper-level), the followers (lower level) employ an iterative adaptive rule, aimed to solve the variational inequality SOL(&#8226;), for a fixed number of steps. Following which, the upper level iterates descend where F (x; &#948;, v) denotes a gradient estimator of function f (&#8226;) := f (&#8226;, S(&#8226;)), evaluated at x with parameters &#948;, v. We shall describe the estimator in detail below.</p><p>b) Gradient estimator: In order to compute the gradient of f (x), we need to compute the derivative through the solution to the variational inequality in (SG), i.e. S(x), which may involve higher order gradient computations and at times is not computable in closed form due to constraints. In this work, we consider a gradient estimator inspired from <ref type="bibr">[30]</ref>, <ref type="bibr">[31]</ref>. Specifically, we consider the following estimator</p><p>such that (i) x = x + &#948;v, where v &#8712; S(R d ) := {z &#8712; R d : &#8741;z&#8741; 2 = 1} and &#948; &gt; 0, are referred as perturbation and perturbation radius respectively; (ii) K is a positive integer capturing the number of rounds of adaptation rule employed by followers between two updates of leader's strategy; (iii) for any x &#8712; X, k &#8712; [K -1] consider a iterative solver for variational inequality denoted by H such that</p><p>where y (0) is some initialization for the iterative solver of variational inequality. For example, when the lower level problem is just a convex optimization problem with objective function g(x, &#8226;), a typical choice of H k is projected gradient descent, i.e. H k (y; x) = P Y (y -&#947; k &#8711; y g(x, y)), where P Y denotes the orthogonal projection on Y and &#947; k is the step size. Note that, in order to construct the gradient estimator in (IV.1), the leader need not know the exact description of update rule H k . For most of the paper, we shall concisely denote y (k) (x) and y (k) (x) as &#7929;(k) and y (k) respectively.</p><p>Remark 2. Direct application of zeroth-order gradient estimator from <ref type="bibr">[30]</ref>, <ref type="bibr">[31]</ref> would result in following estimator</p><p>where f is defined in (I.2). Observe that the gradient estimators F and F differ because in (IV.1) we evaluate f (x, &#8226;) at y (K) (x) while in (IV.2) we evaluated it at S(x) for any x &#8712; R d . This induces additional bias in the gradient estimator that needs to be appropriately accounted while establishing convergence results.</p><p>c) Algorithm: The algorithms runs for T rounds. In every round t &#8712; [T -1] the leader queries the followers with two strategies x t and xt = x t +&#948; t v t where v t &#8764; Unif(S(R d )) is a vector sampled uniformly randomly from the unit sphere in R d and &#948; t is the perturbation radius (refer line 2-3 in Algorithm 1). The followers respond to the leader's Algorithm 1: Follower Agnostic Stackelberg Optimization Algorithm Data: Time horizon T , Initial conditions y</p><p>t 10 end strategies by using an iterative variational inequality solver for K steps to obtain &#7929;(K) t and y (K) t respectively (refer line 4 and 7 in Algorithm 1). After observing &#7929;(K) t and y (K) t</p><p>, the leader computes a gradient estimator as per (IV.1). The leader updates its strategy for next time as per (UL) (refer line 8 in Algorithm 1). The followers initialize their strategies as per line 9 in Algorithm 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Convergence to stationary points</head><p>We now study the convergence properties of Algorithm 1.</p><p>Assumption 2. For any x, x &#8712; X, the updates in (LL) are such that &#8741;y (K) (x)-y (K) (x)&#8741; &#8804; C&#8741;x-x&#8741;, for some C &gt; 0.</p><p>Assumption 2 posits that the adaptation rule employed by followers is stable with respect to perturbations in the leader's strategy. This assumption is typically satisfied by many algorithms, including gradient-based algorithms.</p><p>Assumption 3. Atleast one of the following holds:</p><p>(1a) For any x &#8712; X, the iterates (LL) converge to equilibrium at a polynomial rate. That is, for any initial point</p><p>where &#955;, C are positive scalars. (1b) For any x &#8712; X, the iterates (LL) converge to equilibrium at a exponential rate. That is, for any initial point y (0) , &#8741;y (K) (x) -S(x)&#8741; &#8804; C&#961; K &#8741;y (0) -S(x)&#8741;, where C is a positive scalar and &#961; &#8712; [0, 1).</p><p>Remark 3. Convergence of lower-level problem is extensively studied in literature, e.g. <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref>, and is not the focus of this article. Assumption 3(1a) holds for gradient descent updates for convex functions that satisfy quadratic growth condition <ref type="bibr">[34]</ref>. Meanwhile, Assumption 3(1b) holds for gradient descent on strongly convex functions.</p><p>Then the updates (x t ) t&#8712;[T ] in Algorithm 1 are such that</p><p>where &#945; = CK -&#955; if Assumption 3(1a) hold, or &#945; = &#961; K if Assumption 3(1b) hold.</p><p>Intuitively, the theorem states that if we want to converge closer to a stationary point then we need to run the Algorithm 1 with larger T or smaller &#945; (i.e. larger K). Crucially, the term &#945;d 3 in the bound is due to error accumulation between time steps due to non-convergence of lower-level to exact solution of variational inequality S(x). Owing to such precise characterization of error accumulation across time steps, our rate is informative of the computational complexity of solving the bi-level problem while in other contemporary work, namely <ref type="bibr">[15]</ref>, it resembles iteration complexity of the oracle. Since &#945; is a function of K, the number of lower level iterations in every round, we can choose K to be large enough to make sure that the algorithm converges closer to the stationary point.</p><p>Remark 4. We know that for non-convex smooth functions, gradient descent converges to a stationary point (at a rate of O(1/ &#8730; T )). However, the key point of departure of (UL) from standard gradient descent is the presence of bias in the gradient estimator. Consequently, the key component of the proof is to bound the error in the gradient estimator (IV.1). This is because the estimator can be decomposed as</p><p>denotes the bias introduced due to the difference between standard zeroth-order gradient estimator, as per (IV.2), and the true gradient. The term E</p><p>(2) t denotes the randomness introduced if we were to use the standard zeroth-order gradient estimator (IV.2). Finally, the term E</p><p>(3) t denotes the bias introduced due to difference between our gradient estimator (IV.1) and the standard zeroth-order gradient estimator (cf. Remark 2). a) Proof of Theorem 1: The proof of Theorem 1 follows by noting that f approximately decreases along the trajectory (UL) (Lemma 1). Note that the decrease is said to be "approximate" because of the bias introduced by (IV.1) in comparison to actual gradient &#8711; f (&#8226;). We then proceed to individually bound the bias terms (Lemma 2). The convergence rate follows by using the step size and perturbation radius stated in the statement of Theorem 1.</p><p>Proof of Theorem 1. From Lemma 1 we know that f (&#8226;) approximately decreases along the trajectory of (UL). That is,</p><p>Using the bounds on error terms from Lemma 2, we obtain</p><p>Re-arranging the terms and adding and substracting the term f (x * ) = min x&#8712;X f (x), we obtain</p><p>Summing the previous equation over time step t we obtain</p><p>Setting &#951; t = &#951;(t+1) -1/2 d -1 and &#948; t = &#948;(t+1) -1/4 d -1/2 , as per the statement of Theorem 1, and dividing both sides by t&#8712;[T ] &#951; t , we obtain</p><p>, where C is a positive scalar. Next, we bound Term E + Term F as follows</p><p>where in second equality</p><p>), and we have appropriately adjusted the constant C to account for additinal constants. Thus, combining (IV.4) and (IV.5), we obtain</p><p>To conclude, we obtain</p><p>This concludes the proof. Now, we formally state the Lemmas used in the proof.</p><p>The proof of Lemma 1 follows in two steps: First, we use second-order Taylor series expansion of f along the iterate values. Second, we use (UL) and complete the squares using algebraic manipulations. A detailed proof is provided in the extended version of this paper <ref type="bibr">[1]</ref>.</p><p>Lemma 2. The errors E &#8741;E (i) t &#8741; 2 for i &#8712; {1, 2, 3} are bounded as follows:</p><p>where C is a scalar and e 0 = &#8741;y</p><p>The stated bounds on E &#8741;E</p><p>t &#8741; 2 and E &#8741;E</p><p>t &#8741; 2 are inspired by the literature on two-point zeroth-order gradient estimators <ref type="bibr">[30]</ref>, <ref type="bibr">[31]</ref>. We use the Lipschitz property of f (x, &#8226;) to bound</p><p>.</p><p>Following which, Term A and Term B are recursively bounded. A detailed proof is provided in the extended version of this paper <ref type="bibr">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Non-convergence to saddle points</head><p>In this section, we show that the updates in (UL) does not converge to a saddle point. Towards that goal, we make the following assumption that posits that the function f (&#8226;) satisfy the strict saddle property. Assumption 4. For any saddle point x * of f , it holds that &#955; min (&#8711; 2 f (x * )) &lt; 0.</p><p>In the following theorem, we formally state the nonconvergence result.</p><p>Theorem 2. Let Assumption 1-4 hold. For &#1013; &gt; 0 there exists a time T &#1013; such that for any saddle point x * of f it holds that E &#8741;x t -x * &#8741; 2 &#8805; &#1013;, &#8704; t &#8805; T &#1013; .</p><p>To prove Theorem 2, an asymptotic pseudo-trajectory of (UL) is constructed. We then show that the asymptotic pseudo-trajectory almost surely avoids saddle point.</p><p>a) Proof of Theorem 2: The proof follows by contradiction. Suppose there exists a saddle point x * such that lim t-&#8594;&#8734; E &#8741;x t -x * &#8741; 2 = 0. This implies that for any &#1013; &gt; 0 there exists an integer T &#1013; such that for all t &#8805; T &#1013; it holds that E &#8741;x t+s -x * &#8741; 2 &#8804; &#1013;/4 &#8704;s &#8805; 0.</p><p>(IV.8)</p><p>Next, for any arbitrary point x t along the trajectory (UL), we define a dynamics parametrized by xt = x t + &#948; t v t , as follows z s+1 (x t ) := z s (x t ) -&#951; t+s &#8711; f (z s (x t )), where z 0 (x t ) = xt . From Lemma 3, we know that for any &#1013; &gt; 0 and positive integer S there exists T&#1013;,S such that</p><p>Next, note that &#8741;z s (x t ) -x * &#8741; 2 &#8804; 2&#8741;z s (x t ) -x t+s &#8741; 2 + 2&#8741;x t+s -x * &#8741; 2 . Therefore, combining (IV.8)-(IV.9), we observe that for every t</p><p>But from <ref type="bibr">[35]</ref> we know that for gradient descent with random initialization almost surely avoids converging to saddle point<ref type="foot">foot_3</ref> there exists S &#1013; such that for all s &#8805; S &#1013; it holds that &#8741;z s (x t ) -x * &#8741; 2 &#8805; 2&#1013;. This establishes contradiction. Lemma 3. Let x t be an arbitrary point along the trajectory (UL). Define a dynamics parametrized by xt = x t + &#948; t v t , such that z s+1 (x t ) := z s (x t ) -&#951; t+s &#8711; f (z s (x t )), where z 0 (x t ) = xt and it holds that for any positive integer L, we have</p><p>A detailed proof of Lemma 3 is provided in the online version of this article <ref type="bibr">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. NUMERICAL EXPERIMENTS</head><p>We numerically study the Algorithm 1 in the context of incentive design in routing games (described in Section III). We consider the Sioux Falls transportation network, as depicted in Figure <ref type="figure">1(a)</ref>. The latency function and network topology are inherited from <ref type="url">http://tinyurl</ref>. com/y4fm4nvt. We consider a synthetic demand of (1, 2, 3, 2, 2, 1) units, respectively, between o-d pairs Z = ((1, 20), (13, 2), (20, 1), <ref type="bibr">(10,</ref><ref type="bibr">13)</ref>, <ref type="bibr">(11,</ref><ref type="bibr">20)</ref>, <ref type="bibr">(4,</ref><ref type="bibr">21)</ref>).  The incentive designer can set tolls on each edge of the network. In response, unknown to the planner, the travelers alter their route selection as per a gradient rule. More formally, given a toll x &#8712; R |E| , we consider that the route choices made by the travelers are updated by descending along the gradient of the potential function &#934;(&#8226;, x) (cf. (III.3)). Note that, for any x &#8712; R |E| , z &#8712; Z, r &#8712; R z , the gradient is &#8706;&#934;(y,x) &#8706;yrz = e&#8712;E &#8467; e (w e (y)) &#8706;we(y) &#8706;yrz + x e &#8706;we(y) &#8706;yrz = (i) e&#8712;E &#8467; e (w e (y))1(e &#8712; r) + x e 1(e &#8712; r) = (ii) c r (y, x),</p><p>where (i) is due to (III.1) and (ii) follows from (III.2). Consequently, the gradient update takes the following form: for every z &#8712; Z, y (k+1) z = P Yz (y</p><p>rz -&#947;c r (q k , p)) r&#8712;Rz . We simulate 12 runs of Algorithm 1 with T = 1000 and K = 3. The initial route flow vector y (0) 0 and &#7929;(0) 0 are randomly initialized. We set initial tolls uniformly randomly between [0, 0.1]. We set the step size &#951; t = 6(t + 1) -1/2 , &#948; t = 0.3 &#8226; (t + 1) -1/4 , &#947; = 0.005 and &#955; = 0.01. In Figure <ref type="figure">1</ref>(b), we show the leader's objective value as function of time iterates t &#8712; [T ]. We observe that all trajectories converge to same objective value even with random initializations. This shows that the convergent point is perhaps a global optimizer.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSIONS</head><p>We propose an efficient algorithm for Stackelberg games which converges to a stationary point at a rate of O(T -1/2 ) and asymptotically reaches a local Stackelberg equilibrium. The algorithm is designed so that the leader does not need to know any information about the game structure at lower-level and updates its strategies by only querying for the followers response to its strategy. However, in this work we assume that follower's equilibrium strategy is singleton, Lipschitz and the leader's hyper-objective is differentiable. An interesting direction of future work is to relax this assumption.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>Here, we allow for tolls to take negative values. Such tolling scheme can be implemented by considering revenue-refunding schemes.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_1"><p>Authorized licensed use limited to: CALIFORNIA INSTITUTE OF TECHNOLOGY. Downloaded on March 06,2025 at 21:01:56 UTC from IEEE Xplore. Restrictions apply.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>Note that &#955; can in-general be zero, i.e. we do not require strong convexity of leader's objective function in its decision variable for our theoretical results to hold. We choose &#955; &gt; 0 to impose a "soft-constraint" on the amount of tolls.along an "approximated" gradient estimator, inspired from zeroth-order optimization (<ref type="bibr">[30]</ref>,<ref type="bibr">[31]</ref>), evaluated at the lower-level iteration in current round.a) Leader's strategy update: The leader's update rule is as follows:x t+1 = x t -&#951; t F (x t ; &#948; t , v t ), (UL)</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>More specifically, we use the results from<ref type="bibr">[35,</ref> Proposition 8]. Even though the results in<ref type="bibr">[35,</ref> Proposition 8]  hold for gradient descent update with constant step-size, we can use this result for decaying step size in our context as well. This is because the proof of<ref type="bibr">[35,</ref> Proposition 8]  only requires each step of the gradient update to be diffeomorphism, which holds in our setting as the step-sizes are always non-negative and decaying.</p></note>
		</body>
		</text>
</TEI>
