skip to main content

Title: Min-max and stat game representations for nonlinear optimal control problems
A finite horizon nonlinear optimal control problem is considered for which the associated Hamiltonian satisfies a uniform semiconcavity property with respect to its state and costate variables. It is shown that the value function for this optimal control problem is equivalent to the value of a min-max game, provided the time horizon considered is sufficiently short. This further reduces to maximization of a linear functional over a convex set. It is further proposed that the min-max game can be relaxed to a type of stat (stationary) game, in which no time horizon constraint is involved.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ;
Publisher / Repository:
Date Published:
Journal Name:
Proceedings of the American Control Conference
Page Range / eLocation ID:
2733 to 2738
Subject(s) / Keyword(s):
["Nonlinear, control theory, numerical methods, game theory"]
Medium: X
San Diego, CA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. By exploiting min-plus linearity, semiconcavity, and semigroup properties of dynamic programming, a fundamental solution semigroup for a class of approximate finite horizon linear infinite dimensional optimal control problems is constructed. Elements of this fundamental solution semigroup are parameterized by the time horizon, and can be used to approximate the solution of the corresponding finite horizon optimal control problem for any terminal cost. They can also be composed to compute approximations on longer horizons. The value function approximation provided takes the form of a min-plus convolution of a kernel with the terminal cost. A general construction for this kernel is provided, along with a spectral representation for a restricted class of sub-problems. 
    more » « less
  2. Abstract

    We introduce and study a class of optimization problems we call replenishment problems with fixed turnover times: a very natural model that has received little attention in the literature. Clients with capacity for storing a certain commodity are located at various places; at each client the commodity depletes within a certain time, the turnover time, which is constant but can vary between locations. Clients should never run empty. The natural feature that makes this problem interesting is that we may schedule a replenishment (well) before a client becomes empty, but then the next replenishment will be due earlier also. This added workload needs to be balanced against the cost of routing vehicles to do the replenishments. In this paper, we focus on the aspect of minimizing routing costs. However, the framework of recurring tasks, in which the next job of a task must be done within a fixed amount of time after the previous one is much more general and gives an adequate model for many practical situations. Note that our problem has an infinite time horizon. However, it can be fully characterized by a compact input, containing only the location of each client and a turnover time. This makes determining its computational complexity highly challenging and indeed it remains essentially unresolved. We study the problem for two objectives:minavg  minimizes the average tour cost andminmax  minimizes the maximum tour cost over all days. Forminmax  we derive a logarithmic factor approximation for the problem on general metrics and a 6-approximation for the problem on trees, for which we have a proof of NP-hardness. Forminavg  we present a logarithmic factor approximation on general metrics, a 2-approximation for trees, and a pseudopolynomial time algorithm for the line. Many intriguing problems remain open.

    more » « less
  3. We study the menu complexity of optimal and approximately-optimal auctions in the context of the ``FedEx'' problem, a so-called ``one-and-a-half-dimensional'' setting where a single bidder has both a value and a deadline for receiving an item [FGKK 16]. The menu complexity of an auction is equal to the number of distinct (allocation, price) pairs that a bidder might receive [HN 13]. We show the following when the bidder has $n$ possible deadlines: 1) Exponential menu complexity is necessary to be exactly optimal: There exist instances where the optimal mechanism has menu complexity $\geq 2^n-1$. This matches exactly the upper bound provided by Fiat et al.'s algorithm, and resolves one of their open questions [FGKK 16]. 2) Fully polynomial menu complexity is necessary and sufficient for approximation: For all instances, there exists a mechanism guaranteeing a multiplicative $(1-\epsilon)$-approximation to the optimal revenue with menu complexity $O(n^{3/2}\sqrt{\frac{\min\{n/\epsilon,\ln(v_{\max})\}}{\epsilon}}) = O(n^2/\epsilon)$, where $v_{\max}$ denotes the largest value in the support of integral distributions. \item There exist instances where any mechanism guaranteeing a multiplicative $(1-O(1/n^2))$-approximation to the optimal revenue requires menu complexity $\Omega(n^2)$. Our main technique is the polygon approximation of concave functions [Rote 91], and our results here should be of independent interest. We further show how our techniques can be used to resolve an open question of [DW 17] on the menu complexity of optimal auctions for a budget-constrained buyer. 
    more » « less
  4. Nonzero sum games typically have multiple Nash equilibriums (or no equilibrium), and unlike the zero-sum case, they may have different values at different equilibriums. Instead of focusing on the existence of individual equilibriums, we study the set of values over all equilibriums, which we call the set value of the game. The set value is unique by nature and always exists (with possible value [Formula: see text]). Similar to the standard value function in control literature, it enjoys many nice properties, such as regularity, stability, and more importantly, the dynamic programming principle. There are two main features in order to obtain the dynamic programming principle: (i) we must use closed-loop controls (instead of open-loop controls); and (ii) we must allow for path dependent controls, even if the problem is in a state-dependent (Markovian) setting. We shall consider both discrete and continuous time models with finite time horizon. For the latter, we will also provide a duality approach through certain standard PDE (or path-dependent PDE), which is quite efficient for numerically computing the set value of the game. 
    more » « less
  5. null (Ed.)
    This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena - a stochastic game graph with unknown but fixed probability distributions - to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require the knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1 1/2-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions 
    more » « less