 Award ID(s):
 1908918
 NSFPAR ID:
 10500956
 Publisher / Repository:
 IEEE
 Date Published:
 Journal Name:
 Proceedings of the American Control Conference
 ISSN:
2378-5861
 Page Range / eLocation ID:
 2733 to 2738
 Subject(s) / Keyword(s):
 ["Nonlinear, control theory, numerical methods, game theory"]
 Format(s):
 Medium: X
 Location:
 San Diego, CA, USA
 Sponsoring Org:
 National Science Foundation
More Like this

By exploiting the min-plus linearity, semiconcavity, and semigroup properties of dynamic programming, a fundamental solution semigroup is constructed for a class of approximate finite-horizon linear infinite-dimensional optimal control problems. Elements of this fundamental solution semigroup are parameterized by the time horizon and can be used to approximate the solution of the corresponding finite-horizon optimal control problem for any terminal cost. They can also be composed to compute approximations on longer horizons. The value function approximation provided takes the form of a min-plus convolution of a kernel with the terminal cost. A general construction for this kernel is provided, along with a spectral representation for a restricted class of subproblems.
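As a concrete illustration of the min-plus convolution described above, the sketch below approximates a value function on a discretized one-dimensional state space. The quadratic kernel and terminal cost are illustrative stand-ins only, not the paper's infinite-dimensional construction:

```python
# Hedged sketch: a value-function approximation as a min-plus convolution
# of a kernel with a terminal cost, on a 1-D grid. Q and phi below are
# invented for illustration.

def minplus_convolve(kernel_row, phi):
    """(Q *_minplus phi)(x) = min_z [ Q(x, z) + phi(z) ]."""
    return min(q + p for q, p in zip(kernel_row, phi))

grid = [i * 0.1 for i in range(-20, 21)]           # discretized states
Q = [[(x - z) ** 2 for z in grid] for x in grid]   # toy kernel Q(x, z)
phi = [abs(z - 0.5) for z in grid]                 # toy terminal cost

# Approximate value function: min-plus convolution of kernel and cost.
V = [minplus_convolve(row, phi) for row in Q]
```

In the min-plus (tropical) semiring, this convolution plays the role that a matrix-vector product plays in ordinary linear algebra, which is what lets kernels for two horizons be composed into a kernel for the longer horizon.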

Abstract: We introduce and study a class of optimization problems we call replenishment problems with fixed turnover times: a very natural model that has received little attention in the literature. Clients with capacity for storing a certain commodity are located at various places; at each client the commodity depletes within a certain time, the turnover time, which is constant but can vary between locations. Clients should never run empty. The natural feature that makes this problem interesting is that we may schedule a replenishment (well) before a client becomes empty, but then the next replenishment will also be due earlier. This added workload needs to be balanced against the cost of routing vehicles to do the replenishments. In this paper, we focus on the aspect of minimizing routing costs. However, the framework of recurring tasks, in which the next job of a task must be done within a fixed amount of time after the previous one, is much more general and gives an adequate model for many practical situations. Note that our problem has an infinite time horizon. However, it can be fully characterized by a compact input containing only the location of each client and a turnover time. This makes determining its computational complexity highly challenging, and indeed it remains essentially unresolved. We study the problem for two objectives:
min-avg minimizes the average tour cost and min-max minimizes the maximum tour cost over all days. For min-max we derive a logarithmic-factor approximation for the problem on general metrics and a 6-approximation for the problem on trees, for which we have a proof of NP-hardness. For min-avg we present a logarithmic-factor approximation on general metrics, a 2-approximation for trees, and a pseudo-polynomial time algorithm for the line. Many intriguing problems remain open.
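The recurring-task constraint above can be sketched by simulating the simplest policy: serve each client exactly when its deadline expires. The turnover times and horizon below are invented for illustration; visiting a client earlier than its deadline would shift all of its later deadlines earlier, which is the workload/routing trade-off the abstract describes.

```python
# Hedged sketch of the fixed-turnover-time constraint: client i, last
# visited on day d, must be visited again by day d + T[i]. This toy
# instance serves each client exactly at its deadline.

def due_today(day, last_visit, T):
    """Clients whose deadline last_visit[i] + T[i] expires today."""
    return [i for i in range(len(T)) if day == last_visit[i] + T[i]]

T = [2, 3, 6]              # turnover times in days (illustrative)
last_visit = [0, 0, 0]     # all clients replenished on day 0
schedule = {}
for day in range(1, 13):
    todays = due_today(day, last_visit, T)
    for i in todays:
        last_visit[i] = day   # serve exactly at the deadline
    if todays:
        schedule[day] = todays
# On day 6 all three deadlines coincide, so one tour can cover all clients.
```

Note how the deadline structure repeats with period lcm(T): this periodicity is what makes a compact input describe an infinite-horizon problem.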
We study the menu complexity of optimal and approximately optimal auctions in the context of the ``FedEx'' problem, a so-called ``one-and-a-half-dimensional'' setting where a single bidder has both a value and a deadline for receiving an item [FGKK 16]. The menu complexity of an auction is equal to the number of distinct (allocation, price) pairs that a bidder might receive [HN 13]. We show the following when the bidder has $n$ possible deadlines: 1) Exponential menu complexity is necessary to be exactly optimal: there exist instances where the optimal mechanism has menu complexity $\geq 2^n - 1$. This matches exactly the upper bound provided by Fiat et al.'s algorithm, and resolves one of their open questions [FGKK 16]. 2) Fully polynomial menu complexity is necessary and sufficient for approximation: for all instances, there exists a mechanism guaranteeing a multiplicative $(1-\epsilon)$-approximation to the optimal revenue with menu complexity $O(n^{3/2}\sqrt{\frac{\min\{n/\epsilon,\ln(v_{\max})\}}{\epsilon}}) = O(n^2/\epsilon)$, where $v_{\max}$ denotes the largest value in the support of integral distributions. 3) There exist instances where any mechanism guaranteeing a multiplicative $(1 - O(1/n^2))$-approximation to the optimal revenue requires menu complexity $\Omega(n^2)$. Our main technique is the polygon approximation of concave functions [Rote 91], and our results here should be of independent interest. We further show how our techniques can be used to resolve an open question of [DW 17] on the menu complexity of optimal auctions for a budget-constrained buyer.
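The polygon-approximation idea can be sketched as recursive bisection: replace a concave function by chords and split any interval whose midpoint error exceeds a tolerance. This is a simplified additive-error variant for illustration (the paper's bounds concern multiplicative guarantees); `math.sqrt` and the tolerance are arbitrary choices:

```python
# Hedged sketch of polygon approximation of a concave function via
# recursive bisection. For concave f, the chord lies below f, so the
# midpoint gap bounds how badly the chord under-approximates there.
import math

def polygonize(f, a, b, eps):
    """Breakpoints of a piecewise-linear under-approximation of concave f
    on [a, b], refined until every tested midpoint gap is <= eps."""
    mid = (a + b) / 2
    chord_mid = (f(a) + f(b)) / 2
    if f(mid) - chord_mid <= eps:
        return [a, b]
    left = polygonize(f, a, mid, eps)
    right = polygonize(f, mid, b, eps)
    return left[:-1] + right   # merge, dropping the duplicated midpoint

pts = polygonize(math.sqrt, 0.0, 4.0, 0.05)
```

As in the paper's analysis, more breakpoints cluster where the function is most curved (here, near 0), and the total count grows only slowly as the tolerance shrinks.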

Nonzero-sum games typically have multiple Nash equilibria (or no equilibrium), and unlike the zero-sum case, they may have different values at different equilibria. Instead of focusing on the existence of individual equilibria, we study the set of values over all equilibria, which we call the set value of the game. The set value is unique by nature and always exists (with possible value [Formula: see text]). Similar to the standard value function in the control literature, it enjoys many nice properties, such as regularity, stability, and, more importantly, the dynamic programming principle. Two main features are required in order to obtain the dynamic programming principle: (i) we must use closed-loop controls (instead of open-loop controls); and (ii) we must allow for path-dependent controls, even if the problem is in a state-dependent (Markovian) setting. We consider both discrete- and continuous-time models with finite time horizon. For the latter, we also provide a duality approach through certain standard PDEs (or path-dependent PDEs), which is quite efficient for numerically computing the set value of the game.
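The set-value idea can be illustrated in the simplest possible setting, a finite one-shot bimatrix game (not the paper's dynamic framework): rather than selecting one equilibrium, collect the payoff vectors attained at all of them. The coordination game below is a standard textbook example, chosen here because its two pure equilibria have different values.

```python
# Illustrative sketch: the "set value" of a one-shot game as the set of
# payoff pairs over all pure Nash equilibria of a bimatrix game.

def pure_nash_values(A, B):
    """Payoff pairs (A[i][j], B[i][j]) at pure Nash equilibria, where A
    is the row player's payoff matrix and B the column player's."""
    m, n = len(A), len(A[0])
    values = set()
    for i in range(m):
        for j in range(n):
            row_best = all(A[i][j] >= A[k][j] for k in range(m))
            col_best = all(B[i][j] >= B[i][l] for l in range(n))
            if row_best and col_best:
                values.add((A[i][j], B[i][j]))
    return values

# Coordination game: two equilibria with different values, so no single
# number summarizes the game -- the set {(2, 2), (1, 1)} does.
A = [[2, 0], [0, 1]]
B = [[2, 0], [0, 1]]
vals = pure_nash_values(A, B)   # {(2, 2), (1, 1)}
```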

This paper investigates the use of model-free reinforcement learning to compute the optimal value in two-player stochastic games with parity objectives. In this setting, two decision makers, player Min and player Max, compete on a finite game arena, a stochastic game graph with unknown but fixed probability distributions, to minimize and maximize, respectively, the probability of satisfying a parity objective. We give a reduction from stochastic parity games to a family of stochastic reachability games with a parameter ε, such that the value of a stochastic parity game equals the limit of the values of the corresponding simple stochastic games as the parameter ε tends to 0. Since this reduction does not require knowledge of the probabilistic transition structure of the underlying game arena, model-free reinforcement learning algorithms, such as minimax Q-learning, can be used to approximate the value and mutual best-response strategies for both players in the underlying stochastic parity game. We also present a streamlined reduction from 1 1/2-player parity games to reachability games that avoids recourse to nondeterminism. Finally, we report on the experimental evaluations of both reductions.
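A minimax-Q-style update on a reachability objective can be sketched on a toy arena. Everything below is invented for illustration: real minimax Q-learning solves a matrix game (a small LP) at every state, whereas this sketch uses the pure max-min value, which is only a lower bound in general; the two-state arena and its transition probabilities are likewise hypothetical.

```python
# Hedged sketch of a minimax-Q-style update for a tiny stochastic
# reachability game: Max wants to reach state 1 (reward 1, absorbing),
# Min wants to avoid it. Transitions are sampled, never read directly,
# mirroring the model-free setting described above.
import random

random.seed(0)

def step(s, a, b):
    """Sample (next_state, reward). Matching actions favor reaching
    the goal; the probabilities are illustrative."""
    if s == 1:
        return 1, 0.0
    p_goal = 0.9 if a == b else 0.1
    return (1, 1.0) if random.random() < p_goal else (0, 0.0)

Q = {(0, a, b): 0.0 for a in (0, 1) for b in (0, 1)}
alpha, gamma = 0.1, 0.95

def value(s):
    if s == 1:
        return 0.0  # absorbing; reward already collected on entry
    # Pure max-min stand-in for the matrix-game value at state s.
    return max(min(Q[(s, a, b)] for b in (0, 1)) for a in (0, 1))

for _ in range(5000):
    a, b = random.choice((0, 1)), random.choice((0, 1))
    s2, r = step(0, a, b)
    Q[(0, a, b)] += alpha * (r + gamma * value(s2) - Q[(0, a, b)])
```

After training, the learned Q-values rank matching action pairs above mismatching ones, recovering the qualitative structure of the game without ever inspecting its transition probabilities.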