

Title: Dynamic Set Values for Nonzero-Sum Games with Multiple Equilibriums
Nonzero-sum games typically have multiple Nash equilibriums (or no equilibrium), and unlike the zero-sum case, they may have different values at different equilibriums. Instead of focusing on the existence of individual equilibriums, we study the set of values over all equilibriums, which we call the set value of the game. The set value is unique by nature and always exists (with possible value ∅). Similar to the standard value function in the control literature, it enjoys many nice properties, such as regularity, stability, and, more importantly, the dynamic programming principle. Two main features are needed to obtain the dynamic programming principle: (i) we must use closed-loop controls (instead of open-loop controls); and (ii) we must allow for path-dependent controls, even if the problem is in a state-dependent (Markovian) setting. We consider both discrete and continuous time models with finite time horizon. For the latter, we also provide a duality approach through certain standard PDEs (or path-dependent PDEs), which is quite efficient for numerically computing the set value of the game.
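Schematically, and only as a heuristic paraphrase (the paper's precise statement requires closed-loop, possibly path-dependent controls and measurable selections), the dynamic programming principle for the set value takes the form

\[
\mathbb{V}(t,x) \;=\; \Big\{\, J\big(t,x;\,\alpha \otimes_s \beta\big) \;:\; \alpha \text{ a closed-loop control profile on } [t,s],\ \ \beta \text{ such that } J\big(s, X^{t,x,\alpha}_s;\,\beta\big) \in \mathbb{V}\big(s, X^{t,x,\alpha}_s\big) \,\Big\}, \qquad t \le s \le T,
\]

where \(\otimes_s\) denotes concatenation of controls at time \(s\): the set value at \((t,x)\) is generated exactly by those controls whose continuation is consistent with some element of the set value at the later time.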
Award ID(s): 1908665
NSF-PAR ID: 10329540
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Mathematics of Operations Research
Volume: 47
Issue: 1
ISSN: 0364-765X
Page Range / eLocation ID: 616–642
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Employing mobile actuators and sensors for control and estimation of spatially distributed processes offers a significant advantage over immobile ones. Beyond improved control performance, there are economic advantages, since fewer devices need to be employed if they can be repositioned within the spatial domain. While simulation studies of mobile actuators report superb controller performance, they are far from reality, as the mobile platforms carrying actuators and sensors must satisfy motion constraints. Terrain platforms cannot behave as point masses without inertia; instead, they must satisfy constraints that are adequately represented as path-dependent reachability sets. When the control algorithm commands a mobile platform to reposition itself to a different location within the spatial domain, this does not occur instantaneously, and for the most part the motion is not omnidirectional. This constraint is combined with a computationally feasible, suboptimal control policy with mobile actuators to arrive at a numerically viable control and guidance scheme. The feasible control decision comes from a continuous-discrete control policy whereby the mobile platform carrying the actuator is repositioned at discrete times and dwells in a specific position for a certain time interval. For each move to a subsequent spatial location, the associated path is computed over a physics-imposed time interval, yielding a set of candidate positions and paths derived from a path-dependent reachability set. Building on the path-dependent reachability sets that dictate the mobile actuator repositioning, a scheme is proposed to integrate collocated sensing measurements in order to minimize costly state estimation schemes. The proposed scheme is demonstrated on a 2D PDE with two collocated actuator-sensor pairs onboard mobile platforms; a toy sketch of the repositioning step follows.
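    As a loose illustration of the continuous-discrete repositioning step (the function names, cost model, and motion limits below are hypothetical, not taken from the paper), one can discretize a reachability set of positions attainable within one dwell interval and pick the candidate minimizing an estimated control cost:

    import numpy as np

    def candidate_positions(pos, heading, speed_max, turn_max, dt, n=16):
        """Discretize a path-dependent reachability set: positions reachable
        within one dwell interval dt under speed and turn-rate limits."""
        cands = []
        for turn in np.linspace(-turn_max, turn_max, n):
            new_heading = heading + turn * dt
            step = speed_max * dt
            cands.append((pos + step * np.array([np.cos(new_heading),
                                                 np.sin(new_heading)]),
                          new_heading))
        return cands

    def reposition(pos, heading, cost_of, speed_max=1.0, turn_max=0.5, dt=1.0):
        """Pick the reachable candidate minimizing a user-supplied cost estimate."""
        cands = candidate_positions(pos, heading, speed_max, turn_max, dt)
        return min(cands, key=lambda c: cost_of(c[0]))

    # Example: steer the actuator toward the point of largest estimated state error.
    hot_spot = np.array([0.7, 0.3])
    new_pos, new_heading = reposition(np.zeros(2), 0.0,
                                      cost_of=lambda p: np.linalg.norm(p - hot_spot))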
  2.
    Abstract We consider a natural generalization of classical scheduling problems to a setting in which using a time unit for processing a job causes some time-dependent cost, the time-of-use tariff, which must be paid in addition to the standard scheduling cost. We focus on preemptive single-machine scheduling and two classical scheduling cost functions, the sum of (weighted) completion times and the maximum completion time, that is, the makespan. While these problems are easy to solve in the classical scheduling setting, they are considerably more complex when time-of-use tariffs must be considered. We contribute optimal polynomial-time algorithms and best possible approximation algorithms. For the problem of minimizing the total (weighted) completion time on a single machine, we present a polynomial-time algorithm that computes for any given sequence of jobs an optimal schedule, i.e., the optimal set of time slots to be used for preemptively scheduling jobs according to the given sequence. This result is based on dynamic programming using a subtle analysis of the structure of optimal solutions and a potential function argument. With this algorithm, we solve the unweighted problem optimally in polynomial time. For the more general problem, in which jobs may have individual weights, we develop a polynomial-time approximation scheme (PTAS) based on a dual scheduling approach introduced for scheduling on a machine of varying speed. As the weighted problem is strongly NP-hard, our PTAS is the best possible approximation we can hope for. For preemptive scheduling to minimize the makespan, we show that there is a comparably simple optimal algorithm with polynomial running time. This is true even in a certain generalized model with unrelated machines. 
  3. Schölkopf, Bernhard; Uhler, Caroline; Zhang, Kun (Eds.)
    In order to test whether a treatment is perceptibly different from a placebo in a randomized experiment with covariates, classical nonparametric tests based on ranks of observations/residuals have been employed (e.g., by Rosenbaum), with finite-sample valid inference enabled via permutations. This paper proposes a different principle on which to base inference: if, with access to all covariates and outcomes but without access to any treatment assignments, one can form a ranking of the subjects that is sufficiently nonrandom (e.g., mostly treated followed by mostly control), then we can confidently conclude that there must be a treatment effect. Based on a more nuanced, quantifiable version of this principle, we design an interactive test called i-bet: the analyst forms a single permutation of the subjects one element at a time, and at each step the analyst bets toy money on whether that subject was actually treated or not, and learns the truth immediately after. The wealth process forms a real-valued measure of evidence against the global causal null, and we may reject the null at level α if the wealth ever crosses 1/α. Apart from providing a fresh “game-theoretic” principle on which to base the causal conclusion, the i-bet has other statistical and computational benefits, for example (A) allowing a human to adaptively design the test statistic based on increasing amounts of data being revealed (along with any working causal models and prior knowledge), and (B) not requiring permutation resampling, instead noting that under the null the wealth forms a nonnegative martingale, so the type-1 error control of the aforementioned decision rule follows from a tight inequality by Ville. Further, if the null is not rejected, new subjects can later be added and the test simply continued, without any corrections (unlike with permutation p-values). Numerical experiments demonstrate good power under various heterogeneous treatment effects. We first describe the i-bet test for two-sample comparisons with unpaired data, and then adapt it to paired data, multi-sample comparisons, and sequential settings; these may be viewed as interactive martingale variants of the Wilcoxon, Kruskal-Wallis, and Friedman tests.
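    A minimal sketch of the betting wealth process (variable names are hypothetical; the actual i-bet lets the analyst design each bet adaptively from the covariates and outcomes revealed so far):

    def i_bet(assignments, bet_fractions, alpha=0.05):
        """assignments: revealed 0/1 treatment labels, one per subject in the
        analyst's chosen order; bet_fractions[i] in [-1, 1] is the stake on
        subject i being treated (negative = bet on control). Under the global
        null with 50/50 assignment, wealth is a nonnegative martingale, so
        rejecting when wealth >= 1/alpha controls type-1 error (Ville)."""
        wealth = 1.0
        for a, b in zip(assignments, bet_fractions):
            payoff = 1.0 + b if a == 1 else 1.0 - b   # fair odds for a coin flip
            wealth *= payoff
            if wealth >= 1.0 / alpha:
                return True, wealth   # reject the global causal null
        return False, wealth

    # Example: the analyst is mostly right about who was treated, so wealth grows.
    reject, w = i_bet([1, 1, 0, 1, 0, 1, 1, 1],
                      [0.5, 0.5, -0.5, 0.5, -0.5, 0.5, 0.5, 0.5])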
  4. Recent clinical trials have shown that adaptive drug therapies can be more efficient than a standard cancer treatment based on a continuous use of maximum tolerated doses (MTD). The adaptive therapy paradigm is not based on a preset schedule; instead, the doses are administered based on the current state of the tumour. But the adaptive treatment policies examined so far have been largely ad hoc. We propose a method for systematically optimizing adaptive policies based on an evolutionary game theory model of cancer dynamics. Given a set of treatment objectives, we use the framework of dynamic programming to find the optimal treatment strategies. In particular, we optimize the total drug usage and time to recovery by solving a Hamilton–Jacobi–Bellman equation. We compare the MTD-based treatment strategy with optimal adaptive treatment policies and show that the latter can significantly decrease the total amount of drugs prescribed while also increasing the fraction of initial tumour states from which recovery is possible. We conclude that the use of optimal control theory to improve adaptive policies is a promising concept in cancer treatment and should be integrated into clinical trial design.
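    As a hedged illustration of the dynamic-programming step (a generic value-iteration scheme on a toy one-dimensional tumour model; the dynamics, costs, and parameters below are invented and are not the paper's equations):

    import numpy as np

    def hjb_value_iteration(growth, kill, dmax, drug_weight,
                            x_recover=0.1, x_max=1.0, nx=101, dt=0.01, iters=2000):
        """Toy HJB solver: tumour burden x' = growth(x) - d*kill(x), with
        control d in {0, dmax}. v(x) approximates the minimal integral of
        (1 + drug_weight*d) dt until x <= x_recover, i.e., time-to-recovery
        plus a penalty on total drug usage."""
        xs = np.linspace(0.0, x_max, nx)
        v = np.zeros(nx)
        for _ in range(iters):
            v_new = v.copy()
            for i, x in enumerate(xs):
                if x <= x_recover:
                    continue                      # recovered: zero cost-to-go
                best = np.inf
                for d in (0.0, dmax):
                    x_next = np.clip(x + dt * (growth(x) - d * kill(x)), 0.0, x_max)
                    j = min(np.searchsorted(xs, x_next), nx - 1)  # crude grid lookup
                    best = min(best, dt * (1.0 + drug_weight * d) + v[j])
                v_new[i] = best
            if np.max(np.abs(v_new - v)) < 1e-9:
                break
            v = v_new
        return xs, v

    # Example: logistic growth, drug kill rate proportional to burden.
    xs, v = hjb_value_iteration(growth=lambda x: 0.5 * x * (1 - x),
                                kill=lambda x: x, dmax=2.0, drug_weight=0.5)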
  5. Garbage collection (GC) support for unmanaged languages can reduce the programming burden of reasoning about the liveness of dynamic objects. It also avoids temporal memory safety violations and memory leaks. Sound GC for weakly-typed languages such as C/C++, however, remains an unsolved problem. Current value-based GC solutions examine the values of memory locations to discover pointers and the objects they point to. This approach is inherently unsound in the presence of arbitrary type casts and pointer manipulations, which are legal in C/C++. Such language features are regularly used, especially in low-level systems code.

    In this paper, we propose Dynamic Pointer Provenance Tracking to realize sound GC. We observe that pointers cannot be created out of thin air; they must have provenance to at least one valid allocation. Therefore, by tracking pointer provenance from the source (e.g., malloc) through both explicit data-flow and implicit control-flow, our GC has sound and precise information to compute the set of all reachable objects at any program state. We discuss several static analysis optimizations that can be employed during compilation, aided by profiling, to significantly reduce the overhead of dynamic provenance tracking from nearly 8× to 16% for well-behaved programs that adhere to the C standards. Provenance-based sound GC invocation is also 13% faster and reclaims 6% more memory on average, compared to an unsound value-based GC.
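    A toy simulation of the provenance idea (illustrative only; the actual system works at the compiler/runtime level for C/C++, and the class and method names here are hypothetical):

    class ProvenanceHeap:
        """Toy model: every pointer is tagged with the allocation it derives
        from, so reachability never depends on guessing raw bit patterns."""
        def __init__(self):
            self.live = {}          # allocation id -> size
            self.provenance = {}    # pointer name -> allocation id
            self.next_id = 0

        def malloc(self, ptr_name, size):
            self.live[self.next_id] = size
            self.provenance[ptr_name] = self.next_id   # source of provenance
            self.next_id += 1

        def derive(self, new_ptr, old_ptr):
            """Casts and pointer arithmetic propagate provenance, not erase it."""
            self.provenance[new_ptr] = self.provenance[old_ptr]

        def collect(self, roots):
            """Reclaim allocations with no provenance path from any root."""
            reachable = {self.provenance[r] for r in roots if r in self.provenance}
            self.live = {a: s for a, s in self.live.items() if a in reachable}

    h = ProvenanceHeap()
    h.malloc("p", 64)
    h.derive("q", "p")      # e.g., q = (char*)p + 8 still points into p's block
    h.malloc("r", 32)
    h.collect(roots=["q"])  # p/q's block survives; r's block is reclaimed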