
This content will become publicly available on May 29, 2024

Title: Approximating Discontinuous Nash Equilibrial Values of Two-Player General-Sum Differential Games
Finding Nash equilibrial policies for two-player differential games requires solving Hamilton-Jacobi-Isaacs (HJI) PDEs. Self-supervised learning has been used to approximate solutions of such PDEs while circumventing the curse of dimensionality. However, this method fails to learn discontinuous PDE solutions due to its sampling nature, leading to poor safety performance of the resulting controllers in robotics applications when player rewards are discontinuous. This paper investigates two potential solutions to this problem: a hybrid method that leverages both supervised Nash equilibria and the HJI PDE, and a value-hardening method in which a sequence of HJI PDEs is solved with a gradually hardening reward. We compare these solutions using the resulting generalization and safety performance in two vehicle interaction simulation studies with 5D and 9D state spaces, respectively. Results show that, with informative supervision (e.g., collision and near-collision demonstrations) and the low cost of self-supervised learning, the hybrid method achieves better safety performance than the supervised, self-supervised, and value-hardening approaches under an equal computational budget. Value hardening fails to generalize in the higher-dimensional case without informative supervision. Lastly, we show that the neural activation function needs to be continuously differentiable for learning PDEs and that its choice can be case-dependent.
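The value-hardening idea can be illustrated with a minimal sketch, assuming a collision reward that is discontinuous in the inter-vehicle distance and a logistic surrogate whose steepness `gamma` grows over a hypothetical geometric schedule. The function names, the logistic form, and the schedule are illustrative assumptions, not the paper's implementation. In a full pipeline, each stage would (by assumption) train a value network on the HJI PDE with the current surrogate reward, warm-started from the previous stage.

```python
import math


def hard_collision_penalty(distance: float, safe_distance: float) -> float:
    """Discontinuous penalty: -1 inside the unsafe region, 0 outside."""
    return -1.0 if distance < safe_distance else 0.0


def soft_collision_penalty(distance: float, safe_distance: float, gamma: float) -> float:
    """Logistic surrogate of the hard penalty (hypothetical choice).

    Larger gamma gives a steeper, i.e. "harder", reward.
    """
    z = gamma * (safe_distance - distance)
    # Numerically stable logistic sigmoid of z.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return -s


def hardening_schedule(gamma0: float = 1.0, factor: float = 2.0, stages: int = 6) -> list[float]:
    """Hypothetical geometric schedule: one steepness value per HJI solve."""
    return [gamma0 * factor**k for k in range(stages)]


if __name__ == "__main__":
    # At a fixed unsafe distance, the surrogate approaches the hard penalty stage by stage.
    for gamma in hardening_schedule():
        soft = soft_collision_penalty(0.5, 1.0, gamma)
        gap = abs(soft - hard_collision_penalty(0.5, 1.0))
        print(f"gamma={gamma:5.1f}  soft={soft:.6f}  gap={gap:.2e}")
```

Early stages give smooth, learnable targets; later stages approach the discontinuous reward whose value function the self-supervised method alone struggles to fit.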
Award ID(s):
1828010 1925403
Journal Name:
2023 IEEE International Conference on Robotics and Automation (ICRA)
Page Range / eLocation ID:
3022 to 3028
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers’ actions. This paper formulates a new game-theoretic framework for this phenomenon, called multi-player performative prediction. We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but are generally computationally difficult to find since they are solutions of nonmonotone games. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative-free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and taking gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.
  2. Several important PDE systems, like magnetohydrodynamics and computational electrodynamics, are known to support involutions where the divergence of a vector field evolves in a divergence-free or divergence constraint-preserving fashion. Recently, new classes of PDE systems have emerged for hyperelasticity, compressible multiphase flows, so-called first-order reductions of the Einstein field equations, or a novel first-order hyperbolic reformulation of Schrödinger’s equation, to name a few, where the involution in the PDE supports curl-free or curl constraint-preserving evolution of a vector field. We study the problem of curl constraint-preserving reconstruction as it pertains to the design of mimetic finite volume (FV) WENO-like schemes for PDEs that support a curl-preserving involution. (Some insights into discontinuous Galerkin (DG) schemes are also drawn, though that is not the prime focus of this paper.) This is done for two- and three-dimensional structured-mesh problems, for which we deliver closed-form expressions for the reconstruction. The importance of multidimensional Riemann solvers in facilitating the design of such schemes is also documented. In two dimensions, a von Neumann analysis of structure-preserving WENO-like schemes that mimetically satisfy the curl constraints is also presented. It shows the tremendous value of higher-order WENO-like schemes in minimizing dissipation and dispersion for this class of problems. Numerical results are also presented to show that the edge-centered curl-preserving (ECCP) schemes meet their design accuracy. This paper is the first to devise non-linearly hybridized curl-preserving reconstruction and integrate it with the higher-order Godunov philosophy. By design, it is therefore intended to be forward-looking and to set the stage for future work on curl involution-constrained PDEs.

  3. In this article, the recently discovered phenomenon of delayed Hopf bifurcations (DHB) in reaction–diffusion partial differential equations (PDEs) is analysed in the cubic complex Ginzburg–Landau equation, treated as an equation in its own right, with a slowly varying parameter. We begin by using the classical asymptotic methods of stationary phase and steepest descents on the linearized PDE to show that solutions that have approached the attracting quasi-steady state (QSS) before the Hopf bifurcation remain near that state for long times after the instantaneous Hopf bifurcation has occurred and the QSS has become repelling. In the complex time plane, the phase function of the linearized PDE has a saddle point, and the Stokes and anti-Stokes lines are central to the asymptotics. The non-linear terms are treated by applying an iterative method to the mild form of the PDE given by perturbations about the linear particular solution. This tracks how closely solutions stay near the attracting and repelling QSS in the full, non-linear PDE. Next, we show that beyond a key Stokes line through the saddle there is a curve in the space-time plane along which the particular solution of the linear PDE ceases to be exponentially small, causing the solution of the non-linear PDE to diverge from the repelling QSS and exhibit large-amplitude oscillations. This curve is called the space–time buffer curve. The homogeneous solution also stops being exponentially small in a spatially dependent manner that is also determined by the initial data and time. Hence, a competition arises between these two solutions as to which one ceases to be exponentially small first, and this competition governs the spatial dependence of the DHB. We find four different cases of DHB, depending on the outcomes of the competition, and we quantify to leading order how these depend on the main system parameters, including the Hopf frequency, initial time, initial data, source terms, and diffusivity.
Examples are presented for each case, with source terms that are a unimodal function, a smooth step function, a spatially periodic function, and an algebraically growing function. Rich spatio-temporal dynamics are also observed in the post-DHB oscillations. Finally, it is shown that large-amplitude source terms can be designed so that solutions spend substantially longer times near the repelling QSS, and hence region-specific control over the delayed onset of oscillations can be achieved.
  4. The main objective of Personalized Tour Recommendation (PTR) is to generate a sequence of points of interest (POIs) for a particular tourist according to user-specific constraints such as trip duration, start and end points, and the number of attractions planned to visit. Previous PTR solutions are based either on heuristics for solving the orienteering problem to maximize a global reward within a specified budget or on approaches that attempt to learn user visiting preferences and transition patterns with stochastic processes or recurrent neural networks. However, existing learning methodologies rely on historical trips to train the model and use the next visited POI as the supervised signal, which may not fully capture the coherence of preferences and can thus recommend similar trips to different users, primarily due to the data sparsity problem and the long-tailed distribution of POI popularity. This work presents a novel tour recommendation model that distills knowledge and supervision signals from the trips in a self-supervised manner. We propose Contrastive Trajectory Learning for Tour Recommendation (CTLTR), which utilizes intrinsic POI dependencies and traveling intent to discover extra knowledge and augments the sparse data by pre-training auxiliary self-supervised objectives. CTLTR provides a principled way to characterize the inherent data correlations while tackling the implicit feedback and weak supervision problems by learning robust representations applicable to tour planning. We introduce a hierarchical recurrent encoder-decoder to identify tourists’ intentions and use a contrastive loss to discover subsequence semantics and their sequential patterns by maximizing mutual information. Additionally, we observe that a data augmentation step preceding contrastive learning can address the overfitting issue resulting from data sparsity.
We conduct extensive experiments on a range of real-world datasets and demonstrate that our model can significantly improve the recommendation performance over the state-of-the-art baselines in terms of both recommendation accuracy and visiting orders. 
  5. Recent advances in high-resolution imaging techniques and particle-based simulation methods have enabled the precise microscopic characterization of collective dynamics in various biological and engineered active matter systems. In parallel, data-driven algorithms for learning interpretable continuum models have shown promising potential for the recovery of underlying partial differential equations (PDEs) from continuum simulation data. By contrast, learning macroscopic hydrodynamic equations for active matter directly from experiments or particle simulations remains a major challenge, especially when continuum models are not known a priori or analytic coarse graining fails, as is often the case for nondilute and heterogeneous systems. Here, we present a framework that leverages spectral basis representations and sparse regression algorithms to discover PDE models from microscopic simulation and experimental data, while incorporating the relevant physical symmetries. We illustrate its practical potential through a range of applications, from a chiral active particle model mimicking nonidentical swimming cells to recent microroller experiments and schooling fish. In all these cases, our scheme learns hydrodynamic equations that reproduce the self-organized collective dynamics observed in the simulations and experiments. This inference framework makes it possible to measure a large number of hydrodynamic parameters in parallel and directly from video data.

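For the first related abstract, on multi-player performative prediction, repeated retraining can be sketched under strong simplifying assumptions: each player i fits a scalar decision θ_i by squared loss to data whose mean shifts linearly with all players' decisions, z_i = μ_i + Σ_j A_ij θ_j + noise. Under this hypothetical location-family model, retraining sets each θ_i to the current mean of z_i, and the iteration converges to the performatively stable point θ = (I - A)^{-1} μ when the spectral radius of A is below 1. This is a toy instance, not the paper's general setting, and it finds the performatively stable equilibrium, which need not coincide with the Nash equilibrium.

```python
def retrain_step(mu: list[float], A: list[list[float]], theta: list[float]) -> list[float]:
    """One round of repeated retraining under a hypothetical linear distribution shift.

    Player i's data has mean mu[i] + sum_j A[i][j] * theta[j]; minimizing the
    expected squared loss (theta_i - z_i)^2 / 2 gives theta_i = E[z_i].
    """
    n = len(theta)
    return [mu[i] + sum(A[i][j] * theta[j] for j in range(n)) for i in range(n)]


def repeated_retraining(mu: list[float], A: list[list[float]], theta0: list[float],
                        iters: int = 200) -> list[float]:
    """Iterate retrain_step; converges when the spectral radius of A is below 1."""
    theta = list(theta0)
    for _ in range(iters):
        theta = retrain_step(mu, A, theta)
    return theta


if __name__ == "__main__":
    mu = [1.0, 2.0]
    A = [[0.3, 0.1], [0.2, 0.4]]  # eigenvalues 0.5 and 0.2
    print(repeated_retraining(mu, A, [0.0, 0.0]))  # approximately [2.0, 4.0]
```

Here (I - A)^{-1} μ = [2, 4], and the error contracts by roughly the largest eigenvalue of A (0.5) per round.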
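For the second related abstract, the reason edge-centered storage suits curl-preserving schemes can be shown with a minimal sketch on a uniform 2D structured mesh. It illustrates the mimetic property only, not the paper's WENO-like reconstruction or its multidimensional Riemann solvers. If edge values are differences of a node-centered potential, the discrete circulation around every cell telescopes to zero, so the discrete curl vanishes to round-off; a rotational field such as F = (-y, x) does not. The mesh layout, unit spacing, and function names are assumptions.

```python
import random


def gradient_edges(phi):
    """Edge-centered discrete gradient of a node-centered potential phi[i][j].

    ex[i][j] lives on the x-edge from node (i, j) to (i+1, j);
    ey[i][j] lives on the y-edge from node (i, j) to (i, j+1).
    Values are line integrals along each edge (unit spacing assumed).
    """
    nx, ny = len(phi), len(phi[0])
    ex = [[phi[i + 1][j] - phi[i][j] for j in range(ny)] for i in range(nx - 1)]
    ey = [[phi[i][j + 1] - phi[i][j] for j in range(ny - 1)] for i in range(nx)]
    return ex, ey


def rotational_edges(nx, ny):
    """Exact edge line integrals of F = (-y, x), with node (i, j) at (x, y) = (i, j)."""
    ex = [[-float(j) for j in range(ny)] for _ in range(nx - 1)]
    ey = [[float(i) for _ in range(ny - 1)] for i in range(nx)]
    return ex, ey


def cell_curl(ex, ey):
    """Counter-clockwise circulation around each cell (i, j): bottom, right, top, left."""
    nx_cells, ny_cells = len(ex), len(ey[0])
    return [[ex[i][j] + ey[i + 1][j] - ex[i][j + 1] - ey[i][j]
             for j in range(ny_cells)] for i in range(nx_cells)]


if __name__ == "__main__":
    random.seed(0)
    phi = [[random.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(5)]
    curl = cell_curl(*gradient_edges(phi))
    print(max(abs(c) for row in curl for c in row))  # round-off level
    print(cell_curl(*rotational_edges(5, 6))[0][0])  # 2.0 (curl of (-y, x) times unit area)
```

A reconstruction that keeps the evolved field expressible in this discrete-gradient form inherits the zero-curl property exactly, which is the kind of constraint the ECCP schemes are designed to respect.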
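For the third related abstract, the basic delay effect can be reproduced in a minimal sketch of the spatially homogeneous (x-independent) reduction of the cubic complex Ginzburg–Landau equation, dA/dt = (εt + iω)A - (1 + iβ)|A|^2 A, with no diffusion and no source term. These simplifications, the parameter values, and the escape threshold are assumptions for illustration; the paper's analysis concerns the full PDE with diffusion and source terms. The bifurcation parameter εt crosses zero at t = 0, yet a solution that has decayed toward the QSS A = 0 stays near it well after t = 0: for the linearized equation, |A(t)| = |A(t0)| exp(ε(t^2 - t0^2)/2), so starting at t0 = -T the amplitude does not regain its initial size until t = +T.

```python
def cgl_rhs(t: float, a: complex, eps: float, omega: float, beta: float) -> complex:
    """Spatially homogeneous cubic CGL with a slowly ramped parameter mu(t) = eps * t."""
    mu = eps * t  # instantaneous Hopf bifurcation at t = 0
    return (mu + 1j * omega) * a - (1 + 1j * beta) * abs(a) ** 2 * a


def escape_time(eps=0.02, omega=1.0, beta=0.5, a0=0.1, t0=-20.0, t_end=40.0,
                dt=0.01, threshold=0.5):
    """Integrate with classical RK4; return the first time |A| >= threshold (or None)."""
    t, a = t0, complex(a0, 0.0)
    while t < t_end:
        if abs(a) >= threshold:
            return t
        k1 = cgl_rhs(t, a, eps, omega, beta)
        k2 = cgl_rhs(t + dt / 2, a + dt / 2 * k1, eps, omega, beta)
        k3 = cgl_rhs(t + dt / 2, a + dt / 2 * k2, eps, omega, beta)
        k4 = cgl_rhs(t + dt, a + dt * k3, eps, omega, beta)
        a += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
    return None


if __name__ == "__main__":
    t_esc = escape_time()
    # Linear estimate sqrt(T**2 + 2*ln(threshold/a0)/eps) is about 23.7 for these values.
    print(f"escape at t = {t_esc:.2f} (instantaneous Hopf point at t = 0)")
```

The escape occurs long after the instantaneous bifurcation at t = 0, which is the delay the abstract analyses; the cubic term only slightly postpones it relative to the linear estimate.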
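For the fourth related abstract, the contrastive objective can be sketched with InfoNCE, a standard contrastive loss whose minimization maximizes a lower bound on the mutual information between two views. The abstract does not state whether CTLTR uses exactly this form or how its trajectory embeddings are produced; here the embeddings are assumed to come from a hypothetical encoder applied to two augmented views of each trip. Row i of `positives` is the positive pair for row i of `anchors`, and the other rows act as in-batch negatives.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors: list[list[float]], positives: list[list[float]],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over the batch, using in-batch negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(info_nce(views_a, matched), info_nce(views_a, swapped))  # small vs. large
```

The loss is low when each trip's two views embed close together and far from other trips, which is the signal the abstract describes using to learn subsequence semantics.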
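For the fifth related abstract, the sparse-regression step can be sketched with sequentially thresholded least squares (STLSQ), the regression used in SINDy-style PDE discovery. The abstract does not say which sparse regression algorithm the framework uses, so this choice is an assumption. The synthetic data come from a two-mode solution of the 1D diffusion equation u_t = D u_xx, with derivatives evaluated exactly from that Fourier-mode representation as a stand-in for spectral differentiation. The candidate library [u, u_x, u_xx, u*u_x] and the threshold are illustrative choices.

```python
import math


def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def least_squares(cols, y):
    """Fit y ~ sum_i coef_i * cols[i] via the normal equations (fine for a tiny library)."""
    k = len(cols)
    gram = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    return solve_linear(gram, rhs)


def stlsq(cols, y, threshold=0.05, max_iter=10):
    """Sequentially thresholded least squares: prune small terms, refit the survivors."""
    active = list(range(len(cols)))
    coef = [0.0] * len(cols)
    for _ in range(max_iter):
        coef = [0.0] * len(cols)
        fit = least_squares([cols[j] for j in active], y)
        for i, c in zip(active, fit):
            coef[i] = c
        keep = [i for i in active if abs(coef[i]) >= threshold]
        if keep == active:
            break
        active = keep
    return coef


def diffusion_data(D=0.3, nx=32, times=(0.0, 0.5, 1.0)):
    """Samples of u = e^{-Dt} sin x + 0.5 e^{-4Dt} cos 2x and its exact derivatives."""
    lib = {"u": [], "u_x": [], "u_xx": [], "u*u_x": []}
    u_t = []
    for t in times:
        e1, e2 = math.exp(-D * t), 0.5 * math.exp(-4 * D * t)
        for k in range(nx):
            x = 2 * math.pi * k / nx
            u = e1 * math.sin(x) + e2 * math.cos(2 * x)
            ux = e1 * math.cos(x) - 2 * e2 * math.sin(2 * x)
            uxx = -e1 * math.sin(x) - 4 * e2 * math.cos(2 * x)
            for name, val in (("u", u), ("u_x", ux), ("u_xx", uxx), ("u*u_x", u * ux)):
                lib[name].append(val)
            u_t.append(D * uxx)  # the governing PDE: u_t = D u_xx
    return lib, u_t


if __name__ == "__main__":
    names = ["u", "u_x", "u_xx", "u*u_x"]
    lib, u_t = diffusion_data()
    coef = stlsq([lib[name] for name in names], u_t)
    print({name: round(c, 6) for name, c in zip(names, coef)})  # only u_xx survives, near 0.3
```

With noisy experimental or particle data, the thresholding step is what keeps spurious small terms out of the learned hydrodynamic equation; the symmetry constraints mentioned in the abstract would further restrict which library terms are allowed.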