Title: Continuity of the value function for deterministic optimal impulse control with terminal state constraint
A deterministic optimal impulse control problem with a terminal state constraint is considered. Because of the terminal state constraint, the value function may be discontinuous in general. The main contribution of this paper is the introduction of an intrinsic condition under which the value function is proved to be continuous. A Bellman dynamic programming principle then yields the corresponding Hamilton-Jacobi-Bellman type quasi-variational inequality (QVI, for short), and the value function is shown to be a viscosity solution of this QVI. Whether the value function is characterized as the unique viscosity solution of this QVI is carefully discussed, and the answer is left as a challenging open problem.
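For orientation, QVIs of this type typically take the following generic form (a textbook-style sketch with illustrative notation; sign conventions and the treatment of the terminal constraint follow the paper itself):

    \max\Big\{ -\partial_t V(t,x) - \nabla_x V(t,x)\cdot f(t,x) - g(t,x),\ \ V(t,x) - \mathcal{M}V(t,x) \Big\} = 0,

where \mathcal{M}V(t,x) = \inf_{\xi}\{ V(t, x+\xi) + \ell(\xi) \} is the impulse (intervention) operator, f is the drift, g the running cost, and \ell the impulse cost. The first term governs the continuation region; the second enforces that applying an impulse is never strictly better than the current value.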
Award ID(s):
1812921
PAR ID:
10341968
Author(s) / Creator(s):
Date Published:
Journal Name:
ESAIM: Control, Optimisation and Calculus of Variations
Volume:
27
ISSN:
1292-8119
Page Range / eLocation ID:
104
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    This paper studies an optimal stochastic impulse control problem on a finite time horizon with a decision lag, by which we mean that after an impulse is made, a fixed number of time units must elapse before the next impulse is allowed. The continuity of the value function is proved. A suitable version of the dynamic programming principle is established, which takes into account the dependence of the state process on the elapsed time. The corresponding Hamilton-Jacobi-Bellman (HJB) equation is derived, which exhibits some special features of the problem. The value function of this optimal impulse control problem is characterized as the unique viscosity solution of the corresponding HJB equation. An optimal impulse control is constructed, provided the value function is given. Moreover, the limiting case with the waiting time approaching 0 is discussed. The elapsed-time augmentation is sketched below.
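    A common way to encode such a decision lag (a sketch with illustrative notation, not the paper's exact formulation) is to augment the state with the time \tau elapsed since the last impulse:

        V(t, x, \tau) = \inf\Big\{ \int_t^{t+h} g(s, x(s))\, ds + V\big(t+h,\, x(t+h),\, \tau + h\big) \Big\},

    where an impulse is admissible at time s only if the elapsed-time coordinate has reached the lag \delta, and applying an impulse resets \tau to 0. This augmentation is why the dynamic programming principle depends on the elapsed time, as the abstract notes.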
  2. An optimal control problem in the space of probability measures, and the viscosity solutions of the corresponding dynamic programming equations defined using the intrinsic linear derivative, are studied. The value function is shown to be Lipschitz continuous with respect to a novel smooth Fourier-Wasserstein metric. A comparison result between the Lipschitz viscosity sub- and supersolutions of the dynamic programming equation is proved using this metric, characterizing the value function as the unique Lipschitz viscosity solution.
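    For intuition only, Fourier-based distances on probability measures are typically built from characteristic functions with a smoothing weight, e.g. (a generic construction of this flavor, not the paper's definition of its Fourier-Wasserstein metric):

        d_s(\mu, \nu) = \sup_{k} \frac{|\hat{\mu}(k) - \hat{\nu}(k)|}{(1 + |k|)^{s}}, \qquad \hat{\mu}(k) = \int e^{i k \cdot x}\, d\mu(x),

    where the weight in the denominator supplies the smoothness needed for comparison arguments between viscosity sub- and supersolutions.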
  3. In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman equation often encounters challenges due to the limited scope of the available data, since the Bellman equation cannot accurately predict the value of unvisited states. To address this issue, we introduce a method that bridges continuous- and discrete-time RL, capitalizing on the advantages of both. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function's first derivative aligns with the local characteristics of states and actions, as defined by the Hamilton-Jacobi-Bellman equation in continuous-time RL. We provide practical algorithms for both deterministic and stochastic policy gradient methods. Experiments on the D4RL dataset show that incorporating this first-order information significantly improves policy performance in offline RL problems.
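    As a rough sketch of the derivative-alignment idea (hypothetical names throughout; value_net, dynamics_fn, reward_fn, and the discount rate rho are assumptions for illustration, not the authors' code), the continuous-time HJB residual can be added as an auxiliary penalty on the critic:

    import torch

    def hjb_first_order_penalty(value_net, states, actions, dynamics_fn, reward_fn, rho):
        # Continuous-time HJB residual along the data:
        #     r(s, a) - rho * V(s) + <grad V(s), f(s, a)>  ~  0
        # dynamics_fn is an (estimated) drift model f(s, a); rho is a discount rate.
        states = states.clone().requires_grad_(True)
        v = value_net(states).squeeze(-1)                      # V(s), shape (N,)
        grad_v = torch.autograd.grad(v.sum(), states, create_graph=True)[0]
        drift = dynamics_fn(states, actions)                   # f(s, a), shape (N, d)
        residual = reward_fn(states, actions) - rho * v + (grad_v * drift).sum(dim=-1)
        return (residual ** 2).mean()

    Used as loss = td_loss + lam * hjb_first_order_penalty(...), this nudges the learned value function's first derivative toward consistency with the local dynamics, in the spirit the abstract describes.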
  4. The inability to naturally enforce safety in Reinforcement Learning (RL), with limited failures, is a core challenge impeding its use in real-world applications. One notion of safety of vast practical relevance is the ability to avoid (unsafe) regions of the state space. Though such a safety goal can be captured by an action-value-like function, a.k.a. a safety critic, the associated operator lacks the contraction and uniqueness properties that the classical Bellman operator enjoys. In this work, we overcome the non-contractiveness of safety critic operators by leveraging the fact that safety is a binary property. To that end, we study the properties of the binary safety critic associated with a deterministic dynamical system that seeks to avoid reaching an unsafe region. We formulate the corresponding binary Bellman equation (B2E) for safety and study its properties. While the resulting operator is still non-contractive, we fully characterize its fixed points, which represent (except for a spurious solution) the maximal persistently safe regions of the state space that can always avoid failure. We provide an algorithm that, by design, leverages axiomatic knowledge of safe data to avoid spurious fixed points.
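    Schematically, for deterministic dynamics x^+ = f(x, a) and unsafe set U, a binary Bellman equation of this kind can be sketched as (illustrative notation; the paper's exact B2E may differ):

        B(x) = \mathbf{1}[x \notin U] \cdot \max_{a} B\big(f(x, a)\big), \qquad B(x) \in \{0, 1\},

    so B(x) = 1 on states from which some action sequence avoids U forever. Note that the all-zero function also satisfies this recursion, which illustrates the kind of spurious fixed point the proposed algorithm uses safe data to rule out.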
  5. Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer convergence theory and of exploiting greater a priori knowledge of policy or value function structure. This paper explores algorithm design in the continuous time domain, with a finite-horizon optimal control objective. The main contributions are (i) the new Q-ODE: a model-free characterization of the Hamilton-Jacobi-Bellman equation; (ii) a formulation of Convex Q-learning that avoids approximations appearing in prior work, in which the Bellman error used in the algorithm is defined by filtered measurements, as is necessary in the presence of measurement noise; (iii) a proof that Convex Q-learning with linear function approximation is a convex program, whose constraint region is bounded subject to an exploration condition on the training input; and (iv) an illustration of the theory in application to resource allocation for distributed energy resources, for which the theory is ideally suited.
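    Schematically, the continuous-time Bellman error underlying such methods (a generic sketch, not the paper's exact Q-ODE) replaces the discrete-time residual with a derivative along observed trajectories: for a running cost c, the relation

        \frac{d}{dt}\, \underline{Q}^{\theta}\big(x(t)\big) + c\big(x(t), u(t)\big) \approx 0, \qquad \underline{Q}^{\theta}(x) = \min_{u} Q^{\theta}(x, u),

    should hold along optimal trajectories, and its left-hand side can be estimated from (filtered) measurements of x and u without a system model. With Q^{\theta} linear in \theta, minimizing this error subject to Bellman-inequality constraints yields a convex program.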