Title: A finite horizon optimal stochastic impulse control problem with a decision lag
This paper studies an optimal stochastic impulse control problem in a finite time horizon with a decision lag, by which we mean that after an impulse is made, a fixed number of units of time must elapse before the next impulse is allowed. The continuity of the value function is proved. A suitable version of the dynamic programming principle is established, which takes into account the dependence of the state process on the elapsed time. The corresponding Hamilton-Jacobi-Bellman (HJB) equation is derived, which exhibits some special features of the problem. The value function of this optimal impulse control problem is characterized as the unique viscosity solution to the corresponding HJB equation. An optimal impulse control is constructed provided the value function is given. Moreover, a limiting case with the waiting time approaching 0 is discussed.
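The decision-lag constraint described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `is_admissible`, the parameter names, and the choice of a non-strict inequality (a gap of exactly `lag` is allowed) are assumptions made for illustration, and the sketch assumes no impulse is pending at the start time.

```python
from typing import Sequence


def is_admissible(impulse_times: Sequence[float],
                  horizon: float,
                  lag: float,
                  start: float = 0.0) -> bool:
    """Check a sequence of impulse times against a decision lag.

    Hypothetical helper (not from the paper). A sequence is accepted if
    every time lies in [start, horizon] and each impulse comes at least
    `lag` time units after the previous one. Whether the gap must be
    strictly larger than the lag is an assumption of this sketch.
    """
    previous = None
    for t in impulse_times:
        # Every impulse must fall inside the finite horizon.
        if t < start or t > horizon:
            return False
        # The waiting time since the previous impulse must have elapsed.
        if previous is not None and t - previous < lag:
            return False
        previous = t
    return True


# Impulses spaced by the full lag are accepted; closer ones are rejected.
ok = is_admissible([0.0, 1.0, 2.0], horizon=3.0, lag=1.0)
too_close = is_admissible([0.0, 0.5], horizon=3.0, lag=1.0)
```

As `lag` shrinks toward 0 the check reduces to requiring non-decreasing times within the horizon, which loosely mirrors the limiting case mentioned at the end of the abstract.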
Authors:
;
Award ID(s):
1812921
Publication Date:
NSF-PAR ID:
10220019
Journal Name:
Dynamics of Continuous, Discrete and Impulsive Systems
Volume:
28
Page Range or eLocation-ID:
89-123
ISSN:
1918-2538
Sponsoring Org:
National Science Foundation
More Like this
  1. Buttazzo, G. ; Casas, E. ; de Teresa, L. ; Glowinski, R. ; Leugering, G. ; Trélat, E. ; Zhang, X. (Ed.)
    An optimal control problem is considered for a stochastic differential equation with the cost functional determined by a backward stochastic Volterra integral equation (BSVIE, for short). This kind of cost functional can cover the general discounting (including exponential and non-exponential) situations with a recursive feature. It is known that such a problem is time-inconsistent in general. Therefore, instead of finding a global optimal control, we look for a time-consistent locally near-optimal equilibrium strategy. With the idea of multi-person differential games, a family of approximate equilibrium strategies is constructed associated with partitions of the time intervals. By sending the mesh size of the time interval partition to zero, an equilibrium Hamilton–Jacobi–Bellman (HJB, for short) equation is derived, through which the equilibrium value function and an equilibrium strategy are obtained. Under certain conditions, a verification theorem is proved and the well-posedness of the equilibrium HJB is established. As a sort of Feynman–Kac formula for the equilibrium HJB equation, a new class of BSVIEs (containing the diagonal value Z(r, r) of Z(⋅, ⋅)) is naturally introduced and the well-posedness of such kind of equations is briefly presented.
  2. We propose a neural network approach that yields approximate solutions for high-dimensional optimal control problems and demonstrate its effectiveness using examples from multi-agent path finding. Our approach yields controls in a feedback form, where the policy function is given by a neural network (NN). Specifically, we fuse the Hamilton-Jacobi-Bellman (HJB) and Pontryagin Maximum Principle (PMP) approaches by parameterizing the value function with an NN. Our approach enables us to obtain approximately optimal controls in real-time without having to solve an optimization problem. Once the policy function is trained, generating a control at a given space-time location takes milliseconds; in contrast, efficient nonlinear programming methods typically perform the same task in seconds. We train the NN offline using the objective function of the control problem and penalty terms that enforce the HJB equations. Therefore, our training algorithm does not involve data generated by another algorithm. By training on a distribution of initial states, we ensure the controls' optimality on a large portion of the state-space. Our grid-free approach scales efficiently to dimensions where grids become impractical or infeasible. We apply our approach to several multi-agent collision-avoidance problems in up to 150 dimensions. Furthermore, we empirically observe that the number of parameters in our approach scales linearly with the dimension of the control problem, thereby mitigating the curse of dimensionality.
  3. The vortex dynamics and lift force generated by a sinusoidally heaving and pitching airfoil during dynamic stall are experimentally investigated for reduced frequencies of k = fc/U∞ = 0.06–0.16, pitching amplitude of θ0 = 75° and heaving amplitude of h0/c = 0.6. The lift force is calculated from the velocity fields using the finite-domain impulse theory. The concept of moment arm dilemma associated with the impulse equation is revisited to shed light on its physical impact on the calculated forces. It is shown that by selecting an objectively defined origin of the moment-arm, the impulse force equation can be greatly simplified to two terms that have a clear physical meaning: (i) the time rate of change of impulse of vortical structures within the control volume and (ii) Lamb vector that indirectly captures the contribution of vortical structures outside of the control volume. The results show that the trend of the lift force is dependent on the formation of the leading edge vortex, as well as its time rate of change of circulation and chord-wise advection relative to the airfoil. Additionally, the trailing edge vortex, which is observed to only form for k ≤ 0.10, is shown to have lift-diminishing effects that intensify with increasing reduced frequency. Lastly, the concept of optimal vortex formation is investigated. The leading edge vortex is shown to attain the optimal formation number of approximately 4 for k ≤ 0.1, when the scaling is based on the leading edge shear velocity. For larger values of k the vortex growth is delayed to later in the cycle and doesn't reach its optimal value. The result is that the peak lift force occurs later in the cycle. This has consequences on power production which relies on correlation of the relative timing of lift force and heaving velocity.
  4. We propose a neural network approach for solving high-dimensional optimal control problems. In particular, we focus on multi-agent control problems with obstacle and collision avoidance. These problems immediately become high-dimensional, even for moderate phase-space dimensions per agent. Our approach fuses the Pontryagin Maximum Principle and Hamilton-Jacobi-Bellman (HJB) approaches and parameterizes the value function with a neural network. Our approach yields controls in a feedback form for quick calculation and robustness to moderate disturbances to the system. We train our model using the objective function and optimality conditions of the control problem. Therefore, our training algorithm neither involves a data generation phase nor solutions from another algorithm. Our model uses empirically effective HJB penalizers for efficient training. By training on a distribution of initial states, we ensure the controls' optimality is achieved on a large portion of the state-space. Our approach is grid-free and scales efficiently to dimensions where grids become impractical or infeasible. We demonstrate our approach's effectiveness on a 150-dimensional multi-agent problem with obstacles.
  5. By exploiting min-plus linearity, semiconcavity, and semigroup properties of dynamic programming, a fundamental solution semigroup for a class of approximate finite horizon linear infinite dimensional optimal control problems is constructed. Elements of this fundamental solution semigroup are parameterized by the time horizon, and can be used to approximate the solution of the corresponding finite horizon optimal control problem for any terminal cost. They can also be composed to compute approximations on longer horizons. The value function approximation provided takes the form of a min-plus convolution of a kernel with the terminal cost. A general construction for this kernel is provided, along with a spectral representation for a restricted class of sub-problems.