Convex Q-Learning in Continuous Time with Application to Dispatch of Distributed Energy Resources

Lu, Fan; Mathias, Joel; Meyn, Sean; Kalsi, Karanjit

doi:10.1109/CDC49753.2023.10383620

Citation Details

Convex Q-learning is a recent approach to reinforcement learning, motivated by the possibility of a firmer theory for convergence and of making greater use of a priori knowledge regarding policy or value function structure. This paper explores algorithm design in the continuous time domain, with a finite-horizon optimal control objective. The main contributions are: (i) The new Q-ODE, a model-free characterization of the Hamilton-Jacobi-Bellman equation. (ii) A formulation of convex Q-learning that avoids approximations appearing in prior work. The Bellman error used in the algorithm is defined by filtered measurements, which is necessary in the presence of measurement noise. (iii) Convex Q-learning with linear function approximation is a convex program. It is shown that the constraint region is bounded, subject to an exploration condition on the training input. (iv) The theory is illustrated in application to resource allocation for distributed energy resources, for which the theory is ideally suited.

Award ID(s): 2122313

PAR ID: 10529255

Author(s) / Creator(s): Lu, Fan; Mathias, Joel; Meyn, Sean; Kalsi, Karanjit

Publisher / Repository: IEEE

Date Published: 2023-12-13

ISBN: 979-8-3503-0124-3

Page Range / eLocation ID: 1529 to 1536

Subject(s) / Keyword(s): Demand dispatch; model predictive control; reinforcement learning

Format(s): Medium: X

Location: Singapore, Singapore

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/CDC49753.2023.10383620
