Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation

Singh, R; Mete, A; Kar, A; Kumar, P R


We establish the first finite-time logarithmic regret bounds for the self-tuning regulation problem. We introduce a modified version of the certainty equivalence algorithm, which we call PIECE, that clips inputs in addition to utilizing probing inputs for exploration. We show that it has a $C \log T$ upper bound on the regret after $T$ time-steps for bounded noise, and $C \log^3 T$ in the case of sub-Gaussian noise, unlike the LQ problem where logarithmic regret has been shown to be impossible. The PIECE algorithm is also designed to address the critical challenge of poor initial transient performance of reinforcement learning algorithms for linear systems. Comparative simulation results illustrate the improved performance of PIECE.

Award ID(s): 2038625

PAR ID: 10563426

Author(s) / Creator(s): Singh, R; Mete, A; Kar, A; Kumar, P R

Publisher / Repository: Proceedings of the 41st International Conference on Machine Learning

Date Published: 2024-07-29

Page Range / eLocation ID: 235:45571-45636

Format(s): Medium: X

Location: Vienna, Austria

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Proceeding: The DOI is not currently available.
