Enhancing value function estimation through first-order state-action dynamics in offline reinforcement learning

Lien, Yun-Hsuan; Hsieh, Ping-Chun; Li, Tzu-Mao; Wang, Yu-Shuen

Citation Details

In offline reinforcement learning (RL), updating the value function with the discrete-time Bellman Equation often encounters challenges due to the limited scope of available data. This limitation stems from the Bellman Equation, which cannot accurately predict the value of unvisited states. To address this issue, we have introduced an innovative solution that bridges the continuousand discrete-time RL methods, capitalizing on their advantages. Our method uses a discrete-time RL algorithm to derive the value function from a dataset while ensuring that the function’s first derivative aligns with the local characteristics of states and actions, as defined by the HamiltonJacobi-Bellman equation in continuous RL. We provide practical algorithms for both deterministic policy gradient methods and stochastic policy gradient methods. Experiments on the D4RL dataset show that incorporating the first-order information significantly improves policy performance for offline RL problems. more »

Award ID(s):: 2238839

PAR ID:: 10572319

Author(s) / Creator(s):: Lien, Yun-Hsuan; Hsieh, Ping-Chun; Li, Tzu-Mao; Wang, Yu-Shuen

Publisher / Repository:: International Conference of Machine Learning

Date Published:: 2024-06-24

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this