Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

Zhang, Siyuan; Jiang, Nan

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL), which is crucial for hyperparameter tuning, is an important open question. Existing approaches based on off-policy evaluation (OPE) often require additional function approximation and hence hyperparameters, creating a chicken-and-egg situation. In this paper, we design hyperparameter-free algorithms for policy selection based on BVFT [XJ21], a recent theoretical advance in value-function selection, and demonstrate their effectiveness in discrete-action benchmarks such as Atari. To address performance degradation due to poor critics in continuous-action domains, we further combine BVFT with OPE to get the best of both worlds, and obtain a hyperparameter-tuning method for Q-function based OPE with theoretical guarantees as a side product.

Award ID(s): 2141781

PAR ID: 10394023

Author(s) / Creator(s): Zhang, Siyuan; Jiang, Nan

Date Published: 2021-12-01

Journal Name: Advances in Neural Information Processing Systems

Volume: 34

ISSN: 1049-5258

Format(s): Medium: X

Sponsoring Org: National Science Foundation

