%ALee, J.%ATucker, G.%ANachum, O.%ADai, B.%ABrunskill, E.%D2022%I
%K
%MOSTI ID: 10382139
%PMedium: X
%TOracle Inequalities for Model Selection in Offline Reinforcement Learning
%XIn offline reinforcement learning (RL), a learner leverages prior logged data to
learn a good policy without interacting with the environment. A major challenge
in applying such methods in practice is the lack of both theoretically principled and
practical tools for model selection and evaluation. To address this, we study the
problem of model selection in offline RL with value function approximation. The
learner is given a nested sequence of model classes to minimize squared Bellman
error and must select among these to achieve a balance between approximation and
estimation error of the classes. We propose the first model selection algorithm for
offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic
factors. The algorithm, MODBE, takes as input a collection of candidate model
classes and a generic base offline RL algorithm. By successively eliminating
model classes using a novel one-sided generalization test, MODBE returns a policy
with regret scaling with the complexity of the minimally complete model class. In
addition to its theoretical guarantees, it is conceptually simple and computationally
efficient, amounting to solving a series of square loss regression problems and then
comparing relative square loss between classes. We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.