Minimax Model Learning

Voloshin, C; Jiang, N; Yue, YS

Citation Details

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model mis-specification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.

Award ID(s): 1645832

PAR ID: 10329359

Author(s) / Creator(s): Voloshin, C; Jiang, N; Yue, YS

Editor(s): Banerjee, A; Fukumizu, K

Date Published: 2021-01-01

Journal Name: 24th International Conference on Artificial Intelligence and Statistics (AISTATS)

Volume: 130

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
The DOI is not currently available.