Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits

Si, Nian; Zhang, Fan; Zhou, Zhengyuan; Blanchet, Jose.


Policy learning using historical observational data is an important problem that has found widespread applications. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that generated the data, an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with bandit observational data. We propose a novel learning algorithm that learns a policy robust to adversarial perturbations and unknown covariate shifts. We first present a policy evaluation procedure in an ambiguous environment and then give a heuristic algorithm to solve the distributionally robust policy learning problem efficiently. Additionally, we provide extensive simulations to demonstrate the robustness of our policy.

Award ID(s): 1820942

PAR ID: 10303790

Author(s) / Creator(s): Si, Nian; Zhang, Fan; Zhou, Zhengyuan; Blanchet, Jose.

Editor(s): Daumé III, Hal; Singh, Aarti

Date Published: 2020-08-21

Journal Name: Proceedings of Machine Learning Research

Volume: 119

ISSN: 2640-3498

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.
