Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances

Saito, Yuta; Joachims, Thorsten

doi:10.1145/3460231.3473320

Citation Details

Counterfactual Learning and Evaluation for Recommender Systems: Foundations, Implementations, and Recent Advances

Counterfactual estimators enable the use of existing log data to estimate how some new target recommendation policy would have performed, if it had been used instead of the policy that logged the data. We say that those estimators work ”off-policy”, since the policy that logged the data is different from the target policy. In this way, counterfactual estimators enable Off-policy Evaluation (OPE) akin to an unbiased offline A/B test, as well as learning new recommendation policies through Off-policy Learning (OPL). The goal of this tutorial is to summarize Foundations, Implementations, and Recent Advances of OPE/OPL. Specifically, we will introduce the fundamentals of OPE/OPL and provide theoretical and empirical comparisons of conventional methods. Then, we will cover emerging practical challenges such as how to take into account combinatorial actions, distributional shift, fairness of exposure, and two-sided market structures. We will then present Open Bandit Pipeline, an open-source package for OPE/OPL, and how it can be used for both research and practical purposes. We will conclude the tutorial by presenting real-world case studies and future directions. more »

Award ID(s):: 1901168

PAR ID:: 10309941

Author(s) / Creator(s):: Saito, Yuta; Joachims, Thorsten

Date Published:: 2021-09-13

Journal Name:: ACM Conference on Recommender Systems

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1145/3460231.3473320

More Like this