Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

Huang, Wen; Wu, Xintao

doi:10.1609/aaai.v38i18.30027

Citation Details


This paper studies bandit problems where an agent has access to offline data that might be utilized to improve the estimation of each arm’s reward distribution. A major obstacle in this setting is the existence of compound biases in the observational data. Ignoring these biases and blindly fitting a model to the biased data could even negatively affect the online learning phase. In this work, we formulate this problem from a causal perspective. First, we categorize the biases into confounding bias and selection bias based on the causal structure they imply. Next, we extract from the biased observational data a causal bound for each arm that is robust to compound biases. The derived bounds contain the ground-truth mean reward and can effectively guide the bandit agent to learn a nearly optimal decision policy. We also conduct regret analysis in both contextual and non-contextual bandit settings and show that prior causal bounds could help consistently reduce the asymptotic regret.
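As a rough illustration of how prior bounds on each arm's mean reward might guide a bandit agent, the sketch below runs a standard UCB1 loop on simulated Bernoulli arms and clips each arm's index to a given interval. It is not the algorithm from the paper: the function name `clipped_ucb`, the `bounds` values, and the Bernoulli reward model are all assumptions made for this example, and the bounds are simply supplied by hand rather than derived from offline data.

```python
import math
import random


def clipped_ucb(means, bounds, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms, clipping each index to a prior bound.

    means   -- true mean rewards, used only to simulate pulls (assumed)
    bounds  -- hypothetical prior (lower, upper) bound per arm (assumed given)
    horizon -- number of rounds to play
    Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(means)

    # An arm whose upper bound lies below the best lower bound cannot be optimal.
    best_lower = max(lo for lo, _ in bounds)
    active = [a for a in range(k) if bounds[a][1] >= best_lower]

    pulls = [0] * k
    totals = [0.0] * k

    def index(a, t):
        # Standard UCB1 index, then clipped into the arm's prior interval.
        ucb = totals[a] / pulls[a] + math.sqrt(2 * math.log(t) / pulls[a])
        lo, hi = bounds[a]
        return min(max(ucb, lo), hi)

    for t in range(1, horizon + 1):
        untried = [a for a in active if pulls[a] == 0]
        if untried:
            arm = untried[0]  # play every remaining arm once first
        else:
            arm = max(active, key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


# Illustrative values only: three arms with hand-picked bounds.
pulls = clipped_ucb(
    means=[0.2, 0.5, 0.8],
    bounds=[(0.0, 0.3), (0.3, 0.7), (0.6, 1.0)],
    horizon=1000,
)
```

In this toy setup the first arm is never played, because its upper bound (0.3) is below the best lower bound (0.6). The second arm's index is capped at 0.7, so it is explored far less than under plain UCB1. Tighter bounds remove more exploration, which is the intuition behind the regret reduction the abstract describes.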

Award ID(s): 1910284, 1940093, 2137335, 2147375

PAR ID: 10527564

Author(s) / Creator(s): Huang, Wen; Wu, Xintao

Publisher / Repository: AAAI

Date Published: 2024-03-25

Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence

Volume: 38

Issue: 18

ISSN: 2159-5399

Page Range / eLocation ID: 20438 to 20446

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1609/aaai.v38i18.30027
