Title: Probabilistic Offline Policy Ranking with Approximate Bayesian Computation
Abstract: In practice, it is essential to compare and rank candidate policies offline before real-world deployment, for safety and reliability. Prior work addresses this offline policy ranking (OPR) problem through value-based methods such as off-policy evaluation (OPE). However, these methods cannot analyze special-case performance (e.g., worst or best cases) because they lack a holistic characterization of a policy's performance, and estimating precise policy values becomes even harder when rewards are sparse or not fully accessible. In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework that addresses OPR by leveraging expert data to characterize the probability that a candidate policy behaves like the experts, and by approximating the candidate's entire performance posterior distribution to support ranking. POPR does not rely on value estimation, and the derived performance posterior can be used to distinguish candidates in worst-, best-, and average-case scenarios. To estimate the posterior, we propose POPR-EABC, an energy-based Approximate Bayesian Computation (ABC) method that performs likelihood-free inference. POPR-EABC reduces the heuristic nature of ABC through a smooth energy function and improves sampling efficiency through a pseudo-likelihood. We empirically demonstrate that POPR-EABC is adequate for evaluating policies in both discrete and continuous action spaces across diverse environments, and that it facilitates probabilistic comparison of candidate policies before deployment.
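The abstract describes a likelihood-free recipe: draw a parameter from a prior, simulate agreement with expert behavior, score the discrepancy with a smooth energy, and weight the sample by a pseudo-likelihood. The sketch below illustrates that general recipe only; the Beta prior, the agreement-based energy, the temperature, and all names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of energy-based ABC for offline policy ranking (illustrative
# assumptions throughout; not the POPR-EABC implementation).
import numpy as np

rng = np.random.default_rng(0)

def energy(simulated_agreement, observed_agreement):
    """Smooth energy: discrepancy between simulated and observed agreement."""
    return abs(simulated_agreement - observed_agreement)

def pseudo_likelihood(e, temperature=0.1):
    """Map an energy value to an unnormalized likelihood weight."""
    return np.exp(-e / temperature)

def popr_eabc_posterior(expert_batches, candidate_policy, n_samples=2000):
    """Approximate the posterior of theta = P(candidate acts like the expert).

    expert_batches: list of (states, actions) arrays from expert trajectories.
    candidate_policy: callable (states, actions) -> probability the candidate
                      assigns to each expert action (assumed interface).
    Returns resampled draws of theta (soft ABC via pseudo-likelihood weights
    instead of a hard acceptance threshold).
    """
    thetas, weights = [], []
    for _ in range(n_samples):
        theta = rng.beta(1.0, 1.0)                      # prior over expert-likeness
        states, actions = expert_batches[rng.integers(len(expert_batches))]
        observed = float(np.mean(candidate_policy(states, actions)))
        simulated = rng.binomial(1, theta, size=len(actions)).mean()
        thetas.append(theta)
        weights.append(pseudo_likelihood(energy(simulated, observed)))
    thetas, weights = np.array(thetas), np.array(weights)
    # resample in proportion to the pseudo-likelihood -> approximate posterior
    idx = rng.choice(len(thetas), size=len(thetas), p=weights / weights.sum())
    return thetas[idx]
```

Posterior samples obtained this way for different candidates can then be compared by quantiles (e.g., a low quantile for worst-case ranking) rather than by a single point estimate.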
Award ID(s): 2421839
PAR ID: 10525350
Author(s) / Creator(s): ; ; ; ;
Publisher / Repository: Proceedings of the AAAI Conference on Artificial Intelligence
Date Published:
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue: 18
ISSN: 2159-5399
Page Range / eLocation ID: 20370 to 20378
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Probabilistic learning to rank (LTR) has been the dominant approach for optimizing ranking metrics, but it cannot maximize long-term rewards. Reinforcement learning models have been proposed to maximize users' long-term rewards by formulating recommendation as a sequential decision-making problem, but they achieve inferior accuracy compared to LTR counterparts, primarily due to the lack of online interactions and the characteristics of ranking. In this paper, we propose a new off-policy value ranking (VR) algorithm that can simultaneously maximize user long-term rewards and optimize the ranking metric offline for improved sample efficiency in a unified Expectation-Maximization (EM) framework. We theoretically and empirically show that the EM process guides the learned policy to enjoy the benefits of integrating the future reward and the ranking metric, and to learn without any online interactions. Extensive offline and online experiments demonstrate the effectiveness of our methods.
  2. Reinforcement learning (RL) in low-data and risk-sensitive domains requires deployment policies that are both performant and flexible enough to readily incorporate constraints at deployment time. One such class of policies is the semi-parametric H-step lookahead policies, which select actions using trajectory optimization over a dynamics model for a fixed horizon with a terminal value function. In this work, we investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function learned by a model-free off-policy algorithm, named Learning Off-Policy with Online Planning (LOOP). We provide a theoretical analysis of this method, suggesting a tradeoff between model errors and value function errors, and empirically demonstrate this tradeoff to be beneficial in deep reinforcement learning. Furthermore, we identify the "Actor Divergence" issue in this framework and propose Actor Regularized Control (ARC), a modified trajectory optimization procedure. We evaluate our method on a set of robotic tasks for offline and online RL and demonstrate improved performance. We also show the flexibility of LOOP in incorporating safety constraints during deployment with a set of navigation environments. We demonstrate that LOOP is a desirable framework for robotics applications based on its strong performance in various important RL settings.
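The H-step lookahead described above combines short rollouts through a learned model with a terminal value estimate. A minimal random-shooting sketch of that selection rule follows; the optimizer, hyperparameters, and function names are illustrative assumptions and do not reproduce the paper's ARC procedure.

```python
# Minimal sketch of H-step lookahead action selection with a learned dynamics
# model and a terminal value function (random-shooting MPC; illustrative only).
import numpy as np

def h_step_lookahead(state, dynamics, reward_fn, value_fn, action_dim,
                     horizon=5, n_candidates=256, gamma=0.99, rng=None):
    """Pick the first action of the best random action sequence, scored by
    H-step model rollouts plus a discounted terminal value estimate."""
    rng = rng or np.random.default_rng()
    # candidate action sequences: (n_candidates, horizon, action_dim) in [-1, 1]
    plans = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, plan in enumerate(plans):
        s, total, discount = state, 0.0, 1.0
        for a in plan:
            s_next = dynamics(s, a)            # learned model: (s, a) -> s'
            total += discount * reward_fn(s, a)
            s, discount = s_next, discount * gamma
        total += discount * value_fn(s)        # terminal value from an off-policy critic
        returns[i] = total
    return plans[np.argmax(returns)][0]        # execute only the first action (MPC)
```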
  3. Transformer models have achieved remarkable empirical successes, largely due to their in-context learning capabilities. Inspired by this, we explore training an autoregressive transformer for in-context reinforcement learning (ICRL). In this setting, we initially train a transformer on an offline dataset consisting of trajectories collected from various RL tasks, and then fix and use this transformer to create an action policy for new RL tasks. Notably, we consider the setting where the offline dataset contains trajectories sampled from suboptimal behavioral policies. In this case, standard autoregressive training corresponds to imitation learning and results in suboptimal performance. To address this, we propose the Decision Importance Transformer (DIT) framework, which emulates the actor-critic algorithm in an in-context manner. In particular, we first train a transformer-based value function that estimates the advantage functions of the behavior policies that collected the suboptimal trajectories. Then we train a transformer-based policy via a weighted maximum likelihood estimation loss, where the weights are constructed based on the trained value function to steer the suboptimal policies to the optimal ones. We conduct extensive experiments to test the performance of DIT on both bandit and Markov Decision Process problems. Our results show that DIT achieves superior performance, particularly when the offline dataset contains suboptimal historical data. 
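The weighted maximum likelihood step described in the DIT abstract above can be illustrated with an advantage-weighted cross-entropy loss. The exponential weighting with temperature beta and the clamp are assumptions about how such weights might be formed, not the paper's exact loss.

```python
# Minimal sketch of advantage-weighted maximum likelihood for an offline
# policy (illustrative assumptions; not the DIT implementation).
import torch

def weighted_mle_loss(policy_logits, actions, advantages, beta=1.0):
    """policy_logits: (batch, n_actions) from the transformer policy.
    actions: (batch,) behavior actions from the offline dataset (long tensor).
    advantages: (batch,) estimates from the separately trained value model."""
    log_probs = torch.log_softmax(policy_logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # larger advantage -> larger weight; clamp keeps the weights stable
    weights = torch.clamp(torch.exp(advantages / beta), max=20.0)
    return -(weights.detach() * chosen).mean()
```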
  4. The next generation of Industrial Internet-of-Things (IIoT) systems will require wireless solutions to connect sensors, actuators, and controllers as part of feedback-control loops over real-time flows. A key challenge in such networks is to provide predictable performance and adaptability to variations in link quality. We address this challenge by developing Receiver Oriented Policies (RECORP), which leverages the stability of IIoT workloads to build a solution that combines offline policy synthesis and run-time adaptation. Compared to schedules that service a single flow in a slot, RECORP policies share slots among multiple flows by assigning a coordinator and a set of candidate flows in the same slot. At run-time, the coordinator will dynamically execute one of the flows depending on what flows the coordinator has already received. The net effect of this strategy is that a node can dynamically repurpose the retransmissions remaining after receiving the data of an incoming flow to service other incoming flows opportunistically. Therefore, the flows that are executed in a slot can be adapted in response to the variable link conditions observed at run-time. Furthermore, RECORP also provides predictable performance: a policy meets the end-to-end reliability and deadline constraints of flows given probabilistic link qualities. When RECORP policies and schedules are configured to meet the same end-to-end reliability target of 99%, larger-scale multihop simulations show that across typical IIoT workloads, policies provided a median improvement of 1.63 to 2.44 times in real-time capacity as well as a median reduction of 1.45 to 2.43 times in worst-case latency. 
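The run-time adaptation described above boils down to a coordinator choosing, among the slot's candidate flows, one it can actually service given what it has already received. A toy sketch of that decision follows; the data structures are illustrative assumptions, not the protocol's actual state.

```python
# Minimal sketch of a RECORP-style slot decision (illustrative only).
def coordinator_pick(candidate_flows, received, already_delivered):
    """Return the flow to transmit in this slot, or None to stay idle.

    candidate_flows: flows assigned to this slot by the offline policy,
                     ordered by priority (e.g., earliest deadline first).
    received: set of flows whose packet this coordinator currently holds.
    already_delivered: set of flows that no longer need service."""
    for flow in candidate_flows:
        if flow in received and flow not in already_delivered:
            return flow       # repurpose the slot for an incoming flow we hold
    return None
```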
  5. In many real-world settings, we face the problem of selecting the best among a finite number of alternatives, where the best alternative is determined based on context-specific information. In this work, we study the contextual Ranking and Selection problem in a finite-alternative, finite-context setting, where we aim to find the best alternative for each context. We use a separate Gaussian process to model the reward of each alternative and derive the large-deviations rate function for both the expected and the worst-case contextual probability of correct selection. We propose the GP-C-OCBA sampling policy, which uses the Gaussian process posterior to iteratively allocate observations so as to maximize the rate function. We prove its consistency and show that it achieves the optimal convergence rate under the assumption of a non-informative prior. Numerical experiments show that our algorithm is highly competitive in terms of sampling efficiency while having significantly smaller computational overhead.
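As a rough illustration of the setup above, the sketch below fits one Gaussian process per alternative and greedily allocates the next observation where the posterior gap between the apparent best and the runner-up is least certain. This normalized-gap criterion is an illustrative stand-in for the paper's large-deviations rate function, not the actual GP-C-OCBA policy, and all names are assumptions.

```python
# Minimal sketch of a greedy allocation rule for contextual ranking and
# selection with one GP per alternative (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor  # models assumed pre-fitted

def next_allocation(models, contexts):
    """models: list of fitted GaussianProcessRegressor, one per alternative (K >= 2).
    contexts: (n_contexts, d) array of context features.
    Returns (alternative_index, context_index) to observe next."""
    preds = [m.predict(contexts, return_std=True) for m in models]
    means = np.stack([p[0] for p in preds])   # (K, n_contexts)
    stds = np.stack([p[1] for p in preds])    # (K, n_contexts)
    best_score, pick = np.inf, (0, 0)
    for c in range(contexts.shape[0]):
        order = np.argsort(means[:, c])[::-1]             # larger reward assumed better
        top, runner_up = order[0], order[1]
        gap = means[top, c] - means[runner_up, c]
        spread = np.sqrt(stds[top, c] ** 2 + stds[runner_up, c] ** 2) + 1e-9
        score = gap / spread                              # small = least separated pair
        if score < best_score:
            best_score, pick = score, (int(top), c)
    return pick
```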