Composing Efficient, Robust Tests for Policy Selection

Morrill, D; Walsh, Thomas J; Hernandez, D; Wurman, Peter R; Stone, Peter

Citation Details

Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable k-of-N robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST fnds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fdelity racing simulator. more »

Award ID(s):: 2019844

PAR ID:: 10503035

Author(s) / Creator(s):: Morrill, D; Walsh, Thomas J; Hernandez, D; Wurman, Peter R; Stone, Peter

Publisher / Repository:: arXivorg

Date Published:: 2023-06-12

Journal Name:: arXivorg

ISSN:: 2331-8422

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
The DOI is not currently available.

More Like this