Design of experiments for stochastic contextual linear bandits

Zanette, A.; Dong, K.; Lee, J.N.; Brunskill, E.

Citation Details

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Exploring with a single non-reactive policy is beneficial in such cases. Assuming some batch contexts are available, we design a single stochastic policy to collect a good dataset from which a near-optimal policy can be extracted. We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.

Award ID(s): 2112926

PAR ID: 10382138

Author(s) / Creator(s): Zanette, A.; Dong, K.; Lee, J.N.; Brunskill, E.

Date Published: 2021-01-01

Journal Name: Advances in Neural Information Processing Systems

ISSN: 1049-5258

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
