null
(Ed.)
Often—for example in war games, strategy video games,
and financial simulations—the game is given to us only as
a black-box simulator in which we can play it. In these settings,
since the game may have unknown nature action distributions
(from which we can only obtain samples) and/or be
too large to expand fully, it can be difficult to compute strategies
with guarantees on exploitability. Recent work (Zhang
and Sandholm 2020) resulted in a notion of certificate for
extensive-form games that allows exploitability guarantees
while not expanding the full game tree. However, that work
assumed that the black box could sample or expand arbitrary
nodes of the game tree at any time, and that a series of exact
game solves (via, for example, linear programming) can be
conducted to compute the certificate. Each of those two assumptions
severely restricts the practical applicability of that
method. In this work, we relax both of the assumptions. We
show that high-probability certificates can be obtained with a
black box that can do nothing more than play through games,
using only a regret minimizer as a subroutine. As a bonus, we
obtain an equilibrium-finding algorithm with ~O
(1=
p
T) convergence
rate in the extensive-form game setting that does not
rely on a sampling strategy with lower-bounded reach probabilities
(which MCCFR assumes). We demonstrate experimentally
that, in the black-box setting, our methods are able
to provide nontrivial exploitability guarantees while expanding
only a small fraction of the game tree.
more »
« less