Koyejo, S.; Mohamed, S.; Agarwal, A.; Belgrave, D.; Cho, K.; Oh, A. (Ed.)
While much progress has been made in understanding the minimax sample complexity of
reinforcement learning (RL)—the complexity of learning on the “worst-case” instance—such
measures of complexity often do not capture the true difficulty of learning. In practice, on an
“easy” instance, we might hope to achieve a complexity far better than that achievable on the
worst-case instance. In this work we seek to understand the “instance-dependent” complexity of
learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation.
We propose an algorithm, Pedel, which achieves a fine-grained instance-dependent measure of
complexity, the first of its kind in the RL with function approximation setting, thereby capturing
the difficulty of learning on each particular problem instance. Through an explicit example, we
show that Pedel yields provable gains over low-regret, minimax-optimal algorithms and that
such algorithms cannot achieve the instance-optimal rate. Our approach relies on a novel online
experiment-design-based procedure which focuses the exploration budget on the “directions”
most relevant to learning a near-optimal policy, and may be of independent interest.