%A Abel, David
%A Dabney, Will
%A Harutyunyan, Anna
%A Ho, Mark
%A Littman, Michael
%A Precup, Doina
%A Singh, Satinder
%D 2021
%I
%K
%M OSTI ID: 10331367
%P Medium: X
%T On the Expressivity of Markov Reward
%X Reward is the driving force for reinforcement-learning agents. This paper is
dedicated to understanding the expressivity of reward as a way to capture tasks
that we would want an agent to perform. We frame this study around three new
abstract notions of “task” that might be desirable: (1) a set of acceptable behaviors,
(2) a partial ordering over behaviors, or (3) a partial ordering over trajectories.
Our main results prove that while reward can express many of these tasks, there
exist instances of each task type that no Markov reward function can capture. We
then provide a set of polynomial-time algorithms that construct a Markov reward
function that allows an agent to optimize tasks of each of these three types, and
correctly determine when no such reward function exists. We conclude with an
empirical study that corroborates and illustrates our theoretical findings.