Pareto Policy Adaptation

Panagiotis Kyriakis, Jyotirmoy Deshmukh

We present a policy gradient method for Multi-Objective Reinforcement Learning under unknown, linear preferences. By enforcing Pareto stationarity, a first-order condition for Pareto optimality, we are able to design a simple policy gradient algorithm that approximates the Pareto front and infers the unknown preferences. Our method relies on a projected gradient descent solver that identifies common ascent directions for all objectives. Leveraging the solution of that solver, we introduce Pareto Policy Adaptation (PPA), a loss function that adapts the policy to be optimal with respect to any distribution over preferences. PPA uses implicit differentiation to back-propagate the loss gradient bypassing the operations of the projected gradient descent solver. Our approach is straightforward, easy to implement and can be used with all existing policy gradient and actor-critic methods. We evaluate our method in a series of reinforcement learning tasks.
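As a rough illustration of what a projected gradient descent solver for common ascent directions might look like, the sketch below assumes the standard min-norm formulation: find simplex weights `alpha` minimising the squared norm of the weighted sum of per-objective gradients. This formulation, the function names `project_to_simplex` and `common_ascent_direction`, and the step-size rule are assumptions made for illustration; the abstract does not specify the solver's details, and this is not the authors' implementation.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    n = v.size
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, n + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def common_ascent_direction(grads, steps=500):
    """Hypothetical helper (not from the paper).

    grads: array of shape (k, d), one gradient per objective.
    Minimises ||alpha @ grads||^2 over the simplex by projected gradient
    descent and returns (alpha, direction). A near-zero direction indicates
    (approximate) Pareto stationarity; otherwise the direction has a
    positive inner product with every objective's gradient.
    """
    grads = np.asarray(grads, dtype=float)
    gram = grads @ grads.T                    # (k, k) Gram matrix
    k = gram.shape[0]
    # Step size 1/L, where L = 2 * spectral norm of the Gram matrix.
    lr = 1.0 / (2.0 * np.linalg.norm(gram, 2) + 1e-12)
    alpha = np.full(k, 1.0 / k)               # start from uniform weights
    for _ in range(steps):
        alpha = project_to_simplex(alpha - lr * 2.0 * gram @ alpha)
    return alpha, alpha @ grads
```

Under these assumptions, the returned weights `alpha` could also serve as an estimate of the linear preferences, and a direction with near-zero norm would signal that no update improves all objectives at once.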

Award ID(s): 1932620

PAR ID: 10380951

Author(s) / Creator(s): Panagiotis Kyriakis, Jyotirmoy Deshmukh

Date Published: 2022-06-01

Journal Name: International Conference on Learning Representations

Volume: 2022

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper:
The DOI is not currently available.