Title: Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback
In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a known Markov decision process with a vector-valued reward function, with each user having an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. We first address the case where a user is provided with two policies and returns their preferred policy as feedback. We then move to a different user feedback model, where a user is instead provided with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we suggest an algorithm that finds a nearly optimal policy for the user using a number of comparison queries that scales quasilinearly in the number of objectives.
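The core objects in the abstract can be illustrated with a minimal sketch (all names and numbers below are hypothetical, not the paper's algorithm): a user's unknown preference vector scalarizes each policy's vector-valued return, and a comparison query returns whichever of two policies the user prefers under that scalarization.

```python
# Illustrative sketch: policies are summarized by their vector-valued returns;
# a user with preference vector w prefers the policy whose scalarized value
# w . V is larger. The learner only observes answers to comparison queries.

def scalarize(w, v):
    """Scalarized value of a vector return v under preference weights w."""
    return sum(wi * vi for wi, vi in zip(w, v))

def compare(w, v_a, v_b):
    """Simulated comparison query: which of two policies does the user prefer?"""
    return "A" if scalarize(w, v_a) >= scalarize(w, v_b) else "B"

# A user who values objective 0 twice as much as objective 1.
w = (2.0, 1.0)
v_a = (3.0, 1.0)   # vector return of policy A
v_b = (1.0, 4.0)   # vector return of policy B
print(compare(w, v_a, v_b))
```

Each answer to such a query constrains the feasible region for the unknown weight vector, which is what lets the number of queries stay small in the number of objectives.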
Award ID(s): 2216899, 2212968
PAR ID: 10511443
Publisher / Repository: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)
Journal Name: Advances in Neural Information Processing Systems
ISSN: 1049-5258
Sponsoring Org: National Science Foundation
More Like This
  1. In this paper, we study planning in stochastic systems, modeled as Markov decision processes (MDPs), with preferences over temporally extended goals. Prior work on temporal planning with preferences assumes that the user preferences form a total order, meaning that every pair of outcomes is comparable. In this work, we consider the case where the preferences over possible outcomes form a partial order rather than a total order. We first introduce a variant of the deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals. Drawing on order theory, we translate the preference DFA to a preference relation over policies for probabilistic planning in a labeled MDP. In this treatment, a most preferred policy induces a weak-stochastic nondominated probability distribution over the finite paths in the MDP. The proposed planning algorithm hinges on the construction of a multi-objective MDP. We prove that a weak-stochastic nondominated policy given the preference specification is Pareto-optimal in the constructed multi-objective MDP, and vice versa. Throughout the paper, we employ a running example to demonstrate the proposed preference specification and solution approaches. We show the efficacy of our algorithm on this example with detailed analysis, and then discuss possible future directions.
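The partial-order view above can be made concrete with a small sketch (hypothetical data, not the paper's weak-stochastic ordering): a vector of outcome values is nondominated if no other vector is at least as good in every component and strictly better in at least one.

```python
# Illustrative Pareto-nondominance check over finitely many outcome vectors,
# a stand-in for nondominated policies in a multi-objective MDP.

def dominates(u, v):
    """True if u is at least as good as v everywhere and strictly better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(vectors):
    """Keep only the nondominated vectors."""
    return [v for v in vectors if not any(dominates(u, v) for u in vectors)]

outcomes = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.2, 0.8)]
print(pareto_front(outcomes))
```

Under a partial order, several incomparable outcomes can survive this filter simultaneously, which is exactly why a single "best" policy need not exist.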
  2. This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives together with preferences for accomplishing each temporal task. The finite traces that describe the system's behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve the problem of computing an optimal policy that minimizes the expected dissatisfaction score given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.
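The ordered-disjunction idea from qualitative choice logic can be sketched as follows (a simplified illustration, not the paper's exact semantics): the dissatisfaction score of a trace is the rank of the most preferred option it satisfies, with a maximal score if none holds.

```python
# Illustrative dissatisfaction score for an ordered disjunction of goals,
# listed from most to least preferred.

def dissatisfaction(ordered_options, satisfied):
    """0 if the first (most preferred) option holds, 1 if only the second does,
    and so on; len(ordered_options) if no option is satisfied."""
    for rank, option in enumerate(ordered_options):
        if option in satisfied:
            return rank
    return len(ordered_options)

# Prefer reaching goal 'a'; failing that, 'b'; failing that, 'c'.
options = ["a", "b", "c"]
print(dissatisfaction(options, {"a", "c"}))  # best option holds
print(dissatisfaction(options, {"c"}))       # only the last resort holds
print(dissatisfaction(options, set()))       # nothing holds
```

Ranking traces by such scores is what lets the planner minimize an expected dissatisfaction rather than a simple success probability.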
  3. The expanding role of reinforcement learning (RL) in safety-critical system design has promoted ω-automata as a way to express learning requirements—often non-Markovian—with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple ω-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple ω-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising the probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, Mungojerrie, and we present an experimental evaluation of our technique on benchmark learning problems.
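The contrast between the two preference types above can be shown on toy value vectors (hypothetical numbers, not the paper's construction): a weighted user trades objectives off through scalar weights, while a lexicographic user lets any gain in a higher-ordered objective outweigh all lower-ordered ones.

```python
# Illustrative comparison of weighted vs. lexicographic preferences over
# vectors of objective values (listed from most to least important).

def weighted_better(weights, u, v):
    """Weighted preference: compare weighted sums of the objective values."""
    return sum(w * x for w, x in zip(weights, u)) >= sum(w * x for w, x in zip(weights, v))

def lex_better(u, v):
    """Lexicographic preference: Python's tuple comparison is exactly
    lexicographic, so the built-in ordering suffices here."""
    return tuple(u) >= tuple(v)

u, v = (0.8, 0.1), (0.7, 0.9)
print(weighted_better((1.0, 2.0), u, v))  # a weighted user may prefer v
print(lex_better(u, v))                   # a lexicographic user prefers u
```

The same pair of strategies can thus be ranked oppositely under the two preference models, which is why each needs its own reward translation.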
  4. Hillston, Jane; Soudjiani, Sadegh (Ed.)
    We study the problem of inferring the discount factor of an agent optimizing a discounted reward objective in a finite state Markov Decision Process (MDP). Discounted reward objectives are common in sequential optimization, reinforcement learning, and algorithmic game theory. The discount factor is an important parameter used in formulating the discounted reward. It captures the “time value” of the reward, i.e., how much reward at hand would equal a promised reward at a future time. Knowing an agent’s discount factor can provide valuable insights into their decision-making and help predict their preferences in previously unseen environments. However, pinpointing the exact value of the discount factor used by the agent is a challenging problem, and ad-hoc guesses are often incorrect. This paper focuses on the problem of computing the range of possible discount factors for a rational agent given their policy. A naive solution to this problem can be quite expensive. A classic result by Smallwood shows that the interval [0, 1) of possible discount factors can be partitioned into finitely many sub-intervals such that the optimal policy remains the same within each sub-interval. Furthermore, optimal policies for neighboring sub-intervals differ in a single state. We show how Smallwood’s result can be exploited to search for the discount factor intervals for which a given policy is optimal by reducing the search to polynomial root isolation. We extend the result to situations where the policy is suboptimal, but with a value function that is close to optimal. We develop numerical approaches to solve the discount factor elicitation problem and demonstrate the effectiveness of our algorithms through case studies.
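The interval structure described above can be seen on a toy two-action choice (an illustrative sketch under simplified assumptions, not the paper's algorithm): a myopic action pays 1 now and nothing after, while a patient action pays nothing now and 1 every step thereafter, worth γ/(1−γ). The optimal action is constant on each side of a single crossover point.

```python
# Illustrative Smallwood-style partition of [0, 1): for each discount factor
# gamma, exactly one action is optimal, and the interval splits at gamma = 0.5.

def value_myopic(gamma):
    """Immediate reward 1, nothing afterwards."""
    return 1.0

def value_patient(gamma):
    """Reward 0 now, then 1 every step: gamma + gamma**2 + ... = gamma/(1-gamma)."""
    return gamma / (1.0 - gamma)

def optimal_policy(gamma):
    return "myopic" if value_myopic(gamma) >= value_patient(gamma) else "patient"

print(optimal_policy(0.3))
print(optimal_policy(0.7))
```

Observing the agent act patiently therefore already confines its discount factor to the sub-interval (0.5, 1); finding such crossover points in general is where the reduction to polynomial root isolation enters.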
  5.
    Omega-regular properties—specified using linear time temporal logic or various forms of omega-automata—find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an ω-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter that produces dense rewards and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.
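The sparse-versus-dense distinction above can be sketched schematically (a purely illustrative contrast, not the paper's reward construction): a reachability-style reward pays off once per episode, while a per-visit discounted reward gives feedback every time an accepting state is seen.

```python
# Schematic contrast between a sparse reachability reward and a dense
# per-visit reward on accepting states of a hypothetical automaton run.

def sparse_return(trace, accepting):
    """Reachability-style reward: 1 only if an accepting state is ever reached."""
    return 1.0 if any(s in accepting for s in trace) else 0.0

def dense_return(trace, accepting, gamma=0.9):
    """Per-visit discounted reward on accepting states: feedback arrives earlier
    and at every accepting visit, which eases credit assignment for RL."""
    return sum((gamma ** t) * (1.0 - gamma) for t, s in enumerate(trace) if s in accepting)

trace = ["s0", "acc", "s1", "acc"]
print(sparse_return(trace, {"acc"}))
print(round(dense_return(trace, {"acc"}), 4))
```

Dense schemes of this flavor are also what make the objective compatible with off-the-shelf RL algorithms, since standard discounted-return learners can consume the per-step signal directly.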