Title: Goal Alignment: Re-analyzing Value Alignment Problems Using Human-Aware AI
While the question of misspecified objectives has received much attention in recent years, most works in this area focus primarily on challenges related to the complexity of the objective specification mechanism (for example, the use of reward functions). However, the complexity of the specification mechanism is just one of many reasons why a user may misspecify their objective. A foundational cause of misspecification that these works overlook is the inherent asymmetry between the user's expectations about the agent's behavior and the behavior the agent actually generates for the specified objective. To address this, we propose a novel formulation of the objective misspecification problem that builds on the human-aware planning literature, which was originally introduced to support explanation and explicable behavior generation. Additionally, we propose a first-of-its-kind interactive algorithm that can use information generated under incorrect beliefs about the agent to determine the true underlying goal of the user.
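To make the interactive setting concrete, here is a minimal, hypothetical sketch of such a loop; the function names, the accept/reject query mechanism, and the pruning rule are illustrative assumptions, not the algorithm from the paper.

```python
# Hypothetical sketch of an interactive goal-identification loop.
# The query mechanism and pruning rule are illustrative assumptions,
# not the paper's algorithm.

def identify_goal(candidate_goals, make_plan, ask_user, max_rounds=10):
    """Narrow down the user's true goal from accept/reject feedback.

    candidate_goals: hypothesized user goals
    make_plan(goal): the agent's plan for a given goal
    ask_user(plan):  True if the user accepts the plan; the user's
                     judgment may rest on incorrect beliefs about the
                     agent's model and capabilities
    """
    hypotheses = list(candidate_goals)
    for _ in range(max_rounds):
        if len(hypotheses) <= 1:
            break
        goal = hypotheses[0]            # naive query selection
        if ask_user(make_plan(goal)):
            return goal
        hypotheses.remove(goal)         # rejection prunes this hypothesis
    return hypotheses[0] if hypotheses else None
```

A full method would additionally model the user's (possibly incorrect) beliefs about the agent, so that even feedback grounded in those wrong beliefs remains informative about the underlying goal.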
Award ID(s):
2303019
PAR ID:
10537720
Author(s) / Creator(s):
;
Publisher / Repository:
AAAI Press
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
38
Issue:
9
ISSN:
2159-5399
Page Range / eLocation ID:
10110 to 10118
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-armed bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e., objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncertainty about query outcomes. In practice, subjectively implausible outcomes can occur regularly for two reasons: 1) model misspecification and 2) covariate shift. Conformal prediction is an uncertainty quantification method with coverage guarantees even for misspecified models and a simple mechanism to correct for covariate shift. We propose conformal Bayesian optimization, which directs queries towards regions of search space where the model predictions have guaranteed validity, and investigate its behavior on a suite of black-box optimization tasks and tabular ranking tasks. In many cases we find that query coverage can be significantly improved without harming sample efficiency.
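A toy sketch may help fix ideas. The snippet below is our construction, not the authors' procedure: it uses a nearest-neighbor surrogate (so the example stays dependency-free) with split-conformal calibration to attach a coverage-guaranteed band to predictions before the next query is chosen.

```python
import numpy as np

# Toy sketch of the conformal ingredient: split-conformal calibration
# attaches a coverage-guaranteed band to surrogate predictions. The
# nearest-neighbor surrogate and the lower-bound acquisition are
# stand-ins, not the paper's method.

def split_conformal_radius(X_cal, y_cal, predict, alpha=0.1):
    """Radius r so that [f(x) - r, f(x) + r] covers y with prob >= 1 - alpha."""
    residuals = np.sort(np.abs(y_cal - predict(X_cal)))
    k = int(np.ceil((1 - alpha) * (len(residuals) + 1))) - 1
    return residuals[min(k, len(residuals) - 1)]

def nn_surrogate(X_train, y_train):
    """1-nearest-neighbor regressor, standing in for a GP posterior mean."""
    def predict(X):
        return y_train[np.argmin(np.abs(X[:, None] - X_train[None, :]), axis=1)]
    return predict

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                      # unknown black-box objective
X_train = rng.uniform(0, 3, size=20)
y_train = f(X_train) + 0.1 * rng.normal(size=20)
X_cal = rng.uniform(0, 3, size=20)               # held-out calibration set
y_cal = f(X_cal) + 0.1 * rng.normal(size=20)

predict = nn_surrogate(X_train, y_train)
r = split_conformal_radius(X_cal, y_cal, predict)

# Choose the next query by a conformal lower bound: predictions are
# only trusted up to the calibrated radius r.
candidates = np.linspace(0, 3, 200)
best = candidates[np.argmax(predict(candidates) - r)]
print(f"next query: x = {best:.3f}, conformal radius = {r:.3f}")
```

Note that a single global radius shifts all predictions equally; locally adaptive conformal scores are what would let such a procedure actually steer queries toward regions where the model is valid.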
  2. We propose a new estimator for average causal effects of a binary treatment with panel data in settings with general treatment patterns. Our approach augments the popular two-way-fixed-effects specification with unit-specific weights that arise from a model for the assignment mechanism. We show how to construct these weights in various settings, including the staggered adoption setting, where units opt into the treatment sequentially but permanently. The resulting estimator converges to an average (over units and time) treatment effect under the correct specification of the assignment model, even if the fixed-effects model is misspecified. We show that our estimator is more robust than the conventional two-way estimator: it remains consistent if either the assignment mechanism or the two-way regression model is correctly specified. In addition, the proposed estimator performs better than the two-way-fixed-effects estimator if the outcome model and assignment mechanism are locally misspecified. This strong robustness property underlines and quantifies the benefits of modeling the assignment process and motivates using our estimator in practice. We also discuss an extension of our estimator to handle dynamic treatment effects.
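Schematically, the estimator reweights a standard two-way-fixed-effects regression. The sketch below illustrates that mechanic with generic inverse-probability weights computed from a known assignment probability, which is only a stand-in for the paper's weight construction from a fitted assignment model.

```python
import numpy as np

# Schematic sketch of a weighted two-way-fixed-effects estimate on a
# simulated panel. The weights are generic IPW-style placeholders, not
# the paper's construction.

rng = np.random.default_rng(1)
N, T, tau = 200, 8, 2.0                       # units, periods, true effect

unit_fe = rng.normal(size=N)
time_fe = rng.normal(size=T)
p_treat = 1.0 / (1.0 + np.exp(-unit_fe))      # assignment depends on the unit
adopters = rng.uniform(size=N) < p_treat
D = np.zeros((N, T))
D[adopters, T // 2:] = 1.0                    # permanent mid-panel adoption
Y = unit_fe[:, None] + time_fe[None, :] + tau * D + rng.normal(size=(N, T))

w = np.where(adopters, 1.0 / p_treat, 1.0 / (1.0 - p_treat))
W = np.repeat(w[:, None], T, axis=1)          # per-observation weights

def weighted_two_way_demean(M, W, iters=100):
    """Absorb unit and time fixed effects by alternating weighted demeaning."""
    for _ in range(iters):
        M = M - (W * M).sum(1, keepdims=True) / W.sum(1, keepdims=True)
        M = M - (W * M).sum(0, keepdims=True) / W.sum(0, keepdims=True)
    return M

Dt = weighted_two_way_demean(D, W)
Yt = weighted_two_way_demean(Y, W)
tau_hat = (W * Dt * Yt).sum() / (W * Dt * Dt).sum()
print(f"weighted two-way estimate: {tau_hat:.3f} (true effect {tau})")
```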
  3. As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent's performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values. The goal is to construct a kind of "driver's test" that a human can give to any agent, verifying value alignment via a minimal number of queries. We study alignment verification problems both with idealized humans that have an explicit reward function and with humans that have implicit values. We analyze verification of exact value alignment for rational agents, and we propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. Finally, we prove that there exist sufficient conditions under which we can verify exact and approximate alignment across an infinite set of test environments via a constant-query-complexity alignment test.
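A naive version of such a test is easy to sketch: query the agent's chosen action on a handful of test states and compare against the human-optimal actions. The snippet below is a toy illustration under that assumption; the paper's contribution is constructing tests with provably minimal query complexity, which this sketch does not attempt.

```python
# Toy sketch of query-based alignment verification: compare the agent's
# chosen action with the human-optimal actions on queried test states.
# The test-state selection here is naive.

def verify_alignment(test_states, human_optimal, agent_policy, tolerance=0):
    """Pass the agent if it disagrees with the human's optimal action
    set on at most `tolerance` of the queried states."""
    disagreements = sum(
        1 for s in test_states if agent_policy(s) not in human_optimal(s)
    )
    return disagreements <= tolerance

# Hypothetical 1-D gridworld where the human's reward favors moving
# right toward a goal at cell 9.
human_optimal = lambda s: {"right"} if s < 9 else {"stay"}
aligned_agent = lambda s: "right" if s < 9 else "stay"
wandering_agent = lambda s: "left"                # misaligned policy

print(verify_alignment(range(10), human_optimal, aligned_agent))    # True
print(verify_alignment(range(10), human_optimal, wandering_agent))  # False
```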
  4. Model misspecification is a common approach to modeling distortions in belief formation. Misspecified models can be decomposed into two classes of distortions: prospective and retrospective biases (Bohren and Hauser 2023). Prospective biases correspond to distortions in forecasting future beliefs, while retrospective biases correspond to distortions in interpreting information ex post. We disentangle the impact of these two distortions on optimal lending contracts in the context of an entrepreneur who borrows to invest in a project. The entrepreneur learns about project quality from a signal, which she interprets with a misspecified model. A lender leverages each form of bias in distinct ways.
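As a toy numerical illustration of where the two distortions enter (our parameterization, not Bohren and Hauser's formalism): in a two-state Bayes update, a retrospective bias distorts the accuracy used to interpret a realized signal, while a prospective bias distorts the forecast of the future belief made before the signal arrives.

```python
# Toy illustration of the two distortion classes in a two-state,
# binary-signal Bayes update; the parameterization is illustrative.

def posterior(prior, accuracy, good_news):
    """Bayesian posterior that the project is good, given a binary
    signal with the stated accuracy."""
    like_good = accuracy if good_news else 1 - accuracy
    like_bad = 1 - accuracy if good_news else accuracy
    return prior * like_good / (prior * like_good + (1 - prior) * like_bad)

prior, true_acc, biased_acc = 0.5, 0.8, 0.95

# Retrospective bias: the realized signal is interpreted ex post with a
# distorted accuracy, so the actual posterior is wrong.
retrospective = posterior(prior, biased_acc, good_news=True)

# Prospective bias: the forecast of the future belief, made before the
# signal, uses a distorted accuracy, even if the eventual update is correct.
prospective_forecast = posterior(prior, biased_acc, good_news=True)
correct_update = posterior(prior, true_acc, good_news=True)

print(f"correct posterior after good news:   {correct_update:.3f}")   # 0.800
print(f"retrospectively biased posterior:    {retrospective:.3f}")    # 0.950
print(f"prospective forecast of that belief: {prospective_forecast:.3f}")
```

The two biases produce the same number here by design; the distinction is when the distortion operates, which is what lets a lender exploit each one differently in contract design.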
  5. Explanations of AI agents' actions are considered an important factor in improving users' trust in the decisions made by autonomous AI systems. However, as these autonomous systems evolve from reactive (acting on user input) to proactive (acting without requiring user intervention), there is a need to explore how the explanations for these agents' actions should evolve. In this work, we explore the design of explanations through participatory design methods for a proactive auto-response messaging agent that can reduce perceived obligations and the social pressure to respond quickly to incoming messages by providing unavailability-related context. We recruited 14 participants who worked in pairs during collaborative design sessions in which they reasoned about the agent's design and actions. We qualitatively analyzed the data collected through these sessions and found that participants' reasoning about agent actions led them to speculate heavily about its design. These speculations significantly influenced participants' desire for explanations and the controls they sought over the agent's behavior. Our findings indicate a need to transform users' speculations into accurate mental models of agent design. Further, since the agent acts as a mediator in human-human communication, its explanation design must also account for social norms. Finally, users' expertise about their own habits and behaviors allows the agent to learn their preferences for how it justifies its actions.