Search for: All records

Creators/Authors contains: "Schneider, Jordan"

Value Alignment Verification

Brown, Daniel S; Schneider, Jordan; Dragan, Anca; Niekum, Scott (July 2021, International Conference on Machine Learning)
Full Text Available
Value Alignment Verification

Brown, Daniel; Schneider, Jordan; Dragan, Anca; Niekum, Scott (January 2021, 38th International Conference on Machine Learning)

As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent’s performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human’s values. The goal is to construct a kind of “driver’s test” that a human can give to any agent which will verify value alignment via a minimal number of queries. We study alignment verification problems with idealized humans that have an explicit reward function as well as problems where humans have implicit values. We analyze verification of exact value alignment for rational agents and propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. Finally, we prove that there exist sufficient conditions such that we can verify exact and approximate alignment across an infinite set of test environments via a constant-query-complexity alignment test.
Full Text Available