skip to main content

Title: Value Alignment Verification
As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent’s performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human’s values. The goal is to construct a kind of “driver’s test” that a human can give to any agent which will verify value alignment via a minimal number of queries. We study alignment verification problems with both idealized humans that have an explicit reward function as well as problems where they have implicit values. We analyze verification of exact value alignment for rational agents and propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. Finally, we prove that there exist sufficient conditions such that we can verify exact and approximate alignment across an infinite set of test environments via a constant- query-complexity alignment test.
; ; ;
Award ID(s):
Publication Date:
Journal Name:
38th International Conference on Machine Learning
Sponsoring Org:
National Science Foundation
More Like this
  1. Policy gradient methods have become popular in multi-agent reinforcement learning, but they suffer from high variance due to the presence of environmental stochasticity and exploring agents (i.e., non-stationarity), which is potentially worsened by the difficulty in credit assignment. As a result, there is a need for a method that is not only capable of efficiently solving the above two problems but also robust enough to solve a variety of tasks. To this end, we propose a new multi-agent policy gradient method, called Robust Local Advantage (ROLA) Actor-Critic. ROLA allows each agent to learn an individual action-value function as a local critic as well as ameliorating environment non-stationarity via a novel centralized training approach based on a centralized critic. By using this local critic, each agent calculates a baseline to reduce variance on its policy gradient estimation, which results in an expected advantage action-value over other agents’ choices that implicitly improves credit assignment. We evaluate ROLA across diverse benchmarks and show its robustness and effectiveness over a number of state-of-the-art multi-agent policy gradient algorithms.
  2. With the development of sensing and communica- tion technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodolo- gies are integrated into the control process of physical systems and demonstrate prominent performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize the improvement of the performance of CAVs with commu- nication and cooperation capability. When each individual autonomous vehicle is originally self-interest, we can not assume that all agents would cooperate naturally during the training process. In this work, we propose to reallocate the system’s total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system’s total reward to each agent under the proposed transferable utility game, such that communication- based cooperation among multi-agents increases the system’s total reward. We prove that Shapley value-based reward reallocation of MARL locates in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient and the agents should stay in the coalition or the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, compared with several literaturemore »algorithms, we show the improvement of the mean episode system reward of CAV systems using our proposed algorithm.« less
  3. Safe operations of autonomous mobile robots in close proximity to humans, creates a need for enhanced trajectory tracking (with low tracking errors). Linear optimal control techniques such as Linear Quadratic Regulator (LQR) and Model Predictive Control (MPC) have been used successfully for low-speed applications while leveraging their model-based methodology with manageable computational demands. However, model and parameter uncertainties or other unmodeled nonlinearities may cause poor control actions and constraint violations. Nonlinear MPC has emerged as an alternate optimal-control approach but needs to overcome real-time deployment challenges (including fast sampling time, design complexity, and limited computational resources). In recent years, the optimal control-based deployments have benefitted enormously from the ability of Deep Neural Networks (DNNs) to serve as universal function approximators. This has led to deployments in a plethora of previously inaccessible applications – but many aspects of generalizability, benchmarking, and systematic verification and validation coupled with benchmarking have emerged. This paper presents a novel approach to fusing Deep Reinforcement Learning-based (DRL) longitudinal control with a traditional PID lateral controller for autonomous navigation. Our approach follows (i) Generation of an adequate fidelity simulation scenario via a Real2Sim approach; (ii) training a DRL agent within this framework; (iii) Testing the performance andmore »generalizability on alternate scenarios. We use an initial tuned set of the lateral PID controller gains for observing the vehicle response over a range of velocities. Then we use a DRL framework to generate policies for an optimal longitudinal controller that successfully complements the lateral PID to give the best tracking performance for the vehicle.« less
  4. We introduce a fast approximate stability analysis into an automated dry stacking procedure. Evaluating structural stability is essential for any type of construction, but especially challenging in techniques where building elements remain distinct and do not use fasteners or adhesives. Due to the irregular shape of construction materials, autonomous agents have restricted knowledge of contact geometry, which makes existing analysis tools difficult to deploy. In this paper, a geometric safety factor called kern is used to estimate how much the contact interface can shrink and the structure still be feasible, where feasibility can be checked efficiently using linear programming. We validate the stability measure by comparing the proposed methods with a fully simulated shaking test in 2D. We also improve existing heuristics-based planning by adding the proposed measure into the assembly process.
  5. Value alignment is a property of an intelligent agent indicating that it can only pursue goals and activities that are beneficial to humans. Traditional approaches to value alignment use imitation learning or preference learning to infer the values of humans by observing their behavior. We introduce a complementary technique in which a value-aligned prior is learned from naturally occurring stories which encode societal norms. Training data is sourced from the children's educational comic strip, Goofus & Gallant. In this work, we train multiple machine learning models to classify natural language descriptions of situations found in the comic strip as normative or non-normative by identifying if they align with the main characters' behavior. We also report the models' performance when transferring to two unrelated tasks with little to no additional training on the new task.