Title: Learning Norms from Stories: A Prior for Value Aligned Agents
Value alignment is a property of an intelligent agent indicating that it can only pursue goals and activities that are beneficial to humans. Traditional approaches to value alignment use imitation learning or preference learning to infer the values of humans by observing their behavior. We introduce a complementary technique in which a value-aligned prior is learned from naturally occurring stories that encode societal norms. Training data is sourced from the children's educational comic strip Goofus & Gallant. In this work, we train multiple machine learning models to classify natural language descriptions of situations found in the comic strip as normative or non-normative by identifying whether they align with the main characters' behavior. We also report the models' performance when transferring to two unrelated tasks with little to no additional training on the new task.
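As a rough illustration of the classification task described in this abstract (a sketch only, not the authors' models or data), a minimal bag-of-words baseline in Python could label short situation descriptions as normative or non-normative; the example captions, labels, and scikit-learn pipeline below are placeholders for demonstration.

# Minimal sketch, not the paper's code: a bag-of-words baseline that labels
# short situation descriptions as normative (1) or non-normative (0).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for Goofus & Gallant-style captions.
texts = [
    "He waits his turn and thanks the librarian.",     # Gallant-like  -> normative
    "He grabs the toy and shoves his brother aside.",  # Goofus-like   -> non-normative
]
labels = [1, 0]

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(texts, labels)

# Predict whether a new description aligns with normative behavior.
print(classifier.predict(["She returns the lost wallet to its owner."]))

In practice the paper reports several model types and transfer to two unrelated tasks; this toy pipeline only shows the input/output shape of the problem.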
Award ID(s):
1849231
NSF-PAR ID:
10166962
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
AIES '20: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society
Page Range / eLocation ID:
124 to 130
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A prerequisite for social coordination is bidirectional communication between teammates, each playing two roles simultaneously: as receptive listeners and expressive speakers. For robots working with humans in complex situations with multiple goals that differ in importance, failure to fulfill the expectation of either role could undermine group performance due to misalignment of values between humans and robots. Specifically, a robot needs to serve as an effective listener to infer human users’ intents from instructions and feedback and as an expressive speaker to explain its decision processes to users. Here, we investigate how to foster effective bidirectional human-robot communications in the context of value alignment—collaborative robots and users form an aligned understanding of the importance of possible task goals. We propose an explainable artificial intelligence (XAI) system in which a group of robots predicts users’ values by taking in situ feedback into consideration while communicating their decision processes to users through explanations. To learn from human feedback, our XAI system integrates a cooperative communication model for inferring human values associated with multiple desirable goals. To be interpretable to humans, the system simulates human mental dynamics and predicts optimal explanations using graphical models. We conducted psychological experiments to examine the core components of the proposed computational framework. Our results show that real-time human-robot mutual understanding in complex cooperative tasks is achievable with a learning model based on bidirectional communication. We believe that this interaction framework can shed light on bidirectional value alignment in communicative XAI systems and, more broadly, in future human-machine teaming systems. 
  2. As humans interact with autonomous agents to perform increasingly complicated, potentially risky tasks, it is important to be able to efficiently evaluate an agent's performance and correctness. In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values. The goal is to construct a kind of "driver's test" that a human can give to any agent, verifying value alignment via a minimal number of queries. We study alignment verification problems both with idealized humans that have an explicit reward function and with humans that hold implicit values. We analyze verification of exact value alignment for rational agents, and we propose and analyze heuristic and approximate value alignment verification tests in a wide range of gridworlds and a continuous autonomous driving domain. Finally, we prove that there exist sufficient conditions under which exact and approximate alignment can be verified across an infinite set of test environments via a constant-query-complexity alignment test. (A toy, hypothetical sketch of such a query-based test appears after the last related item below.)
  3. Growing concerns about the AI alignment problem have emerged in recent years, with previous work focusing mostly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing on either a single agent or on humanity as a singular unit. However, the field as a whole lacks a systematic understanding of how to specify, describe, and analyze misalignment among entities, which may include individual humans, AI agents, and complex compositional entities such as corporations, nation-states, and so forth. Prior work on controversy in computational social science offers a mathematical model of contention among populations (of humans). In this paper, we adapt this contention model to the alignment problem and show how misalignment can vary depending on the population of agents (human or otherwise) being observed as well as on the domain or "problem area" in question. Our model departs from value specification approaches and focuses instead on the morass of complex, interlocking, sometimes contradictory goals that agents may have in practice. We discuss the implications of our model and leave more thorough verification for future work.
  4. Recent interest in codifying fairness in Automated Decision Systems (ADS) has resulted in a wide range of formulations of what it means for an algorithm to be “fair.” Most of these propositions are inspired by, but inadequately grounded in, scholarship from political philosophy. This comic aims to correct that deficit. We begin by setting up a working definition of an 'Automated Decision System' (ADS) and explaining 'bias' in outputs of an ADS. We then critically evaluate different definitions of fairness as Equality of Opportunity (EOP) by contrasting their conception in political philosophy (such as Rawls’s fair EOP and formal EOP) with the proposed codification in Fair-ML (such as statistical parity, equality of odds and accuracy) to provide a clearer lens with which to view existing results and to identify future research directions. We use this framing to reinterpret the impossibility results as the incompatibility between different EOP doctrines and demonstrate how political philosophy can provide normative guidance as to which notion of fairness is applicable in which context. We conclude by highlighting justice considerations that the fair-ML literature currently overlooks or underemphasizes, such as Rawls's broader theory of justice, which supplements his EOP principle with a principle guaranteeing equal rights and liberties to all citizens in a free and democratic society. 
  5. Undergraduate STEM lecture courses enroll hundreds of students who must master declarative, conceptual, and applied learning objectives. To support them, instructors have turned to active learning designs that require students to engage in self-regulated learning (SRL). Undergraduates struggle with SRL, and universities provide courses, workshops, and digital training to scaffold SRL skill development and enactment. We examined two theory-aligned designs of digital skill trainings that scaffold SRL and how students' demonstration of metacognitive knowledge of learning skills predicted exam performance in the biology courses where training took place. In Study 1, students' (n = 49) responses to training activities were scored for quality and summed by training topic and level of understanding. Behavioral and environmental regulation knowledge predicted midterm and final exam grades; knowledge of SRL processes did not. Declarative and conceptual levels of skill mastery predicted exam performance; application-level knowledge did not. When modeled by topic at each level of understanding, declarative knowledge of behavioral and environmental regulation and conceptual knowledge of cognitive strategies predicted final exam performance. In Study 2 (n = 62), knowledge demonstrated during a redesigned, video-based multimedia version of the behavioral and environmental regulation training again predicted biology exam performance. Across studies, performance on training activities designed in alignment with skill-training models predicted course performance, and these predictions were sustained in a redesign prioritizing learning efficiency. Training learners' SRL skills, specifically cognitive strategies and environmental regulation, benefited later biology course performance in both studies, demonstrating the value of brief digital activities for developing learning skills. Ongoing refinement of materials designed to develop metacognitive processing and learners' ability to apply skills in new contexts can increase these benefits.

     
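As a toy illustration of the query-based alignment test described in related item 2 above (my own sketch, not that paper's algorithm), one can ask an agent for its action in a few test states and compare each answer with the action preferred under an assumed human reward function; every name and the reward function below are hypothetical.

# Toy sketch of a query-based alignment check: compare the agent's chosen
# action in each test state with the action an assumed human reward prefers.
from typing import Callable, Iterable, List

def human_best_action(state: int, actions: List[str],
                      reward: Callable[[int, str], float]) -> str:
    """Action that maximizes the (idealized) human reward in this state."""
    return max(actions, key=lambda a: reward(state, a))

def alignment_test(agent_policy: Callable[[int], str],
                   reward: Callable[[int, str], float],
                   test_states: Iterable[int],
                   actions: List[str]) -> bool:
    """True if the agent matches the human-preferred action on every query."""
    return all(agent_policy(s) == human_best_action(s, actions, reward)
               for s in test_states)

# Hypothetical usage: the human prefers "stop" in even states and "go" in odd ones.
reward = lambda s, a: 1.0 if (a == "stop") == (s % 2 == 0) else 0.0
agent = lambda s: "stop" if s % 2 == 0 else "go"
print(alignment_test(agent, reward, test_states=[0, 1, 2, 3], actions=["stop", "go"]))

Here the number of queries is simply the number of test states; the cited paper's contribution concerns when a small, fixed number of such queries suffices across many environments.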