Title: Goals as reward-producing programs
People are remarkably capable of generating their own goals, beginning with child’s play and continuing into adulthood. Despite considerable empirical and computational work on goals and goal-oriented behaviour, models are still far from capturing the richness of everyday human goals. Here we bridge this gap by collecting a dataset of human-generated playful goals (in the form of scorable, single-player games), modelling them as reward-producing programs and generating novel human-like goals through program synthesis. Reward-producing programs capture the rich semantics of goals through symbolic operations that compose, add temporal constraints and allow program execution on behavioural traces to evaluate progress. To build a generative model of goals, we learn a fitness function over the infinite set of possible goal programs and sample novel goals with a quality-diversity algorithm. Human evaluators found that model-generated goals, when sampled from partitions of program space occupied by human examples, were indistinguishable from human-created games. We also discovered that our model’s internal fitness scores predict games that are evaluated as more fun to play and more human-like.
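To make the idea concrete, here is a minimal, hypothetical sketch (in plain Python rather than the paper's domain-specific language) of a goal as a reward-producing program: a program that is executed over a behavioural trace, checks a temporal constraint, and returns a score. All names below (State, goal_program, the event strings) are illustrative assumptions, not the paper's actual DSL.

```python
# Loose illustration of a "goal as reward-producing program": score 1 point
# each time a ball is thrown and subsequently lands in the bin -- a temporal
# constraint evaluated by running the program over a behavioural trace.
from dataclasses import dataclass

@dataclass
class State:
    event: str          # hypothetical event labels, e.g. "throw"
    held_object: str    # e.g. "dodgeball"

def goal_program(trace: list[State]) -> int:
    score, pending_throw = 0, False
    for state in trace:
        if state.event == "throw" and state.held_object == "dodgeball":
            pending_throw = True                      # throw observed...
        elif state.event == "landed_in_bin" and pending_throw:
            score += 1                                # ...then the landing
            pending_throw = False
    return score

# Example trace: throw -> lands in bin -> second throw that never lands.
trace = [State("throw", "dodgeball"), State("landed_in_bin", ""),
         State("throw", "dodgeball")]
print(goal_program(trace))  # 1
```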
Award ID(s):
2121102
PAR ID:
10576869
Author(s) / Creator(s):
Publisher / Repository:
Nature Machine Intelligence
Date Published:
Journal Name:
Nature Machine Intelligence
Volume:
7
Issue:
2
ISSN:
2522-5839
Page Range / eLocation ID:
205 to 220
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Summerfield, Christopher (Ed.)
    When observing the outcome of a choice, people are sensitive to the choice’s context, such that the experienced value of an option depends on the alternatives: getting $1 when the possibilities were 0 or 1 feels much better than when the possibilities were 1 or 10. Context-sensitive valuation has been documented within reinforcement learning (RL) tasks, in which values are learned from experience through trial and error. Range adaptation, wherein options are rescaled according to the range of values yielded by available options, has been proposed to account for this phenomenon. However, we propose that other mechanisms—reflecting a different theoretical viewpoint—may also explain this phenomenon. Specifically, we theorize that internally defined goals play a crucial role in shaping the subjective value attributed to any given option. Motivated by this theory, we develop a new “intrinsically enhanced” RL model, which combines extrinsically provided rewards with internally generated signals of goal achievement as a teaching signal. Across 7 different studies (including previously published data sets as well as a novel, preregistered experiment with replication and control studies), we show that the intrinsically enhanced model can explain context-sensitive valuation as well as, or better than, range adaptation. Our findings indicate a more prominent role of intrinsic, goal-dependent rewards than previously recognized within formal models of human RL. By integrating internally generated signals of reward, standard RL theories should better account for human behavior, in context-sensitive valuation and beyond.
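A minimal sketch of the core modelling idea, assuming a simple bandit-style value update; the goal_achieved predicate and the mixing weight W are illustrative assumptions, not the authors' exact model.

```python
# Hedged sketch: a teaching signal that mixes the extrinsic outcome with an
# internally generated goal-achievement bonus. Constants are arbitrary.
from collections import defaultdict

ALPHA, W = 0.1, 0.5          # learning rate; weight on the intrinsic signal
Q = defaultdict(float)       # value estimates indexed by option

def goal_achieved(reward: float, context_best: float) -> float:
    """Intrinsic signal: 1 if the outcome met the internally set goal
    (here assumed to be matching the best outcome available in context)."""
    return 1.0 if reward >= context_best else 0.0

def update(option, extrinsic_reward, context_best):
    teaching_signal = extrinsic_reward + W * goal_achieved(extrinsic_reward,
                                                           context_best)
    Q[option] += ALPHA * (teaching_signal - Q[option])

# Getting $1 when the alternatives were {0, 1} vs. {1, 10}: the intrinsic
# bonus fires only in the first context, so identical $ outcomes are
# valued differently.
update("A", 1.0, context_best=1.0)    # goal met  -> larger update
update("B", 1.0, context_best=10.0)   # goal missed -> smaller update
print(Q["A"], Q["B"])                 # Q["A"] > Q["B"]
```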
  2. We propose tutorial generation for games as an AI problem: automatically generating tutorials that teach players how to play a game. This problem can be approached in several ways, including generating natural language descriptions of game rules, generating instructive game levels, and generating demonstrations of how to play using agents that behave in a human-like manner. We further argue that the General Video Game AI framework provides a useful testbed for addressing this problem.
  3. Devising models that reliably recognize player goals is a key challenge in creating player-adaptive games. Player goal recognition is the task of automatically recognizing the intent of a player from a sequence of observed player actions in a game environment. In open-world digital games, players often undertake suboptimal and varied sequences of actions to achieve goals, and the high degree of freedom afforded to players makes it challenging to identify sequential patterns that lead toward specific goals. To address these issues, we present a player goal recognition framework that utilizes a fine-tuned T5 language model, which incorporates our novel attention mechanism called Temporal Contrary Attention (TCA). The T5 language model enables the framework to exploit correlations between observations through non-sequential self-attention within input sequences, while TCA enables the framework to learn to eliminate goal hypotheses by considering counterevidence within a temporal window. We evaluate our approach using game trace data collected from 144 players' interactions with an open-world educational game. Specifically, we investigate the predictive capacity of our approach to recognize player goals as well as player plans represented as abstract actions. Results show that our approach outperforms non-linguistic machine learning approaches as well as T5 without TCA. We discuss the implications of these findings for the design and development of player goal recognition models to create player-adaptive games. 
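As a loose illustration only (not the paper's transformer architecture), the elimination idea behind TCA can be sketched as down-weighting a goal hypothesis whenever counterevidence for it appears within a recent window of observed actions; the window size, decay factor, and action names below are arbitrary assumptions.

```python
# Toy sketch of hypothesis elimination via counterevidence in a temporal
# window, loosely inspired by the TCA idea described above.
def rank_goals(actions: list[str],
               counterevidence: dict[str, set[str]],
               window: int = 3) -> dict[str, float]:
    scores = {goal: 1.0 for goal in counterevidence}
    recent = actions[-window:]                     # the temporal window
    for goal, contrary in counterevidence.items():
        # Each contrary action observed recently suppresses the hypothesis.
        hits = sum(a in contrary for a in recent)
        scores[goal] *= 0.5 ** hits
    return scores

# Hypothetical example: selling the potion is counterevidence for the
# goal of brewing one.
counter = {"brew_potion": {"sell_potion"}, "cure_patient": set()}
print(rank_goals(["gather_herbs", "sell_potion", "talk_npc"], counter))
# {'brew_potion': 0.5, 'cure_patient': 1.0}
```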
  4. Adaptation Based Programming (ABP) allows programmers to employ "choice points" at program locations where they are uncertain about how to best code the program logic. Reinforcement learning (RL) is then used to automatically learn to make choice-point decisions that optimize the reward achieved by the program. In this paper, we consider a new approach to explaining the learned decisions of adaptive programs. The key idea is to include simple program annotations that define multiple semantically meaningful reward types, which compose to define the overall reward signal used for learning. Using these reward types, we define the notion of reward difference explanations (RDXs), which aim to explain why, at a choice point, an alternative A was selected over another alternative B. An RDX gives the difference in the predicted future reward of each type when selecting A versus B and then continuing to run the adaptive program. Significant differences can provide insight into why A was or was not preferred to B. We describe a SARSA-style learning algorithm for learning to optimize the choices at each choice point, while also learning side information for producing RDXs. We demonstrate this explanation approach through a case study in a synthetic domain, which shows the general promise of the approach and highlights future research questions.
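A hedged sketch of the decomposed-value idea: one value estimate per annotated reward type, with the RDX computed as the per-type difference between two alternatives at a choice point. The reward-type names and tabular SARSA details are illustrative assumptions, not the paper's exact algorithm.

```python
# One Q-function per annotated reward type; the RDX for alternatives A vs. B
# at a choice point is the per-type difference in predicted future reward.
from collections import defaultdict

REWARD_TYPES = ["progress", "energy_cost", "safety"]   # hypothetical types
# Q[reward_type][(choice_point, alternative)] -> predicted future reward
Q = {t: defaultdict(float) for t in REWARD_TYPES}

def sarsa_update(t, s, a, r_t, s2, a2, alpha=0.1, gamma=0.95):
    """Per-type SARSA update: each reward type keeps its own estimate."""
    Q[t][(s, a)] += alpha * (r_t + gamma * Q[t][(s2, a2)] - Q[t][(s, a)])

def rdx(choice_point, a, b):
    """Why was A selected over B? The per-type Q-value difference."""
    return {t: Q[t][(choice_point, a)] - Q[t][(choice_point, b)]
            for t in REWARD_TYPES}

# After training, a large positive "progress" entry alongside a small
# negative "energy_cost" entry would explain a preference for A over B.
Q["progress"][("cp1", "A")] = 4.0
Q["energy_cost"][("cp1", "A")] = -1.0
Q["progress"][("cp1", "B")] = 1.0
print(rdx("cp1", "A", "B"))
# {'progress': 3.0, 'energy_cost': -1.0, 'safety': 0.0}
```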
  5. Madkour, Abdelrahman; Otto, Jasmine; Ferreira, Lucas N; Johnson-Bey, Shi (Eds.)
    Player goals in games are often framed in terms of achieving something in the game world, but this framing can fail to capture goals centered on the player’s own mental model, such as seeking the answers to questions about the game world. We use a least-commitment model of interactive narrative to characterize these knowledge goals and the problem of knowledge goal recognition. As a first attempt to solve the knowledge goal recognition problem, we adapt a classical goal recognition paradigm, but in our empirical evaluation the approach suffers from a high rate of incorrectly rejecting a synthetic player’s true goals; we discuss how handling of player goals could be made more robust in practice. 
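For concreteness, here is a toy sketch of the kind of classical, cost-based goal recognition paradigm being adapted (in the spirit of plan recognition as planning); the cost model, threshold, and action names are stand-in assumptions, and the threshold sensitivity mirrors the over-eager rejection of true goals noted above.

```python
# Toy cost-difference goal recognition: a goal stays plausible when the
# observed actions do not make reaching it much costlier than acting
# optimally for that goal.
def recognize(goals, observations, cost, threshold=1.0):
    plausible = []
    for g in goals:
        delta = cost(g, observations) - cost(g, [])
        if delta <= threshold:
            plausible.append(g)
    return plausible

# Hypothetical cost model: each observed action off a goal's optimal path
# adds 1 to the cost of still achieving that goal.
def toy_cost(goal, obs):
    detours = {"learn_secret": {"open_chest"}, "find_key": set()}
    return len([a for a in obs if a in detours.get(goal, set())])

print(recognize(["learn_secret", "find_key"], ["open_chest"], toy_cost, 0.5))
# ['find_key'] -- with threshold 0.5, a single off-path observation already
# eliminates "learn_secret": the kind of over-eager rejection discussed above.
```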