
This content will become publicly available on March 5, 2025

Title: PASTA: A Dataset for Modeling Participant States in Narratives
The events in a narrative are understood as a coherent whole via the underlying states of their participants. Often, these participant states are not explicitly mentioned, instead left to be inferred by the reader. A model that understands narratives should likewise infer these implicit states, and even reason about the impact of changes to these states on the narrative. To facilitate this goal, we introduce a new crowdsourced, English-language Participant States dataset, PASTA. This dataset contains inferable participant states; a counterfactual perturbation to each state; and the changes to the story that would be necessary if the counterfactual were true. We introduce three state-based reasoning tasks that test for the ability to infer when a state is entailed by a story, to revise a story conditioned on a counterfactual state, and to explain the most likely state change given a revised story. Experiments show that today’s LLMs can reason about states to some degree, but there is substantial room for improvement, especially in problems requiring access to, and the ability to reason with, diverse types of knowledge (e.g., physical, numerical, factual).
Publisher / Repository:
Transactions of the ACL (TACL)
Journal Name:
Transactions of the Association for Computational Linguistics
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As humans, we can modify our assumptions about a scene by imagining alternative objects or concepts in our minds. For example, we can easily anticipate the implications of the sun being obscured by rain clouds (e.g., the street will get wet) and prepare accordingly. In this paper, we introduce a new task/dataset called Commonsense Reasoning for Counterfactual Scene Imagination (COSIM), which is designed to evaluate the ability of AI systems to reason about imagined scene changes. In this task/dataset, models are given an image and an initial question-response pair about the image. Next, a counterfactual imagined scene change (in textual form) is applied, and the model has to predict the new response to the initial question based on this scene change. We collect 3.5K high-quality and challenging data instances, with each instance consisting of an image, a commonsense question with a response, a description of a counterfactual change, a new response to the question, and three distractor responses. Our dataset contains various complex scene change types (such as object addition/removal/state change, event description, environment change, etc.) that require models to imagine many different scenarios and reason about the changed scenes. We present a baseline model based on a vision-language Transformer (i.e., LXMERT) and ablation studies. Through human evaluation, we demonstrate a large human-model performance gap, suggesting room for promising future work on this challenging counterfactual scene imagination task.
  2. Narrative planners generate sequences of actions that represent story plots given a story domain model. This is a useful way to create branching stories for interactive narrative systems that maintain logical consistency across multiple storylines with different content. There is a need for story comparison techniques that can enable systems like experience managers and domain authoring tools to reason about similarities and differences between multiple stories or branches. We present an algorithm for summarizing narrative plans as numeric vectors based on a cognitive model of human story perception. The vectors encode important story information and can be compared using standard distance functions to quantify the overall semantic difference between two stories. We show that this distance metric is highly accurate based on human annotations of story similarity, and compare it to several alternative approaches. We also explore variations of our method in an attempt to broaden its applicability to other types of story systems. 
  3. In this methods paper, the development and utility of composite narratives will be explored. Composite narratives, which involve combining aspects of multiple interviews into a single narrative, are a relatively modern methodology used in the qualitative research literature for several purposes: to do justice to complex accounts while maintaining participant anonymity, summarize data in a more engaging personal form and retain the human face of the data, illustrate specific aspects of the research findings, enhance the transferability of research findings by invoking empathy, illuminate collective experiences, and enhance research impact by providing findings in a manner more accessible to those outside of academia. Composite narratives leverage the power of storytelling, which has been shown to be effective in studies of neurology and psychology; because humans often think and process information in narrative structures, information conveyed in story form can be imprinted more easily on readers’ minds or existing schemas. Engineering education researchers have increasingly begun using narrative research methods. Recently, researchers have begun exploring composite narratives as an approach to enable more complex and nuanced understandings of engineering education while mitigating potential issues around the confidentiality of participants. Because this is a relatively new methodology in higher education more broadly and in engineering education specifically, more examples of how to construct and utilize composite narratives in both research and practice are needed. This paper will share how we created a composite narrative from interviews we collected for our work so that others can adapt this methodology for their research projects. The paper will also discuss ways we modified and enhanced these narratives to connect research to practice and impact engineering students.
This approach involved developing probing questions to stimulate thinking, learning, and discussion in academic and industrial educational settings. We developed the composite narratives featured in this paper from fifteen semi-structured critical incident interviews with engineering managers about their perceptions of adaptability. The critical incidents shared were combined to develop seven composite narratives reflecting real-life situations to which engineers must adapt in the workplace. These scenarios, grounded in the data, were taken directly to the engineering classroom for discussion with students on how they would respond and adapt to the presented story. In this paper, we detail our process of creating one composite narrative from the broader study and its associated probing questions for research dissemination in educational settings. We present this detailed account of how one composite narrative was constructed to demonstrate the quality and trustworthiness of the composite narrative methodology and assist in its replication by other scholars. Further, we discuss the benefits and limitations of this methodology, highlighting the parts of the data brought into focus using this method and how that contrasts with an inductive-deductive approach to qualitative coding also taken in this research project. 
  4. Abstract Research Highlights

    Older children were more likely to endorse agents who choose to share over those who do not have a choice.

    Children who were prompted to generate more counterfactuals were more likely to allocate resources to characters with choice.

    Children who generated selfish counterfactuals more positively evaluated agents with choice.

    Comparable to theories suggesting children punish willful transgressors more than accidental transgressors, we propose children also consider free will when making positive moral evaluations.

  5. Background

    As mobile health (mHealth) studies become increasingly productive owing to the advancements in wearable and mobile sensor technology, our ability to monitor and model human behavior will be constrained by participant receptivity. Many health constructs are dependent on subjective responses, and without such responses, researchers are left with little to no ground truth to accompany our ever-growing biobehavioral data. This issue can significantly impact the quality of a study, particularly for populations known to exhibit lower compliance rates. To address this challenge, researchers have proposed innovative approaches that use machine learning (ML) and sensor data to modify the timing and delivery of surveys. However, an overarching concern is the potential introduction of biases or unintended influences on participants’ responses when implementing new survey delivery methods.


    This study aims to demonstrate the potential impact of an ML-based ecological momentary assessment (EMA) delivery system (using receptivity as the predictor variable) on the participants’ reported emotional state. We examine the factors that affect participants’ receptivity to EMAs in a 10-day wearable and EMA–based emotional state–sensing mHealth study. We study the physiological relationships indicative of receptivity and affect while also analyzing the interaction between the 2 constructs.


    We collected data from 45 healthy participants wearing 2 devices measuring electrodermal activity, accelerometer, electrocardiography, and skin temperature while answering 10 EMAs daily, containing questions about perceived mood. Owing to the nature of our constructs, we can only obtain ground truth measures for both affect and receptivity during responses. Therefore, we used unsupervised and supervised ML methods to infer affect when a participant did not respond. Our unsupervised method used k-means clustering to determine the relationship between physiology and receptivity and then inferred the emotional state during nonresponses. For the supervised learning method, we primarily used random forest and neural networks to predict the affect of unlabeled data points as well as receptivity.


    Our findings showed that using a receptivity model to trigger EMAs decreased the reported negative affect by >3 points or 0.29 SDs in our self-reported affect measure, scored between 13 and 91. The findings also showed a bimodal distribution of our predicted affect during nonresponses. This indicates that this system initiates EMAs more commonly during states of higher positive emotions.


    Our results showed a clear relationship between affect and receptivity. This relationship can affect the efficacy of an mHealth study, particularly those that use an ML algorithm to trigger EMAs. Therefore, we propose that future work should focus on a smart trigger that promotes EMA receptivity without influencing affect during sampled time points.
