The “Naturalistic Free Recall” dataset provides transcribed verbal recollections of four spoken narratives collected from 229 participants. Each participant listened to two stories, ranging from approximately 8 to 13 minutes in duration and recorded by different speakers. Participants were then asked to verbally recall the narrative content in as much detail as possible and in the correct order. The dataset includes high-fidelity, time-stamped text transcripts of both the original narratives and participants’ recollections. To validate the dataset, we apply a previously published automated method to score memory performance for narrative content. Using this approach, we extend effects traditionally observed in classic list-learning paradigms. The analysis of narrative content and its verbal recollection presents unique challenges compared to controlled list-learning experiments. To facilitate the use of these rich data by the community, we offer an overview of recent computational methods that can be used to annotate and evaluate key properties of narratives and their recollections. Drawing on advances in machine learning and natural language processing, these methods can help the community understand the role of event structure, discourse properties, prediction error, high-level semantic features (e.g., idioms, humor), and more. All experimental materials, code, and data are publicly available to facilitate new advances in understanding human memory.
- Award ID(s): 1263699
- PAR ID: 10039007
- Date Published:
- Journal Name: Language documentation and conservation
- Volume: 10
- ISSN: 1934-5275
- Page Range / eLocation ID: 522
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
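As a rough illustration of the kind of automated recall scoring described in the abstract above (a minimal sketch under assumed tooling, not the published method used to validate the dataset), the snippet below matches clauses from a participant's recall to events in the original narrative via sentence embeddings. The model name, example sentences, and scoring rule are placeholders chosen for illustration.

```python
# Illustrative sketch only: score narrative recall by matching each recall
# clause to its most similar narrative event using sentence embeddings.
# Model name, example sentences, and scoring rule are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

narrative_events = [
    "The hiker left the trailhead at dawn.",
    "A sudden storm forced her to shelter under a rock ledge.",
    "She shared her food with a stranded climber.",
]
recall_clauses = [
    "She started hiking really early in the morning.",
    "Then it started raining hard, so she hid under some rocks.",
]

# Encode and L2-normalize so that dot products equal cosine similarities.
E = model.encode(narrative_events, normalize_embeddings=True)
R = model.encode(recall_clauses, normalize_embeddings=True)
sim = R @ E.T  # shape: (n_recall_clauses, n_narrative_events)

# How well is each narrative event covered by the recall?
event_scores = sim.max(axis=0)
# Which event does each recall clause most resemble (for order analyses)?
matched_events = sim.argmax(axis=1)

print("per-event similarity:", np.round(event_scores, 2))
print("event matched by each clause:", matched_events)
print("mean recall score:", round(float(event_scores.mean()), 2))
```

Thresholding the per-event scores gives a binary recalled/not-recalled measure, and comparing the matched event indices with the original event order supports order-based analyses analogous to those used in list-learning paradigms.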
More Like this
- The study of language variation examines how language varies between and within different groups of speakers, shedding light on how we use language to construct identities and how social contexts affect language use. A common method is to identify instances of a certain linguistic feature - say, the zero copula construction - in a corpus, and analyze the feature’s distribution across speakers, topics, and other variables, to either gain a qualitative understanding of the feature’s function or systematically measure variation. In this paper, we explore the challenging task of automatic morphosyntactic feature detection in low-resource English varieties. We present a human-in-the-loop approach to generate and filter effective contrast sets via corpus-guided edits. We show that our approach improves feature detection for both Indian English and African American English, demonstrate how it can assist linguistic research, and release our fine-tuned models for use by other researchers.
- St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for the digitization of written texts, and the first publicly available digital corpus for St. Lawrence Island Yupik, created using that pipeline. This corpus has great potential for future linguistic inquiry and research in NLP. It was also developed for use in Yupik language education and revitalization, with a primary goal of enabling easy access to Yupik texts by educators and by members of the Yupik community. A secondary goal is to support development of language technology such as spell-checkers, text-completion systems, interactive e-books, and language learning apps for use by the Yupik community.
- It is now a common practice to compare models of human language processing by comparing how well they predict behavioral and neural measures of processing difficulty, such as reading times, on corpora of rich naturalistic linguistic materials. However, many of these corpora, which are based on naturally occurring text, do not contain many of the low-frequency syntactic constructions that are often required to distinguish between processing theories. Here we describe a new corpus consisting of English texts edited to contain many low-frequency syntactic constructions while still sounding fluent to native speakers. The corpus is annotated with hand-corrected Penn Treebank-style parse trees and includes self-paced reading time data and aligned audio recordings. We give an overview of the content of the corpus, review recent work using the corpus, and release the data.
- Interactive narratives in games combine dynamic adaptability with predefined story elements to support player agency and enhance player engagement. However, crafting such narratives requires significant manual authoring and coding effort to translate scripts into playable game levels. Advances in pretrained large language models (LLMs) have introduced the opportunity to procedurally generate narratives. This paper presents NarrativeGenie, a framework that generates narrative beats as a cohesive, partially ordered sequence of events shaping narrative progression from brief natural language instructions. By leveraging LLMs for reasoning and generation, NarrativeGenie translates a designer’s story overview into a partially ordered event graph, enabling player-driven narrative beat sequencing. Our findings indicate that NarrativeGenie provides an easy and effective way for designers to generate an interactive game episode whose narrative events align with the intended story arc while granting players agency in their game experience. We extend our framework to dynamically direct the narrative flow by adapting real-time narrative interactions based on the current game state and player actions. Results demonstrate that NarrativeGenie generates narratives that are coherent and aligned with the designer’s vision.
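To make the idea of a partially ordered event graph concrete (a toy sketch only, not NarrativeGenie's actual representation or API), the snippet below encodes narrative beats as a DAG of prerequisites and lets a simulated player choose freely among whichever beats are currently unlocked. The beat names and graph structure are invented for illustration.

```python
# Hypothetical sketch of a partially ordered event graph for narrative beats.
# Each beat lists the beats that must occur before it; the "player" may pick
# any beat whose prerequisites have been satisfied.
from graphlib import TopologicalSorter
import random

# beat -> set of prerequisite beats (a DAG); names are invented placeholders
beats = {
    "meet_mentor": set(),
    "find_map": {"meet_mentor"},
    "recruit_ally": {"meet_mentor"},
    "enter_ruins": {"find_map", "recruit_ally"},
    "final_confrontation": {"enter_ruins"},
}

sorter = TopologicalSorter(beats)
sorter.prepare()  # raises CycleError if the ordering constraints conflict

unlocked: list[str] = []
while sorter.is_active():
    unlocked.extend(sorter.get_ready())  # beats whose prerequisites are met
    chosen = random.choice(unlocked)     # stand-in for a player decision
    unlocked.remove(chosen)
    print(f"player plays {chosen!r} (also available: {unlocked})")
    sorter.done(chosen)
```

Any sequence produced this way respects the designer's ordering constraints while leaving the exact order to the player, which is the property a partial order over narrative beats is meant to guarantee.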