Title: A Straightforward Approach to Narratologically Grounded Character Identification
One of the most fundamental elements of narrative is character: if we are to understand a narrative, we must be able to identify the characters of that narrative. Therefore, character identification is a critical task in narrative natural language understanding. Most prior work has lacked a narratologically grounded definition of character, instead relying on simplified or implicit definitions that do not capture essential distinctions between characters and other referents in narratives. In prior work we proposed a preliminary definition of character that was based in clear narratological principles: a character is an animate entity that is important to the plot. Here we flesh out this concept, demonstrate that it can be reliably annotated (0.78 Cohen’s κ), and provide annotations of 170 narrative texts, drawn from 3 different corpora, containing 1,347 character co-reference chains and 21,999 non-character chains that include 3,937 animate chains. Furthermore, we show that a supervised classifier using a simple set of easily computable features can effectively identify these characters (overall F1 of 0.90). A detailed error analysis shows that character identification is first and foremost affected by co-reference quality, and further, that the shorter a chain is, the harder it is to identify as a character. We release our code and data for the benefit of other researchers.
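The agreement figure quoted in the abstract (Cohen’s κ) corrects raw annotator agreement for the agreement expected by chance. The following is a minimal sketch of that standard computation for two annotators, not the authors' released code; the function name `cohens_kappa`, the label strings, and the toy annotations are all hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, computed from each annotator's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (not taken from the paper's data):
# "char" = character chain, "non" = non-character chain.
ann_1 = ["char", "char", "non", "non", "non", "char", "non", "non"]
ann_2 = ["char", "non", "non", "non", "non", "char", "non", "char"]
kappa = cohens_kappa(ann_1, ann_2)
print(round(kappa, 3))
```

In this toy case the annotators agree on 6 of 8 chains, but because both label most chains "non", much of that agreement is expected by chance, so κ is considerably lower than the raw agreement rate.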
Award ID(s):
1749917
NSF-PAR ID:
10220130
Journal Name:
28th International Conference on Computational Linguistics (COLING 2020)
Page Range / eLocation ID:
6089 to 6100
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Characters are a key element of narrative and so character identification plays an important role in automatic narrative understanding. Unfortunately, most prior work that incorporates character identification is not built upon a clear, theoretically grounded concept of character. Such work either takes character identification for granted (e.g., using simple heuristics on referring expressions), or relies on simplified definitions that do not capture important distinctions between characters and other referents in the story. Prior approaches have also been rather complicated, relying, for example, on predefined case bases or ontologies. In this paper we propose a narratologically grounded definition of character for discussion at the workshop, and also demonstrate a preliminary yet straightforward supervised machine learning model with a small set of features that performs well on two corpora. The most important of the two corpora is a set of 46 Russian folktales, on which the model achieves an F1 of 0.81. Error analysis suggests that features relevant to the plot will be necessary for further improvements in performance.
  2. Animacy is a necessary property for a referent to be an agent, and thus animacy detection is useful for a variety of natural language processing tasks, including word sense disambiguation, co-reference resolution, semantic role labeling, and others. Prior work treated animacy as a word-level property, and has developed statistical classifiers to classify words as either animate or inanimate. We discuss why this approach to the problem is ill-posed, and present a new approach based on classifying the animacy of co-reference chains. We show that simple voting approaches to inferring the animacy of a chain from its constituent words perform relatively poorly, and then present a hybrid system merging supervised machine learning (ML) and a small number of hand-built rules to compute the animacy of referring expressions and co-reference chains. This method achieves state-of-the-art performance. The supervised ML component leverages features such as word embeddings over referring expressions, parts of speech, and grammatical and semantic roles. The rules take into consideration parts of speech and the hypernymy structure encoded in WordNet. The system achieves an F1 of 0.88 for classifying the animacy of referring expressions, which is comparable to state-of-the-art results for classifying the animacy of words, and achieves an F1 of 0.75 for classifying the animacy of co-reference chains themselves. We release our training and test dataset, which includes 142 texts (all narratives) comprising 156,154 words, 34,698 referring expressions, and 10,941 co-reference chains. We test the method on a subset of the OntoNotes dataset, showing through manual sampling that animacy classification is 90% +/- 2% accurate for co-reference chains, and 92% +/- 1% for referring expressions. The data also contains 46 folktales, which present an interesting challenge because they often involve characters who are members of traditionally inanimate classes (e.g., stoves that walk, trees that talk). We show that our system is able to detect the animacy of these unusual referents with an F1 of 0.95.
  3. Tracking characters and locations throughout a story can help improve the understanding of its plot structure. Prior research has analyzed characters and locations from text independently without grounding characters to their locations in narrative time. Here, we address this gap by proposing a new spatial relationship categorization task. The objective of the task is to assign a spatial relationship category for every character and location co-mention within a window of text, taking into consideration linguistic context, narrative tense, and temporal scope. To this end, we annotate spatial relationships in approximately 2500 book excerpts and train a model using contextual embeddings as features to predict these relationships. When applied to a set of books, this model allows us to test several hypotheses on mobility and domestic space, revealing that protagonists are more mobile than non-central characters and that women as characters tend to occupy more interior space than men. Overall, our work is the first step towards joint modeling and analysis of characters and places in narrative text. 
  4. Animacy is the characteristic of a referent being able to independently carry out actions in a storyworld (e.g., movement, communication). It is a necessary property of characters in stories, and so detecting animacy is an important step in automatic story understanding; it is also potentially useful for many other natural language processing tasks such as word sense disambiguation, coreference resolution, character identification, and semantic role labeling. Recent work by Jahan et al. [2018] demonstrated a new approach to detecting animacy where animacy is considered a direct property of coreference chains (and referring expressions) rather than words. In Jahan et al., they combined hand-built rules and machine learning (ML) to identify the animacy of referring expressions and used majority voting to assign the animacy of coreference chains, and reported high performance of up to 0.90 F1. In this short report we verify that the approach generalizes to two different corpora (OntoNotes and the Corpus of English Novels) and we confirmed that the hybrid model performs best, with the rule-based model in second place. Our tests apply the animacy classifier to almost twice as much data as Jahan et al.’s initial study. Our results also strongly suggest, as would be expected, the dependence of the models on coreference chain quality. We release our data and code to enable reproducibility.
  5. Intelligent interactive narrative systems coordinate a cast of non-player characters to make the overall story experience meaningful for the player. Narrative generation involves a tradeoff between plot-structure requirements and quality of character behavior, as well as computational efficiency. We study this tradeoff using the example of benchmark problems for narrative planning algorithms. A typical narrative planning problem calls for a sequence of actions that leads to an overall plot goal being met, while also requiring each action to respect constraints that create the appearance of character autonomy. We consider simplified solution definitions that enforce only plot requirements or only character requirements, and we measure how often each of these definitions leads to a solution that happens to meet both types of requirements—i.e., the density with which narrative plans occur among plot- or character-requirement-satisfying sequences. We then investigate whether solution densities can guide the selection of narrative planning algorithms. We compare the performance of two search strategies: one that satisfies plot requirements first and checks character requirements afterward, and one that continuously verifies character requirements. Our results show that comparing solution densities does not by itself predict which of these search strategies will be more efficient in terms of search nodes visited, suggesting that other important factors exist. We discuss what some of these factors could be. Our work opens further investigation into characterizing narrative planning algorithms and how they interact with specific domains. The results also highlight the diversity and difficulty of solving narrative planning problems. 
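Item 4 above mentions assigning the animacy of a coreference chain by majority voting over its referring expressions. A minimal sketch of such a vote follows, assuming each referring expression has already been labeled animate (`True`) or inanimate (`False`) by some upstream classifier. The function name `chain_is_animate`, the boolean encoding, and the rule that ties count as inanimate are assumptions made for illustration and are not taken from the cited work.

```python
def chain_is_animate(expression_labels):
    """Majority vote over per-expression animacy labels (True = animate)."""
    if not expression_labels:
        raise ValueError("a chain needs at least one referring expression")
    animate_votes = sum(1 for label in expression_labels if label)
    # Strict majority required; a tie is resolved as inanimate.
    # This tie rule is an assumption of the sketch.
    return animate_votes * 2 > len(expression_labels)


# Hypothetical chain: three referring expressions, two judged animate.
example_chain = [True, True, False]
print(chain_is_animate(example_chain))
```

A vote like this depends on how well the chain was assembled, which fits the observation in the abstracts above that results depend on coreference chain quality.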