Title: Inducing Stereotypical Character Roles from Plot Structure
Stereotypical character roles (also known as archetypes or dramatis personae) play an important function in narratives: they facilitate efficient communication with bundles of default characteristics and associations and ease understanding of those characters’ roles in the overall narrative. We present a fully unsupervised k-means clustering approach for learning stereotypical roles given only structural plot information. We demonstrate the technique on Vladimir Propp’s structural theory of Russian folktales (captured in the extended ProppLearner corpus, with 46 tales), showing that our approach can induce six out of seven of Propp’s dramatis personae with F1 measures of up to 0.70 (0.58 average), with an additional category for minor characters. We have explored various feature sets and variations of a cluster evaluation method. The best-performing feature set comprises plot functions, unigrams, tf-idf weights, and embeddings over coreference chain heads. Roles that are mentioned more often (Hero, Villain) or have clearly distinct plot patterns (Princess) are more strongly differentiated than less frequent or less distinct roles (Dispatcher, Helper, Donor). Detailed error analysis suggests that the quality of the coreference chain and plot function annotations is critical for this task. We provide all our data and code for reproducibility.
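As a rough illustration of the clustering idea in the abstract, the following is a minimal sketch of plain k-means (Lloyd's algorithm) over per-character feature vectors. It is not the authors' released code: the character names, the three-dimensional "plot function" features, and the choice of initial centroids are all hypothetical, and the paper's actual feature set (plot functions, unigrams, tf-idf weights, embeddings) is far richer.

```python
from math import dist  # available in Python 3.8+


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical toy features per character: the share of the character's
# mentions falling in each of three made-up plot functions.
characters = ["Ivan", "Dragon", "Vasilisa", "Koschei"]
features = [
    (0.9, 0.1, 0.0),
    (0.1, 0.8, 0.1),
    (0.8, 0.2, 0.0),
    (0.0, 0.9, 0.1),
]

# Assumption: seed the two clusters with the first two characters.
labels, _ = kmeans(features, init_centroids=[features[0], features[1]])

clusters = {}
for name, label in zip(characters, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

Characters with similar plot-function profiles end up in the same cluster; mapping clusters to named roles (Hero, Villain, and so on) is a separate evaluation step that this sketch does not attempt.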
Award ID(s):
1749917
NSF-PAR ID:
10321992
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 25th Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Fanfiction presents an opportunity as a data source for research in NLP, education, and social science. However, answering specific research questions with this data is difficult, since fanfiction contains more diverse writing styles than formal fiction. We present a text processing pipeline for fanfiction, with a focus on identifying text associated with characters. The pipeline includes modules for character identification and coreference, as well as the attribution of quotes and narration to those characters. Additionally, the pipeline contains a novel approach to character coreference that uses knowledge from quote attribution to resolve pronouns within quotes. For each module, we evaluate the effectiveness of various approaches on 10 annotated fanfiction stories. This pipeline outperforms tools developed for formal fiction on the tasks of character coreference and quote attribution.
  2. Animacy is the characteristic of a referent being able to independently carry out actions in a storyworld (e.g., movement, communication). It is a necessary property of characters in stories, and so detecting animacy is an important step in automatic story understanding; it is also potentially useful for many other natural language processing tasks such as word sense disambiguation, coreference resolution, character identification, and semantic role labeling. Recent work by Jahan et al. [2018] demonstrated a new approach to detecting animacy where animacy is considered a direct property of coreference chains (and referring expressions) rather than words. Jahan et al. combined hand-built rules and machine learning (ML) to identify the animacy of referring expressions, used majority voting to assign the animacy of coreference chains, and reported high performance of up to 0.90 F1. In this short report we verify that the approach generalizes to two different corpora (OntoNotes and the Corpus of English Novels) and confirm that the hybrid model performs best, with the rule-based model in second place. Our tests apply the animacy classifier to almost twice as much data as Jahan et al.’s initial study. Our results also strongly suggest, as would be expected, the dependence of the models on coreference chain quality. We release our data and code to enable reproducibility.
  3. Animacy is a necessary property for a referent to be an agent, and thus animacy detection is useful for a variety of natural language processing tasks, including word sense disambiguation, co-reference resolution, semantic role labeling, and others. Prior work treated animacy as a word-level property and developed statistical classifiers to classify words as either animate or inanimate. We discuss why this approach to the problem is ill-posed, and present a new approach based on classifying the animacy of co-reference chains. We show that simple voting approaches to inferring the animacy of a chain from its constituent words perform relatively poorly, and then present a hybrid system merging supervised machine learning (ML) and a small number of hand-built rules to compute the animacy of referring expressions and co-reference chains. This method achieves state-of-the-art performance. The supervised ML component leverages features such as word embeddings over referring expressions, parts of speech, and grammatical and semantic roles. The rules take into consideration parts of speech and the hypernymy structure encoded in WordNet. The system achieves an F1 of 0.88 for classifying the animacy of referring expressions, which is comparable to state-of-the-art results for classifying the animacy of words, and achieves an F1 of 0.75 for classifying the animacy of coreference chains themselves. We release our training and test dataset, which includes 142 texts (all narratives) comprising 156,154 words, 34,698 referring expressions, and 10,941 co-reference chains. We test the method on a subset of the OntoNotes dataset, showing through manual sampling that animacy classification is 90% +/- 2% accurate for coreference chains, and 92% +/- 1% for referring expressions. The data also contains 46 folktales, which present an interesting challenge because they often involve characters who are members of traditionally inanimate classes (e.g., stoves that walk, trees that talk). We show that our system is able to detect the animacy of these unusual referents with an F1 of 0.95.
  4. Interpersonal violence (IPV) is a prominent sociological problem that affects people of all demographic backgrounds. By analyzing how readers interpret, perceive, and react to experiences narrated in social media posts, we explore an understudied source for discourse about abuse. We asked readers to annotate Reddit posts about relationships with vs. without IPV for stakeholder roles and emotion, while measuring their galvanic skin response (GSR), pulse, and facial expression. We map annotations to coreference resolution output to obtain a labeled coreference chain for stakeholders in texts, and apply automated semantic role labeling for analyzing IPV discourse. Findings provide insights into how readers process roles and emotion in narratives. For example, abusers tend to be linked with violent actions and certain affect states. We train classifiers to predict stakeholder categories of coreference chains. We also find that subjects' GSR noticeably changed for IPV texts, suggesting that co-collected measurement-based data about annotators can be used to support text annotation. 
  5. Coreference choices are influenced by multiple factors, including information structural categories such as topic and focus. These information structural categories can be indicated by intonation, yet few studies have investigated how intonation affects subsequent choices for coreference. Using a story continuation experiment with aurally presented stimuli, we show that the location of contrastive focus in Mainstream American English significantly affects the preferred referent for the subject of the next sentence in a short discourse. 
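Items 2 and 3 above both refer to assigning the animacy of a coreference chain by voting over its referring expressions. The following is a minimal sketch of that voting step only, under stated assumptions: the chain names and per-expression labels are invented, and the tie-break (ties count as inanimate) is a choice made here for illustration, not something the abstracts specify.

```python
from collections import Counter


def chain_animacy(expression_labels):
    """Majority vote over per-expression animacy labels (True = animate).

    Assumption: a tie is resolved as inanimate.
    """
    votes = Counter(expression_labels)
    return votes[True] > votes[False]


# Hypothetical chains: each list holds the predicted animacy of the
# referring expressions that make up one coreference chain.
chains = {
    "the stove": [True, True, False],  # e.g. a walking stove in a folktale
    "the road": [False, False, True],
    "the tree": [True, False],         # tie, falls back to inanimate
}

result = {name: chain_animacy(labels) for name, labels in chains.items()}
print(result)
```

Because the vote depends entirely on which expressions were grouped into a chain, errors in coreference resolution carry straight through to the chain-level label, which is consistent with the dependence on chain quality that the abstracts report.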