Title: Grounding Characters and Places in Narrative Text
Tracking characters and locations throughout a story can help improve the understanding of its plot structure. Prior research has analyzed characters and locations from text independently without grounding characters to their locations in narrative time. Here, we address this gap by proposing a new spatial relationship categorization task. The objective of the task is to assign a spatial relationship category for every character and location co-mention within a window of text, taking into consideration linguistic context, narrative tense, and temporal scope. To this end, we annotate spatial relationships in approximately 2,500 book excerpts and train a model using contextual embeddings as features to predict these relationships. When applied to a set of books, this model allows us to test several hypotheses on mobility and domestic space, revealing that protagonists are more mobile than non-central characters and that women as characters tend to occupy more interior space than men. Overall, our work is the first step towards joint modeling and analysis of characters and places in narrative text.
Award ID(s):
1942591
NSF-PAR ID:
10433598
Journal Name:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The representation of mobility in literary narratives has important implications for the cultural understanding of human movement and migration. In this paper, we introduce novel methods for measuring the physical mobility of literary characters through narrative space and time. We capture mobility through geographically defined space, as well as through generic locations such as homes, driveways, and forests. Using a dataset of over 13,000 books published in English since 1789, we observe significant "small world" effects in fictional narratives. Specifically, we find that fictional characters cover far less distance than their non-fictional counterparts; the pathways covered by fictional characters are highly formulaic and limited from a global perspective; and fiction exhibits a distinctive semantic investment in domestic and private places. Surprisingly, we do not find that characters' ascribed gender has a statistically significant effect on distance traveled, but it does influence the semantics of domesticity. 
  2. Sense of place is a critical concept underlying the meanings attached to locations and locales in geography and related fields. This concept is often ambiguous and complex when presented in narrative text and challenging to represent and analyse at scale. Mapping a sense of place in this regard requires more than finding geographical coordinates or drawing polygons around toponyms. Our paper develops the concept of a spatio-textual region (STR), a method for identifying platial clusters embedded in spatial narrative texts and explores the potential for mapping the results. We demonstrate the method on an 1857 publication by Thomas Nelson & Sons, a traveller's guide to the Lake District in England. We envision that this method could be employed at scale for generating novel representations of the sense of place embedded in tourist literature, personal journeys, and other spatial narratives.
  3. We present the task of modeling information propagation in literature, in which we seek to identify pieces of information passing from character A to character B to character C, only given a description of their activity in text. We describe a new pipeline for measuring information propagation in this domain and publish a new dataset for speaker attribution, enabling the evaluation of an important component of this pipeline on a wider range of literary texts than previously studied. Using this pipeline, we analyze the dynamics of information propagation in over 5,000 works of English fiction, finding that information flows through characters that fill structural holes connecting different communities, and that characters who are women are depicted as filling this role much more frequently than characters who are men.
  4. One of the most fundamental elements of narrative is character: if we are to understand a narrative, we must be able to identify the characters of that narrative. Therefore, character identification is a critical task in narrative natural language understanding. Most prior work has lacked a narratologically grounded definition of character, instead relying on simplified or implicit definitions that do not capture essential distinctions between characters and other referents in narratives. In prior work we proposed a preliminary definition of character that was based in clear narratological principles: a character is an animate entity that is important to the plot. Here we flesh out this concept, demonstrate that it can be reliably annotated (0.78 Cohen’s κ), and provide annotations of 170 narrative texts, drawn from 3 different corpora, containing 1,347 character co-reference chains and 21,999 non-character chains that include 3,937 animate chains. Furthermore, we have shown that a supervised classifier using a simple set of easily computable features can effectively identify these characters (overall F1 of 0.90). A detailed error analysis shows that character identification is first and foremost affected by co-reference quality, and further, that the shorter a chain is the harder it is to effectively identify as a character. We release our code and data for the benefit of other researchers.
  5. Smith, Stacey (Ed.)
    The correlation between two characters is often interpreted as evidence that there exists a significant and biologically important relationship between them. However, Maddison and FitzJohn (in The unsolved challenge to phylogenetic correlation tests for categorical characters. Syst. Biol. 2015;64:127–136) recently pointed out that evidence of correlated evolution between two categorical characters is often spurious, particularly, when the dependent relationship stems from a single replicate deep in time. Here we will show that there may, in fact, be a statistical solution to the problem posed by Maddison and FitzJohn naturally embedded within the expanded model space afforded by the hidden Markov model (HMM) framework. We demonstrate that the problem of single unreplicated evolutionary events manifests itself as rate heterogeneity within our models and that this is the source of the false correlation. Therefore, we argue that this problem is better understood as model misspecification rather than a failure of comparative methods to account for phylogenetic pseudoreplication. We utilize HMMs to develop a multirate independent model which, when implemented, drastically reduces support for correlation. The problem itself extends beyond categorical character evolution, but we believe that the practical solution presented here may lend itself to future extensions in other areas of comparative biology. [Macroevolution; model adequacy; phylogenetic comparative methods; rate heterogeneity].