

Creators/Authors contains: "Yarlott, W. Victor"


  1. Animacy is the characteristic of a referent being able to independently carry out actions in a storyworld (e.g., movement, communication). It is a necessary property of characters in stories, and so detecting animacy is an important step in automatic story understanding; it is also potentially useful for many other natural language processing tasks such as word sense disambiguation, coreference resolution, character identification, and semantic role labeling. Recent work by Jahan et al. [2018] demonstrated a new approach to detecting animacy where animacy is considered a direct property of coreference chains (and referring expressions) rather than words. Jahan et al. combined hand-built rules and machine learning (ML) to identify the animacy of referring expressions, used majority voting to assign the animacy of coreference chains, and reported high performance, with an F1 of up to 0.90. In this short report we verify that the approach generalizes to two different corpora (OntoNotes and the Corpus of English Novels) and confirm that the hybrid model performs best, with the rule-based model in second place. Our tests apply the animacy classifier to almost twice as much data as Jahan et al.’s initial study. Our results also strongly suggest, as would be expected, that the models depend on coreference chain quality. We release our data and code to enable reproducibility.
  2. One of the most fundamental elements of narrative is character: if we are to understand a narrative, we must be able to identify the characters of that narrative. Therefore, character identification is a critical task in narrative natural language understanding. Most prior work has lacked a narratologically grounded definition of character, instead relying on simplified or implicit definitions that do not capture essential distinctions between characters and other referents in narratives. In prior work we proposed a preliminary definition of character that was based on clear narratological principles: a character is an animate entity that is important to the plot. Here we flesh out this concept, demonstrate that it can be reliably annotated (0.78 Cohen’s κ), and provide annotations of 170 narrative texts, drawn from 3 different corpora, containing 1,347 character co-reference chains and 21,999 non-character chains that include 3,937 animate chains. Furthermore, we show that a supervised classifier using a simple set of easily computable features can effectively identify these characters (overall F1 of 0.90). A detailed error analysis shows that character identification is first and foremost affected by co-reference quality, and further, that the shorter a chain is, the harder it is to identify correctly as a character. We release our code and data for the benefit of other researchers.
  3. Identifying the discourse structure of documents is an important task in understanding written text. Building on prior work, we demonstrate an improved approach to automatically identifying the discourse function of paragraphs in news articles. We start with the hierarchical theory of news discourse developed by van Dijk (1988), which proposes how paragraphs function within news articles. This discourse information sits at a level intermediate between phrase- or sentence-sized discourse segments and document genre, characterizing how individual paragraphs convey information about the events in the storyline of the article. Specifically, the theory categorizes the relationships between narrated events and (1) the overall storyline (such as Main Events, Background, or Consequences) as well as (2) commentary (such as Verbal Reactions and Evaluations). We trained and tested a linear-chain conditional random field (CRF) with new features to model van Dijk’s labels and compared it against several machine learning models presented in previous work. Our model significantly outperformed all baselines and prior approaches, achieving an average F1 score of 0.71, which represents a 31.5% improvement over the previously best-performing support vector machine model.
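The first abstract describes assigning animacy to a coreference chain by majority vote over the animacy labels of its referring expressions. The following is a minimal sketch of that voting step only, not the authors' released code. The function name `chain_animacy`, the `tie_is_animate` parameter, and the example chains are hypothetical, and the tie-breaking rule is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import Sequence


def chain_animacy(expression_labels: Sequence[bool], tie_is_animate: bool = False) -> bool:
    """Majority vote over per-expression animacy labels for one coreference chain.

    expression_labels: True for each referring expression judged animate.
    tie_is_animate: assumed tie-breaking rule (not specified in the abstract).
    """
    if not expression_labels:
        raise ValueError("a coreference chain needs at least one referring expression")
    counts = Counter(expression_labels)
    animate, inanimate = counts[True], counts[False]
    if animate == inanimate:
        return tie_is_animate
    return animate > inanimate


# Hypothetical chains: each list holds the predicted animacy of the chain's mentions.
chains = {
    "the_fox": [True, True, False],
    "the_river": [False, False],
    "the_wind": [True, False],  # a tie, resolved by the assumed default
}
chain_labels = {name: chain_animacy(labels) for name, labels in chains.items()}
```

Because each chain label is derived from its mentions, errors in coreference resolution (mentions placed in the wrong chain) feed directly into the vote, which is consistent with the dependence on chain quality that the abstract reports.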