Search for: All records

Creators/Authors contains: "Stenzel, Kristine"


  1. Calzolari, Nicoletta ; Kan, Min-Yen ; Hoste, Veronique ; Lenci, Alessandro ; Sakti, Sakriani ; Xue, Nianwen (Ed.)
    This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture the commonalities and variations across languages, and this is done through the use of a common set of abstract concepts, relations, and attributes as well as concrete concepts derived from words from individual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in terms of their linguistic properties and resource availability. We also describe ongoing efforts to enlarge this data set and extend it to other genres and modalities. We also briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets.
    Free, publicly-accessible full text available May 20, 2025
  2. Calzolari, Nicoletta ; Kan, Min-Yen ; Hoste, Veronique ; Lenci, Alessandro ; Sakti, Sakriani ; Xue, Nianwen (Ed.)
    This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture the commonalities and variations across languages, and this is done through the use of a common set of abstract concepts, relations, and attributes as well as concrete concepts derived from words from individual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in terms of their linguistic properties and resource availability. We also describe ongoing efforts to enlarge this data set and extend it to other genres and modalities. We also briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets.
    Free, publicly-accessible full text available May 1, 2025
  3. Language documentation encompasses translation, typically into the dominant high-resource language in the region where the target language is spoken. To make data accessible to a broader audience, additional translation into other high-resource languages might be needed. Working within a project documenting Kotiria, we explore the extent to which state-of-the-art machine translation (MT) systems can support this second translation – in our case from Portuguese to English. This translation task is challenging for multiple reasons: (1) the data is out-of-domain with respect to the MT system’s training data, (2) much of the data is conversational, (3) existing translations include non-standard and uncommon expressions, often reflecting properties of the documented language, and (4) the data includes borrowings from other regional languages. Despite these challenges, existing MT systems perform at a usable level, though there is still room for improvement. We then conduct a qualitative analysis and suggest ways to improve MT between high-resource languages in a language documentation setting. 
  4. This article analyzes the use of several response particles in face-to-face interaction in Wa’ikhana, an East Tukano language of northwestern Amazonia. Adopting a Conversation Analysis approach, we explore details of each particle, considering their prosodic shapes, the action contexts in which they occur, and their sequential positioning, all crucial to understanding their meanings in interaction. Our analysis shows that Wa’ikhana response particles exhibit both universal and language-particular properties, thus demonstrating the contributions of data from lesser-studied languages to research on language in social interaction, and the value of an interactional approach in the study of under-described, and often endangered, indigenous languages. 
  5. This narrative tells of an episode in the personal life of Tomás Nogueira, a speaker of Wa’ikhana (or Piratapuyo, East Tukano family), a highly endangered and still little described language. The Wa’ikhana people live in northwest Amazonia, in villages in the Alto Rio Negro Indigenous Lands (Brazil) and neighboring Departamento de Vaupés (Colombia). Tomás’s tale presents us a good-humored account of a day when his hunting plans went wrong. Besides full interlinear analysis of the narrative, we offer background information about the Wa’ikhana people and a brief typological profile of the language, highlighting salient grammatical structures that can be observed throughout the narrative.