skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Vigus, Meagan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Calzolari, Nicoletta ; Kan, Min-Yen ; Hoste, Veronique ; Lenci, Alessandro ; Sakti, Sakriani ; Xue, Nianwen (Ed.)
    This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture the commonalities and variations across languages and this is done through the use of a common set of abstract concepts, relations, and attributes as well as concrete concepts derived from words from invidual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in terms of their linguistic properties and resource availability. We also describe on-going efforts to enlarge this data set and extend it to other genres and modalities. We also briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets. 
    more » « less
    Free, publicly-accessible full text available May 20, 2025
  2. Calzolari, Nicoletta ; Kan, Min-Yen ; Hoste, Veronique ; Lenci, Alessandro ; Sakti, Sakriani ; Xue, Nianwen (Ed.)
    This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture the commonalities and variations across languages and this is done through the use of a common set of abstract concepts, relations, and attributes as well as concrete concepts derived from words from invidual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in terms of their linguistic properties and resource availability. We also describe on-going efforts to enlarge this data set and extend it to other genres and modalities. We also briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets. 
    more » « less
    Free, publicly-accessible full text available May 1, 2025
  3. This paper presents detailed mappings between the structures used in Abstract Meaning Representation (AMR) and those used in Uniform Meaning Representation (UMR). These structures include general semantic roles, rolesets, and concepts that are largely shared between AMR and UMR, but with crucial differences. While UMR annotation of new low-resource languages is ongoing, AMR-annotated corpora already exist for many languages, and these AMR corpora are ripe for conversion to UMR format. Rather than focusing on semantic coverage that is new to UMR (which will likely need to be dealt with manually), this paper serves as a resource (with illustrated mappings) for users looking to understand the fine-grained adjustments that have been made to the representation techniques for semantic categories present in both AMR and UMR. 
    more » « less
  4. Claire Bonial, Nianwen Xue (Ed.)
    We present AutoAspect, a novel, rule-based annotation tool for labeling tense and aspect. The pilot version annotates English data. The aspect labels are designed specifically for Uniform Meaning Representations (UMR), an annotation schema that aims to encode crosslingual semantic information. The annotation tool combines syntactic and semantic cues to assign aspects on a sentence-by-sentence basis, following a sequence of rules that each output a UMR aspect. Identified events proceed through the sequence until they are assigned an aspect. We achieve a recall of 76.17% for identifying UMR events and an accuracy of 62.57% on all identified events, with high precision values for 2 of the aspect labels. 
    more » « less
  5. null (Ed.)
    In this paper we present Uniform Meaning Representation (UMR), a meaning representation designed to annotate the semantic content of a text. UMR is primarily based on Abstract Meaning Representation (AMR), an annotation framework initially designed for English, but also draws from other meaning representations. UMR extends AMR to other languages, particularly morphologically complex, low-resource languages. UMR also adds features to AMR that are critical to semantic interpretation and enhances AMR by proposing a companion document-level representation that captures linguistic phenomena such as coreference as well as temporal and modal dependencies that potentially go beyond sentence boundaries. 
    more » « less
  6. This paper presents a “road map” for the annotation of semantic categories in typologically diverse languages, with potentially few linguistic resources, and often no existing computational resources. Past semantic annotation efforts have focused largely on high-resource languages, or relatively low-resource languages with a large number of native speakers. However, there are certain typological traits, namely the synthesis of multiple concepts into a single word, that are more common in languages with a smaller speech community. For example, what is expressed as a sentence in a more analytic language like English, may be expressed as a single word in a more synthetic language like Arapaho. This paper proposes solutions for annotating analytic and synthetic languages in a comparable way based on existing typological research, and introduces a road map for the annotation of languages with a dearth of resources. 
    more » « less
  7. Xue, Nianwen ; Croft, William ; Hajic, Jan ; Huang, Chu-Ren ; Oepen, Stephan ; Palmer, Martha ; Pustejovsky, James (Ed.)
    This paper presents an annotation scheme for modality that employs a dependency structure. Events and sources (here, conceivers) are represented as nodes and epistemic strength relations characterize the edges. The epistemic strength values are largely based on Saurí and Pustejovsky’s (2009) FactBank, while the dependency structure mirrors Zhang and Xue’s (2018b) approach to temporal relations. Six documents containing 377 events have been annotated by two expert annotators with high levels of agreement. 
    more » « less
  8. Xue, Nianwen ; Croft, William ; Hajic, Jan ; Huang, Chu-Ren ; Oepen, Stephan ; Palmer, Martha ; Pustejovsky, James (Ed.)
    Developers of cross-lingual semantic annotation schemes face a number of issues not encountered in monolingual annotation. This paper discusses four such issues, related to the establishment of annotation labels, and the treatment of languages with more fine-grained, more coarse-grained, and cross-cutting categories. We propose that a lattice-like architecture of the annotation categories can adequately handle all four issues, and at the same time remain both intuitive for annotators and faithful to typological insights. This position is supported by a brief annotation experiment. 
    more » « less