
This content will become publicly available on July 20, 2026

Title: Bootstrapping UMRs from Universal Dependencies for Scalable Multilingual Annotation
Uniform Meaning Representation (UMR) is a semantic annotation framework designed to be applicable across typologically diverse languages. However, UMR annotation is a labor-intensive task, requiring significant effort and time, especially when no prior annotations are available. In this paper, we present a method for bootstrapping UMR graphs by leveraging Universal Dependencies (UD), one of the most comprehensive multilingual resources, encompassing languages across a wide range of language families. Given UMR’s strong typological and cross-linguistic orientation, UD serves as a particularly suitable starting point for the conversion. We describe and evaluate an approach that automatically derives partial UMR graphs from UD trees, providing annotators with an initial representation to build upon. While UD is not a semantic resource, our method extracts useful structural information that aligns with the UMR formalism, thereby facilitating the annotation process. By leveraging UD’s broad typological coverage, this approach offers a scalable way to support UMR annotation across different languages.
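To make the idea of "deriving partial UMR graphs from UD trees" concrete, here is a minimal Python sketch. It is not the authors' method, because this page does not give their conversion rules. The relation-to-role table (`ROLE_MAP`), the skipped function-word relations (`SKIP`), the `x<n>` variable names, and the `:UD-<rel>` placeholder for unmapped relations are all hypothetical choices made for illustration. The sketch uses lemmas as concepts and UMR-style general participant roles such as `:actor` and `:undergoer`. A real converter would also need to handle rolesets, reentrancy, and many more relations.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int        # 1-based token index, as in CoNLL-U
    form: str
    lemma: str
    upos: str
    head: int      # 0 marks the root
    deprel: str


# HYPOTHETICAL, deliberately tiny UD-to-UMR role table (illustration only).
ROLE_MAP = {
    "nsubj": ":actor",
    "nsubj:pass": ":undergoer",
    "obj": ":undergoer",
    "iobj": ":recipient",
    "amod": ":mod",
}

# Function-word relations assumed here not to introduce UMR concepts of their own.
SKIP = {"det", "case", "aux", "aux:pass", "cop", "mark", "punct"}


def ud_to_partial_umr(tokens):
    """Build (root, concepts, triples) for a partial UMR-style graph from one UD tree."""
    variables = {}  # token id -> variable name
    concepts = {}   # variable name -> concept (here simply the lemma)
    for tok in tokens:
        if tok.deprel in SKIP:
            continue
        var = f"x{tok.id}"  # illustrative naming scheme, not UMR's own convention
        variables[tok.id] = var
        concepts[var] = tok.lemma
    triples = []
    for tok in tokens:
        if tok.id not in variables or tok.head == 0 or tok.head not in variables:
            continue  # skipped token, the root, or a skipped head: left to the annotator
        # Unmapped relations are kept as placeholder roles so an annotator can review them.
        role = ROLE_MAP.get(tok.deprel, f":UD-{tok.deprel}")
        triples.append((variables[tok.head], role, variables[tok.id]))
    root = next(variables[t.id] for t in tokens if t.head == 0)
    return root, concepts, triples


def to_penman(node, concepts, triples, indent=0):
    """Render the (tree-shaped) partial graph in PENMAN-style bracket notation."""
    children = [(role, dep) for head, role, dep in triples if head == node]
    text = f"({node} / {concepts[node]}"
    pad = " " * (indent + 4)
    for role, dep in children:
        text += f"\n{pad}{role} {to_penman(dep, concepts, triples, indent + 4)}"
    return text + ")"


# "The big dog chased cats", hand-annotated in UD style for this example.
sentence = [
    Token(1, "The", "the", "DET", 3, "det"),
    Token(2, "big", "big", "ADJ", 3, "amod"),
    Token(3, "dog", "dog", "NOUN", 4, "nsubj"),
    Token(4, "chased", "chase", "VERB", 0, "root"),
    Token(5, "cats", "cat", "NOUN", 4, "obj"),
]

if __name__ == "__main__":
    root, concepts, triples = ud_to_partial_umr(sentence)
    print(to_penman(root, concepts, triples))
```

Running the script prints the following partial graph, which an annotator would then refine (for example, by choosing a roleset for `chase` and adding aspect and modality):

```
(x4 / chase
    :actor (x3 / dog
        :mod (x2 / big))
    :undergoer (x5 / cat))
```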
Award ID(s):
2213805
PAR ID:
10599366
Author(s) / Creator(s):
Publisher / Repository:
Proceedings of the 19th Linguistic Annotation Workshop, Association for Computational Linguistics
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. In this paper, we present Uniform Meaning Representation (UMR), a meaning representation designed to annotate the semantic content of a text. UMR is primarily based on Abstract Meaning Representation (AMR), an annotation framework initially designed for English, but also draws from other meaning representations. UMR extends AMR to other languages, particularly morphologically complex, low-resource languages. UMR also adds features to AMR that are critical to semantic interpretation and enhances AMR by proposing a companion document-level representation that captures linguistic phenomena such as coreference as well as temporal and modal dependencies that potentially go beyond sentence boundaries.
  2. UMR-Writer is a web-based tool for annotating semantic graphs with the Uniform Meaning Representation (UMR) scheme. UMR is a graph-based semantic representation that can be applied cross-linguistically for deep semantic analysis of texts. In this work, we implemented a new keyboard interface in UMR-Writer 2.0, which is a powerful addition to the original mouse interface, supporting faster annotation for more experienced annotators. The new interface also addresses issues with the original mouse interface. Additionally, we demonstrate an efficient workflow for annotation project management in UMR-Writer 2.0, which has been applied to many projects. 
  3. This paper presents detailed mappings between the structures used in Abstract Meaning Representation (AMR) and those used in Uniform Meaning Representation (UMR). These structures include general semantic roles, rolesets, and concepts that are largely shared between AMR and UMR, but with crucial differences. While UMR annotation of new low-resource languages is ongoing, AMR-annotated corpora already exist for many languages, and these AMR corpora are ripe for conversion to UMR format. Rather than focusing on semantic coverage that is new to UMR (which will likely need to be dealt with manually), this paper serves as a resource (with illustrated mappings) for users looking to understand the fine-grained adjustments that have been made to the representation techniques for semantic categories present in both AMR and UMR. 
  4. Uniform Meaning Representation (UMR) is the next phase of semantic formalism following Abstract Meaning Representation (AMR), with added focus on inter-sentential relations that allows the representational scope of UMR to cover a full document. This, in turn, greatly increases the complexity of the parsing task, with the additional requirement of capturing document-level linguistic phenomena such as coreference and modal and temporal dependencies. To establish a strong baseline despite the small size of the recently released UMR v1.0 corpus, we introduce a pipeline model that requires no training. At the core of our method is a two-track strategy that obtains UMR's sentence and document graphs separately: the document-level triples are compiled at the token level, and the sentence graph is converted from AMR graphs. By leveraging the alignment between AMR graphs and their sentences, we are able to generate the first automatic English UMR parses.
  5. Calzolari, Nicoletta; Kan, Min-Yen; Hoste, Veronique; Lenci, Alessandro; Sakti, Sakriani; Xue, Nianwen (Ed.)
    Uniform Meaning Representation (UMR) is a semantic labeling system in the AMR family designed to be uniformly applicable to typologically diverse languages. The UMR labeling system is quite thorough and can be time-consuming to execute, especially if annotators are starting from scratch. In this paper, we focus on methods for bootstrapping UMR annotations for a given language from existing resources, and specifically from typical products of language documentation work, such as lexical databases and interlinear glossed text (IGT). Using Arapaho as our test case, we present and evaluate a bootstrapping process that automatically generates UMR subgraphs from IGT. Additionally, we describe and evaluate a method for bootstrapping valency lexicon entries from lexical databases for both the target language and English. We are able to generate enough basic structure in UMR graphs from the existing Arapaho interlinearized texts to automate UMR labeling to a significant extent. Our method thus has the potential to streamline the process of building meaning representations for new languages without existing large-scale computational resources. 