Building a Broad Infrastructure for Uniform Meaning Representations

Bonn, Juli; Buchholz, Matthew J; Chun, Jayeol; Cowell, Andrew; Croft, William; Denk, Lukas; Ge, Sijia; Hajič, Jan; Lai, Kenneth; Martin, James H; Myers, Skatje; Palmer, Alexis; Palmer, Martha; Post, Claire Benet; Pustejovsky, James; Stenzel, Kristine; Sun, Haibo; Urešová, Zdeňka; Vallejos, Rosa; Van Gysel, Jens E L; Vigus, Meagan; Xue, Nianwen; Zhao, Jin

This paper reports the first release of the UMR (Uniform Meaning Representation) data set. UMR is a graph-based meaning representation formalism consisting of a sentence-level graph and a document-level graph. The sentence-level graph represents predicate-argument structures, named entities, word senses, and the aspectuality of events, as well as person and number information for entities. The document-level graph represents coreferential, temporal, and modal relations that go beyond sentence boundaries. UMR is designed to capture both the commonalities and the variations across languages. It does so through a common set of abstract concepts, relations, and attributes, together with concrete concepts derived from the words of individual languages. This UMR release includes annotations for six languages (Arapaho, Chinese, English, Kukama, Navajo, Sanapana) that vary greatly in their linguistic properties and resource availability. We also describe ongoing efforts to enlarge this data set and extend it to other genres and modalities, and briefly describe the available infrastructure (UMR annotation guidelines and tools) that others can use to create similar data sets.
