Title: Expanding Universal Dependencies for Polysynthetic Languages: A Case of St. Lawrence Island Yupik
This paper describes the development of the first Universal Dependencies (UD) treebank for St. Lawrence Island Yupik, an endangered language spoken in the Bering Strait region. While the UD guidelines provided a general framework for our annotations, the rich morphology of this polysynthetic language made a number of language-specific decisions necessary. Most notably, we annotated the corpus at the morpheme level as well as the word level; morpheme-level annotation was carried out with an existing morphological analyzer followed by manual disambiguation. By comparing the two resulting annotation schemes, we argue that morpheme-level annotation is essential for polysynthetic languages like St. Lawrence Island Yupik: word-level annotation yields degenerate trees for some Yupik sentences and often fails to capture syntactic relations that are manifested at the morpheme level. Dependency parsing experiments provide further support for morpheme-level annotation. Implications for UD annotation of other polysynthetic languages are discussed.
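As a rough illustration of why word-level trees can be degenerate: a one-word Yupik sentence such as Angyaghllangyugtuq ('he wants to acquire a big boat') yields a single-node word-level tree, whereas morpheme-level tokenization exposes internal dependency structure. The segmentation and relation labels below are illustrative glosses for this sketch, not the paper's actual annotation scheme.

```python
# Each token is (form, head_index, relation); head 0 marks the root.
# Word-level: the whole sentence is one token, so there are no edges
# between tokens at all -- a degenerate tree.
word_level = [("Angyaghllangyugtuq", 0, "root")]

# Morpheme-level (hypothetical segmentation and relations):
# angyagh- 'boat', -ghlla- 'big', -ng- 'acquire', -yug- 'want', -tuq 3SG
morpheme_level = [
    ("angyagh", 3, "obj"),    # 'boat' is the object of 'acquire'
    ("ghlla",   1, "amod"),   # 'big' modifies 'boat'
    ("ng",      4, "xcomp"),  # 'acquire' is the complement of 'want'
    ("yug",     0, "root"),   # 'want' heads the clause
    ("tuq",     4, "aux"),    # inflectional ending attached to the root
]

def internal_edges(tree):
    """Count dependency edges between tokens (excluding the root edge)."""
    return sum(1 for _, head, _ in tree if head != 0)

print(internal_edges(word_level))      # 0: no syntactic structure visible
print(internal_edges(morpheme_level))  # 4: relations recovered inside the word
```

The contrast in edge counts is the point: any syntactic relation realized word-internally is simply invisible to a word-level annotation scheme.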
Award ID(s):
1761680
NSF-PAR ID:
10285561
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
Page Range / eLocation ID:
131-142
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Many techniques in modern computational linguistics and natural language processing (NLP) assume that approaches that work well on English and other widely used European (and sometimes Asian) languages are “language agnostic” – that is, that they will also work across the typologically diverse languages of the world. In high-resource languages, especially those that are analytic rather than synthetic, a common approach is to treat morphologically distinct variants of a common root (such as dog and dogs) as completely independent word types. Doing so relies on two main assumptions: that there exist a limited number of morphological inflections for any given root, and that most or all of those variants will appear in a large enough corpus (conditioned on assumptions about domain, etc.) for the model to adequately learn statistics about each variant. Approaches like stemming, lemmatization, morphological analysis, subword segmentation, and other normalization techniques are frequently used when either of those assumptions is likely to be violated, particularly in the case of synthetic languages like Czech and Russian that have more inflectional morphology than English. Within the NLP literature, agglutinative languages like Finnish and Turkish are commonly held up as extreme examples of morphological complexity that challenge common modelling assumptions. Yet, when considering all of the world’s languages, Finnish and Turkish are closer to the average case in terms of synthesis. When we consider polysynthetic languages (those at the extreme of morphological complexity), even approaches like stemming, lemmatization, or subword modelling may not suffice. These languages have very high numbers of hapax legomena (words appearing only once in a corpus), underscoring the need for appropriate morphological handling of words, without which there is no hope for a model to capture enough statistical information about those words.
Moreover, many of these languages have only very small text corpora, substantially magnifying these challenges. To this end, we examine the current state-of-the-art in language modelling, machine translation, and predictive text completion in the context of four polysynthetic languages: Guaraní, St. Lawrence Island Yupik, Central Alaskan Yup’ik, and Inuktitut. We have a particular focus on Inuit-Yupik, a highly challenging family of endangered polysynthetic languages that ranges geographically from Greenland through northern Canada and Alaska to far eastern Russia. The languages in this family are extraordinarily challenging from a computational perspective, with pervasive use of derivational morphemes in addition to rich sets of inflectional suffixes and phonological challenges at morpheme boundaries. Finally, we propose a novel framework for language modelling that combines knowledge representations from finite-state morphological analyzers with Tensor Product Representations (Smolensky, 1990) in order to enable successful neural language models capable of handling the full linguistic variety of typologically variant languages. 
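The type-independence problem above can be made concrete with a small sketch. Both corpora below are toys: the "polysynthetic" forms are hypothetical segmented strings for illustration only, not real corpus data.

```python
from collections import Counter

# When inflected variants of a root are treated as independent word
# types, type inventories explode and many types become hapax legomena
# (types attested exactly once in the corpus).
english = "the dog sees the dogs and the dog runs".split()

# A hypothetical polysynthetic-style corpus: each "word" fuses a root
# with several derivational/inflectional morphemes, so surface forms
# rarely repeat.
polysynthetic = ["angya-ghlla-ng-yug-tuq", "angya-pig-llu-ni",
                 "neghe-yug-tuq", "neghe-ghlla-ng-llu-ni"]

def hapax_ratio(tokens):
    """Fraction of word types that occur exactly once."""
    counts = Counter(tokens)
    hapaxes = [t for t, c in counts.items() if c == 1]
    return len(hapaxes) / len(counts)

print(hapax_ratio(english))        # 4 of 6 types occur once: ~0.67
print(hapax_ratio(polysynthetic))  # 1.0: every type occurs exactly once
```

With a hapax ratio near 1.0, a model that treats each surface form as an atomic type sees almost every word exactly once, which is why morphological decomposition (rather than just a bigger corpus) is needed.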
  2. Morphological analysis is a critical enabling technology for polysynthetic languages. We present a neural morphological analyzer for case-inflected nouns in St. Lawrence Island Yupik, an endangered polysynthetic language in the Inuit-Yupik language family, treating morphological analysis as a recurrent neural sequence-to-sequence task. By using an existing finite-state morphological analyzer to create training data, we improve analysis coverage on attested Yupik word types from approximately 75% for the existing finite-state analyzer to 100% for the neural analyzer. At the same time, we achieve substantially higher accuracy on a held-out test set: 92.2% for the neural analyzer versus 78.9% for the finite-state analyzer.
  3. St. Lawrence Island Yupik is an endangered language of the Bering Strait region. In this paper, we describe our work on Yupik jointly leveraging computational morphology and linguistic fieldwork, outlining the multilayer virtuous cycle that we continue to refine in our work to document and build tools for the language. After developing a preliminary morphological analyzer from an existing pedagogical grammar of Yupik, we used it to help analyze new word forms gathered through fieldwork. While in the field, we augmented the analyzer to include insights into the lexicon, phonology, and morphology of the language as they were gained during elicitation sessions and subsequent data analysis. The analyzer and other tools we have developed are improved by a corpus that continues to grow through our digitization and documentation efforts, and the computational tools in turn allow us to improve and speed those same efforts. Through this process, we have successfully identified previously undescribed lexical, morphological, and phonological processes in Yupik while simultaneously increasing the coverage of the morphological analyzer. Given the polysynthetic nature of Yupik, a high-coverage morphological analyzer is a necessary prerequisite for the development of other high-level computational tools that have been requested by the Yupik community.
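The analyzer-in-the-loop fieldwork cycle described above can be sketched as a simple triage step: each newly elicited word form is run through the analyzer, and forms it cannot analyze are queued for linguistic review, with fixes feeding back into the analyzer's lexicon. The `analyze()` function here is a stand-in for a real finite-state analyzer (e.g. one compiled with foma or HFST), and the toy lexicon and forms are hypothetical.

```python
# Hypothetical mini-lexicon standing in for a compiled finite-state analyzer.
LEXICON = {"angyaq": "boat[N]", "neghtuq": "eat[V][IND.3SG]"}

def analyze(form):
    """Return analyses for a surface form, or [] if unanalyzable."""
    return [LEXICON[form]] if form in LEXICON else []

def triage(elicited_forms):
    """Split fieldwork forms into analyzer-covered forms and a review queue."""
    covered, review_queue = [], []
    for form in elicited_forms:
        (covered if analyze(form) else review_queue).append(form)
    return covered, review_queue

covered, queue = triage(["angyaq", "qikmiq", "neghtuq"])
print(covered)  # ['angyaq', 'neghtuq']
print(queue)    # ['qikmiq'] -- flagged for elicitation/lexicon update
```

Each pass through the queue both documents new material and raises the analyzer's coverage, which is the "virtuous cycle" the entry describes.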