Title: Improved Finite-State Morphological Analysis for St. Lawrence Island Yupik Using Paradigm Function Morphology
St. Lawrence Island Yupik is an endangered polysynthetic language of the Bering Strait region. While conducting linguistic fieldwork between 2016 and 2019, we observed substantial support within the Yupik community for language revitalization and for resource development to support Yupik education. To that end, Chen & Schwartz (2018) implemented a finite-state morphological analyzer as a critical enabling technology for use in Yupik language education and technology. Chen & Schwartz (2018) reported a morphological analysis coverage rate of approximately 75% on a dataset of 60K Yupik tokens, leaving considerable room for improvement. In this work, we present a re-implementation of the Chen & Schwartz (2018) finite-state morphological analyzer for St. Lawrence Island Yupik that incorporates new linguistic insights; in particular, in this implementation we make use of the Paradigm Function Morphology (PFM) theory of morphology. We evaluate this new PFM-based morphological analyzer, and demonstrate that it consistently outperforms the existing analyzer of Chen & Schwartz (2018) with respect to accuracy and coverage rate across multiple datasets.
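The two evaluation measures named in the abstract, coverage rate and accuracy, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes that coverage is the fraction of tokens for which an analyzer returns at least one analysis, and that a token counts as correct when its gold analysis appears among the analyses returned; the abstract does not spell out either definition. The names `coverage_rate`, `accuracy`, and `toy_analyze`, the placeholder tokens, and the tag strings are all hypothetical and are not real Yupik data.

```python
from typing import Callable, List, Sequence, Tuple

# An analyzer maps a surface token to zero or more candidate analyses.
Analyzer = Callable[[str], List[str]]


def coverage_rate(tokens: Sequence[str], analyze: Analyzer) -> float:
    """Fraction of tokens that receive at least one analysis (assumed definition)."""
    if not tokens:
        return 0.0
    analyzed = sum(1 for token in tokens if analyze(token))
    return analyzed / len(tokens)


def accuracy(gold: Sequence[Tuple[str, str]], analyze: Analyzer) -> float:
    """Fraction of tokens whose gold analysis is among those returned (assumed definition)."""
    if not gold:
        return 0.0
    correct = sum(1 for token, analysis in gold if analysis in analyze(token))
    return correct / len(gold)


# Hypothetical stand-in for a finite-state analyzer: a lookup table over
# placeholder tokens with made-up tag strings.
TOY_LEXICON = {
    "tok1": ["stem1+N+Abs+Sg"],
    "tok2": ["stem2+V+Ind"],
    "tok3": ["stem3+N+Abs+Pl", "stem3+N+Rel+Sg"],
}


def toy_analyze(token: str) -> List[str]:
    """Return the candidate analyses for a token, or an empty list if unanalyzed."""
    return TOY_LEXICON.get(token, [])


if __name__ == "__main__":
    tokens = ["tok1", "tok2", "tok3", "tok4"]
    gold = [
        ("tok1", "stem1+N+Abs+Sg"),
        ("tok2", "stem2+N+Abs+Sg"),
        ("tok3", "stem3+N+Rel+Sg"),
        ("tok4", "stem4+V+Ind"),
    ]
    print(coverage_rate(tokens, toy_analyze))
    print(accuracy(gold, toy_analyze))
```

In this toy setup, three of the four tokens receive an analysis, so the coverage rate is 0.75, and two of the four gold analyses are recovered, so the accuracy is 0.5. Under the assumed definition, a coverage rate of approximately 75% on 60K tokens would correspond to roughly 45K tokens receiving at least one analysis.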
Award ID(s):
1761680 2243445
PAR ID:
10184458
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
LREC proceedings
ISSN:
2522-2686
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. St. Lawrence Island / Central Siberian Yupik is an endangered language, indigenous to St. Lawrence Island in Alaska and the Chukotka Peninsula of Russia, that exhibits pervasive agglutinative and polysynthetic properties. This paper discusses an implementation of a finite-state morphological analyzer for Yupik that was developed in accordance with the grammatical standards and phenomena documented in Steven A. Jacobson’s 2001 reference grammar for Yupik. The analyzer was written in foma, an open source framework for constructing finite-state grammars of morphology. The approach presented here cyclically interweaves morphology and phonology to account for the language’s intricate morphophonological system, an approach that may be applicable to languages of matching typology. The morphological analyzer has been designed to serve as a foundational resource that will eventually underpin a suite of computational tools for Yupik to assist in the process of linguistic documentation and revitalization.
  2. Morphological analysis is a critical enabling technology for polysynthetic languages. We present a neural morphological analyzer for case-inflected nouns in St. Lawrence Island Yupik, an endangered polysynthetic language in the Inuit-Yupik language family, treating morphological analysis as a recurrent neural sequence-to-sequence task. By utilizing an existing finite-state morphological analyzer to create training data, we improve analysis coverage on attested Yupik word types from approximately 75% for the existing finite-state analyzer to 100% for the neural analyzer. At the same time, we achieve a substantially higher level of accuracy on a held-out testing set, from 78.9% accuracy for the finite-state analyzer to 92.2% accuracy for our neural analyzer. 
  3. Akuzipik (Yupigestun/Yupik/St. Lawrence Island Yupik/Siberian Yupik/Chaplinski Yupik) is an endangered language belonging to the Yupik branch of the Inuit-Yupik-Unangan language family. It is currently spoken by 800-900 people in the Bering Strait region, mainly on St. Lawrence Island, Alaska (St. Lawrence Island Yupik), and on the coast of the Chukotka Peninsula, in Russia (Chaplinski Yupik) (de Reuse 1994; Schwartz et al. 2019). The linguistic differences between these two varieties seem to be minor and not to affect mutual intelligibility (Krauss 1975). The language has been undergoing a rapid generational shift, beginning in the 1950s in Russia and in the 1990s in Alaska (Schwartz et al. 2019).