Title: Robust Generalization Strategies for Morpheme Glossing in an Endangered Language Documentation Context
Generalization is of particular importance in resource-constrained settings, where the available training data may represent only a small fraction of the distribution of possible texts. We investigate the ability of morpheme labeling models to generalize by evaluating their performance on unseen genres of text, and we experiment with strategies for closing the gap between performance on in-distribution and out-of-distribution data. Specifically, we use weight decay optimization, output denoising, and iterative pseudo-labeling, and achieve a 2% improvement on a test set containing texts from unseen genres. All experiments are performed using texts written in the Mayan language Uspanteko.
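The abstract's combination of weight-decay optimization and iterative pseudo-labeling can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the toy tagger, the random stand-in data, and the 0.9 confidence threshold are not the authors' actual setup.

```python
# Minimal sketch: weight-decay training plus iterative pseudo-labeling
# for a token-level glossing model. All sizes, data, and thresholds are
# illustrative placeholders.
import torch
import torch.nn as nn

VOCAB, N_TAGS, EMB = 5000, 60, 128        # hypothetical sizes

class GlossTagger(nn.Module):
    """Toy morpheme-gloss tagger: embedding plus per-token linear classifier."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.out = nn.Linear(EMB, N_TAGS)

    def forward(self, tokens):             # tokens: (batch, seq_len)
        return self.out(self.emb(tokens))  # logits: (batch, seq_len, n_tags)

def train_epoch(model, opt, batches, loss_fn):
    for tokens, tags in batches:
        opt.zero_grad()
        logits = model(tokens).reshape(-1, N_TAGS)
        loss_fn(logits, tags.reshape(-1)).backward()
        opt.step()

# Dummy in-domain (labeled) and out-of-domain (unlabeled) batches.
labeled = [(torch.randint(0, VOCAB, (8, 20)), torch.randint(0, N_TAGS, (8, 20)))]
unlabeled = [torch.randint(0, VOCAB, (8, 20))]

model = GlossTagger()
# AdamW applies weight decay, regularizing toward smaller weights, which
# tends to narrow the in-distribution / out-of-distribution gap.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks masked tokens

pseudo_batches = []
for _ in range(3):                          # iterative pseudo-labeling rounds
    train_epoch(model, opt, labeled + pseudo_batches, loss_fn)
    pseudo_batches = []                     # re-label from scratch each round
    with torch.no_grad():
        for tokens in unlabeled:
            conf, pseudo = model(tokens).softmax(-1).max(-1)
            keep = conf >= 0.9              # arbitrary confidence cutoff
            if keep.any():
                pseudo[~keep] = -100        # drop low-confidence tokens
                pseudo_batches.append((tokens, pseudo))
```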
Award ID(s):
2149404
PAR ID:
10539621
Author(s) / Creator(s):
;
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Page Range / eLocation ID:
89 to 98
Format(s):
Medium: X
Location:
Singapore
Sponsoring Org:
National Science Foundation
More Like this
  1. Online texts, across genres, registers, domains, and styles, are riddled with human stereotypes, expressed in overt or subtle ways. Word embeddings trained on these texts perpetuate and amplify these stereotypes, and propagate biases to machine learning models that use word embeddings as features. In this work, we propose a method to debias word embeddings in multiclass settings such as race and religion, extending the work of Bolukbasi et al. (2016) from binary settings such as binary gender. Next, we propose a novel methodology for the evaluation of multiclass debiasing. We demonstrate that our multiclass debiasing is robust and maintains efficacy on standard NLP tasks.
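As a rough illustration of how such multiclass debiasing can work, the sketch below follows the common hard-debiasing recipe: estimate a bias subspace by PCA over mean-centered sets of identity terms, then project that subspace out of words that should be neutral. The embeddings, word lists, and subspace dimension are placeholders, not the paper's actual data or full method (which also includes an equalization step).

```python
# Hedged sketch of multiclass hard debiasing (placeholder data throughout).
import numpy as np

rng = np.random.default_rng(0)
dim = 50
words = ["jewish", "christian", "muslim", "violent", "greedy", "peaceful"]
emb = {w: rng.normal(size=dim) for w in words}   # stand-in embeddings

# Defining sets: identity terms for the classes of one protected attribute.
defining_sets = [("jewish", "christian", "muslim")]

# Stack the mean-centered vectors of every defining set.
centered = []
for terms in defining_sets:
    vecs = np.stack([emb[t] for t in terms])
    centered.append(vecs - vecs.mean(axis=0))
centered = np.concatenate(centered)

# The top-k principal components span the multiclass bias subspace.
k = 2
_, _, vt = np.linalg.svd(centered, full_matrices=False)
bias_basis = vt[:k]                              # (k, dim), orthonormal rows

def neutralize(v, basis):
    """Remove the component of v lying in the bias subspace, renormalize."""
    v = v - basis.T @ (basis @ v)
    return v / np.linalg.norm(v)

for w in ["violent", "greedy", "peaceful"]:      # words meant to be neutral
    emb[w] = neutralize(emb[w], bias_basis)
```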
  2. While data collection early in the Americanist tradition included texts as part of the Boasian triad, later developments in the generative tradition moved away from narratives. With a resurgence of attention to texts in both linguistic theory and language documentation, the literature on methodologies is growing (e.g., Chafe 1980, Chelliah 2001, Burton & Matthewson 2015). We outline our approach to collecting Chickasaw texts in what we call a 'narrative bootcamp.' Chickasaw is a severely threatened language and is no longer in common daily use. Facilitating narrative collection with fluent elder speakers is an important goal, as is the cultivation of second-language speakers and the training of linguists and tribal language professionals. Our bootcamps meet these goals. Moreover, we show many positive outcomes of this approach, including a positive sense of language use and 'fun' voiced by the elders, the corpus expansion that occurs by collecting and processing narratives onsite in the workshop, and field methods training for novices. Importantly, we find that sparking personal recollections facilitates the collection of heretofore unrecorded narrative genres in Chickasaw. This approach offers an especially fruitful way to build and expand a text corpus for small communities of highly endangered languages.
  3. Due to the extreme scarcity of customer failure data, it is challenging to reliably screen out rare defects within the high-dimensional input feature space formed by the relevant parametric test measurements. In this paper, we study several unsupervised learning techniques on six industrial test datasets and propose to train a more robust unsupervised learning model by self-labeling the training data via a set of transformations. Using the labeled data, we train a multi-class classifier through supervised training. The quality of the multi-class classification decisions on unseen input data is used as a normality score to detect anomalies. Furthermore, we propose to use reversible, information-lossless transformations to retain the data information and boost the performance and robustness of the proposed self-labeling approach.
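The self-labeling idea lends itself to a short sketch: label each training vector by which reversible, lossless transformation was applied to it, train a multi-class classifier on those labels, and score an unseen input by how confidently the classifier recognizes its transformations. The transformations, data, and classifier below are illustrative stand-ins, not the paper's industrial setup.

```python
# Sketch: transformation self-labeling as a normality / anomaly score.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 16))      # stand-in parametric test vectors

# Reversible, information-lossless transformations of a feature vector.
transforms = [
    lambda x: x,                          # identity
    lambda x: -x,                         # negation
    lambda x: x[::-1],                    # reversal (a permutation)
    lambda x: np.roll(x, 4),              # cyclic shift
]

# Self-label: each (sample, transform) pair becomes one training example,
# labeled by the index of the transform that produced it.
X_aug = np.stack([t(x) for x in X_train for t in transforms])
y_aug = np.tile(np.arange(len(transforms)), len(X_train))

clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)

def normality_score(x):
    """Mean probability assigned to the correct transform; low values
    suggest the input is unlike the (normal) training distribution."""
    probs = clf.predict_proba(np.stack([t(x) for t in transforms]))
    idx = np.arange(len(transforms))
    return probs[idx, idx].mean()

print(normality_score(rng.normal(size=16)))
```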
  4. Vivi Nastase; Ellie Pavlick; Mohammad Taher Pilehvar; Jose Camacho-Collados; Alessandro Raganato (Ed.)
    This paper describes the evolution of the PropBank approach to semantic role labeling over the last two decades. During this time, the PropBank frame files have been expanded to include non-verbal predicates such as adjectives, prepositions, and multi-word expressions. The number of domains, genres, and languages that have been PropBanked has also grown considerably, creating an opportunity for much more challenging and robust testing of the generalization capabilities of PropBank semantic role labeling systems. We also describe the substantial effort that has gone into ensuring the consistency and reliability of the various annotated datasets and resources, to better support the training and evaluation of such systems.
  5. Ideology is at the core of political science research. Yet there are still no general-purpose tools to characterize and predict ideology across different genres of text. To this end, we study pretrained language models using novel ideology-driven pretraining objectives that rely on comparing articles on the same story written by media outlets of different ideologies. We further collect a large-scale dataset, consisting of more than 3.6M political news articles, for pretraining. Our model, POLITICS, outperforms strong baselines and previous state-of-the-art models on ideology prediction and stance detection tasks. Further analyses show that POLITICS is especially good at understanding long or formally written texts, and is also robust in few-shot learning scenarios.
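As a loose illustration of what such a story-level comparison objective might look like, the sketch below contrasts embeddings of same-story articles from same-ideology versus different-ideology outlets with a triplet-style loss. The encoder outputs, cosine distance, and margin are assumptions for illustration, not necessarily the actual POLITICS objectives.

```python
# Hedged sketch of a story-level ideology-contrast objective.
import torch
import torch.nn.functional as F

def ideology_contrast_loss(anchor, same_ideo, diff_ideo, margin=0.5):
    """Triplet-style loss over article embeddings covering one story:
    same-ideology pairs should end up closer than different-ideology pairs."""
    pos = 1 - F.cosine_similarity(anchor, same_ideo, dim=-1)
    neg = 1 - F.cosine_similarity(anchor, diff_ideo, dim=-1)
    return F.relu(pos - neg + margin).mean()

# Dummy encoder outputs for a batch of 8 stories (256-dim embeddings).
anchor, same_ideo, diff_ideo = (torch.randn(8, 256) for _ in range(3))
print(ideology_contrast_loss(anchor, same_ideo, diff_ideo))
```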