Title: Teaching about endangered languages in the undergraduate curriculum
NSF-PAR ID:
10059706
Author(s) / Creator(s):
 
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Language and Linguistics Compass
Volume:
12
Issue:
7
ISSN:
1749-818X
Page Range / eLocation ID:
e12283
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper describes ongoing work in dictionary digitization, in particular the processing of OCRed text into a structured lexicon. In processing the output of an OCR engine into structured data, we face three problems: (a) typographic errors; (b) conversion from the dictionary’s visual layout into a lexicographically structured, computer-readable format such as XML; and (c) conversion of each dictionary’s idiosyncratic structure into a standard tagging system. This paper deals mostly with the second issue, but touches on the first and third.
  2. The application of deep learning to automatic speech recognition (ASR) has yielded dramatic accuracy increases for languages with abundant training data, but languages with limited training resources have yet to see accuracy improvements on this scale. In this paper, we compare a fully convolutional approach to acoustic modeling in ASR with a variety of established acoustic modeling approaches. We evaluate our method on Seneca, a low-resource endangered language spoken in North America. Our method yields word error rates up to 40% lower than those reported using both standard GMM-HMM approaches and established deep neural methods, with a substantial reduction in training time. These results show particular promise for languages like Seneca that are both endangered and lack extensive documentation.
  3. Advances in speech and language processing have enabled the creation of applications that could, in principle, accelerate the process of language documentation, as speech communities and linguists work on urgent language documentation and reclamation projects. However, such systems have yet to make a significant impact on language documentation, as resource requirements limit the broad applicability of these new techniques. We aim to exploit the framework of shared tasks to focus the technology research community on tasks which address key pain points in language documentation. Here we present initial steps in the implementation of these new shared tasks, through the creation of data sets drawn from endangered language repositories and baseline systems to perform segmentation and speaker labeling of these audio recordings, which are important enabling steps in the documentation process. This paper motivates these tasks with a use case, describes data set curation and baseline systems, and presents results on this data. We then highlight the challenges and ethical considerations in developing these speech processing tools and tasks to support endangered language documentation.
  4. We challenge the idea that a course intended to convey principles of languages should be structured according to those principles, and present an alternate approach to teaching a programming language course. The approach involves teaching emerging programming languages. This approach results in a variety of course desiderata including scope for instructor customization; alignment with current trends in language evolution, practice, and research; and congruence with industrial needs. We discuss the rationale for, the course mechanics supporting, and the consequences of this approach.