This paper problematizes the assessment of speakers’ proficiency in endangered language communities. We focus in particular on lexical production and elicitation as proxies for full proficiency assessment. Among linguists, it is standard practice to assess a speaker’s knowledge of specific lexical items in order to set a baseline for further data collection and research. Yet, as we argue in this paper, such tests can give the false impression that speakers do not know their language. They do not distinguish between what speakers cannot recall at a particular moment and what speakers do not know because they never acquired it. The endangered language context in particular calls for a more finely tuned interpretation of lexical knowledge, given the high degree of idiolectal variation and the lack of a community-based standard language. Drawing on fieldwork with Chukchi and Even Indigenous communities in northeastern Russia, we analyze lexical items that speakers say they do not remember. We then distinguish the different reasons speakers give for not remembering and consider their implications for speakers’ proficiency. Finally, we conclude with two recommendations for improving elicitation and language assessment tests.
BELT: Building Endangered Language Technology
The development of language technology (LT) for an endangered language is often identified as a goal in language revitalization efforts, but developing such technologies typically involves additional methodological challenges as well as social and ethical concerns. In particular, LT development has too often taken on colonialist qualities: extracting language data, relying on outside experts, and denying the speakers of a language sovereignty over the technologies produced. We seek to avoid such an approach through the Building Endangered Language Technology (BELT) website, an educational resource designed to help speakers and community members with limited technological experience develop LTs for their own language. Specifically, BELT provides interactive lessons on basic Python programming, coupled with projects to develop specific language technologies, such as spellcheckers or word games. In this paper, we describe BELT’s design, the motivation underlying many key decisions, and preliminary responses from learners.
- Award ID(s):
- 2149404
- PAR ID:
- 10651190
- Publisher / Repository:
- Association for Computational Linguistics
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Over the past three decades, the field of linguistics has refocused attention on endangered languages, and enormous strides have been made in documenting these languages and developing archive infrastructure for language data. Although the potential of language archives to support language renewal efforts has often been tacitly assumed, much greater attention has been given to the preservation of data than to access and utilization. Documentation activities are imagined as a race against time to get language data into a lasting form before the last speakers pass away. Here I describe three efforts that are working to engage with language communities and to increase the accessibility and usability of language resources. Though not necessarily representative, these efforts suggest ways in which linguists, archivists, and communities can collaborate to support digital return.
A large percentage of the world’s languages (anywhere from 50 to 90%) are currently spoken in what we call shift ecologies: situations of unstable bi- or multilingualism in which speakers, and in particular younger speakers, do not use their ancestral language but rather speak the majority language. The present paper addresses several interrelated questions regarding the linguistic effects of bilingualism in such shift ecologies. These language ecologies are dynamic: language choices and preferences change, as do speakers’ proficiency levels. One result is multiple kinds of variation in these endangered language communities. Understanding change and shift requires a methodology for establishing a baseline, yet descriptive grammars rarely provide information about usage and multilingual language practices. A further confounding factor is the range of linguistic variation: regional (dialectal); generational (language-internal change without contact or shift); contact-based (contact with or without shift); and proficiency-based (variation that develops as a result of differing levels of input and usage). Widespread, ongoing language shift today provides opportunities to examine the linguistic changes exhibited by shifting speakers, that is, to zero in on language change and loss as a process rather than as an end product.
The task of speaker diarization is to determine who spoke when in a recording. Such functionality could help accelerate work on endangered languages by facilitating transcription and by semi-automatically extracting useful metadata to enrich language archives. However, there has been little work on speaker diarization for low-resource or endangered languages. This work explores three neural approaches to speaker diarization applied to data sets drawn from endangered language archives. We find consistent improvements for recent neural x-vector models over earlier approaches. We also assess the factors that affect performance across models and data sets, with a focus on the challenging characteristics of endangered language recordings.
Research questions: This study asks whether an interface phenomenon such as noun incorporation (NI) displays meaningful socially conditioned variation in the endangered polysynthetic language Chukchi, by investigating whether speakers at all levels of experience or proficiency use NI in a consistent, rule-governed way. Design and methodology: This study compares production data from small groups of speakers of a moribund language. Study tasks include a controlled production task in which speakers are asked to construct sentences using provided lexical items. The lexical items were selected so as to trigger NI in certain stimuli (on the basis of verbal valency and argument animacy). Data and analysis: The production data were transcribed and coded for the occurrence and structural type of NI (compounding vs. syntactic incorporation). The results were compared across three groups of speakers: conservative older speakers (CSs), younger attriting speakers (ASs), and new speakers (NSs). Findings/conclusions: NI frequency and productivity clearly differ among the three groups. CSs use incorporation frequently and productively in the expected contexts, ASs use productive incorporation only in familiar contexts, and NSs make little to no use of incorporation. All speaker groups display knowledge of the appropriate circumstances in which to use incorporation. Originality: This study uses a novel experimental methodology to study several under-researched areas: variation in traditional Chukchi, shift-induced variation in a polysynthetic language, and NI as a locus of variation. Significance/implications: This study contributes to our understanding of the behavior of non-normative speakers of endangered languages and demonstrates that they play a role in language preservation.
The study shows that the diffuse nature of the Chukchi speech community differs from comparatively well-studied shift settings (especially in the North American and European contexts) in its lack of a community of use or practice, which presents unique challenges for language maintenance.