In this paper, we challenge the ACL community to reckon with historical and ongoing colonialism by adopting a set of ethical obligations and best practices drawn from the Indigenous studies literature. While the vast majority of NLP research focuses on a very small number of very high-resource languages (English, Chinese, etc.), some work has begun to engage with Indigenous languages. No research involving Indigenous language data can be considered ethical without first acknowledging that Indigenous languages are not merely very low-resource languages. The toxic legacy of colonialism permeates every aspect of interaction between Indigenous communities and outside researchers. To this end, we propose that the ACL draft and adopt an ethical framework for NLP researchers and computational linguists wishing to engage in research involving Indigenous languages.
Application of Speech Processes for the Documentation of Kréyòl Gwadloupéyen
A growing number of studies address the challenges posed by low-resource languages and the transcription bottleneck. This bottleneck has driven the development of speech recognition methods to transcribe regional and Indigenous languages automatically. Although there is much talk about bridging the gap between speech technologies and field linguistics, there is little documented evidence of efficient communication between NLP experts and documentary linguists. The models created for low-resource languages often remain within the confines of computer science departments, while documentary linguists remain attached to traditional transcription workflows. This paper presents the early stage of a collaboration between NLP experts and field linguists, resulting in the successful transcription of Kréyòl Gwadloupéyen using speech recognition technology.
- Award ID(s):
- 1952568
- PAR ID:
- 10476856
- Editor(s):
- Serikov, O.; Voloshina, E.; Postnikova, A.; Klyachko, E.; Neminova, E.; Vylomova, E.; Shavrina, T.; Le Ferrand, E.; Tyers, F.
- Publisher / Repository:
- Association for Computational Linguistics
- Date Published:
- Journal Name:
- Proceedings of the Second Workshop on NLP Applications to Field Linguistics
- Edition / Version:
- 1
- Page Range / eLocation ID:
- 17-22
- Format(s):
- Medium: X Other: pdf
- Location:
- https://aclanthology.org/2023.fieldmatters-1.2/
- Sponsoring Org:
- National Science Foundation
More Like this
-
Linguists are seldom, if ever, engaged in work aimed at communicating risk to the general public. The COVID-19 global pandemic and its associated infodemic may change this state of affairs, at least for documentary linguists. Documenting languages may bring researchers into direct contact with communities speaking minority or marginalized languages and give them key insights into these communities’ communicative ecologies. By being both immersed in local networks and more or less knowledgeable about the community’s communicative habits, documentary linguists appear to be placed in a unique position to contribute to communicating risk in ways that are better tailored to the community and, therefore, potentially quite effective locally. Furthermore, adding work in risk communication to their agenda may also stimulate documentary linguists to find new models for “giving back” to the communities they work with. To provide a concrete example of how all this may play out, we illustrate the virALLanguages project.
-
Research in programming languages and software engineering is broadly concerned with the study of aspects of computer programs: their syntactic structure, the relationship between form and meaning (semantics), empirical properties of how they are constructed and deployed, and more. We could equally well apply this description to the range of ways in which linguistics studies the form, meaning, and use of natural language. We argue that despite some notable examples of PL and SE research drawing on ideas from natural language processing, there is still a wealth of concepts, techniques, and conceptual framings originating in linguistics which would be of use to PL and SE research. Moreover, we show that beyond mere parallels, there are cases where linguistics research has complementary methodologies, may help explain or predict study outcomes, or may offer new perspectives on established research areas in PL and SE. Broadly, we argue that researchers across PL and SE are investigating close cousins of problems actively studied for years by linguists, and familiarity with linguistics research seems likely to bear fruit for many PL and SE researchers.
-
Polysynthetic languages present a challenge for morphological analysis due to the complexity of their words and the lack of high-quality annotated datasets needed to build and/or evaluate computational models. The contribution of this work is twofold. First, using linguists’ help, we generate and contribute high-quality annotated data for two low-resource polysynthetic languages for two tasks: morphological segmentation and part-of-speech (POS) tagging. Second, we present the results of state-of-the-art unsupervised approaches for these two tasks on Adyghe and Inuktitut. Our findings show that for these polysynthetic languages, using linguistic priors helps the task of morphological segmentation and that using stems rather than words as the core unit of abstraction leads to superior performance on POS tagging.

