Title: Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxochitl Mixtec
“Transcription bottlenecks”, created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxochitl Mixtec EL corpus. First, we review our method for building an end-to-end ASR system in a way that would be reproducible by the ASR community. We then propose a novice transcription correction task and demonstrate how ASR systems and novice transcribers can work together to improve EL documentation. We believe this combinatory methodology would mitigate the transcription bottleneck and transcriber shortage that hinder EL documentation.
Award ID(s):
1761421
PAR ID:
10281121
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2021 Conference of the European Chapter of the Association for Computational Linguistics, 21–23 April. https://arxiv.org/abs/2101.10877
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper describes three open access Yoloxóchitl Mixtec corpora and presents the results and implications of end-to-end automatic speech recognition for endangered language documentation. Two issues are addressed. First, the advantage for ASR accuracy of targeting informational (BPE) units in addition to, or in substitution of, linguistic units (word, morpheme, morae) and then using ROVER for system combination. BPE units consistently outperform linguistic units although the best results are obtained by system combination of different BPE targets. Second, a case is made that for endangered language documentation, ASR contributions should be evaluated according to extrinsic criteria (e.g., positive impact on downstream tasks) and not simply intrinsic metrics (e.g., CER and WER). The extrinsic metric chosen is the level of reduction in the human effort needed to produce high-quality transcriptions for permanent archiving.
  2. New advances in machine learning have made Automated Speech Recognition (ASR) systems practical and more scalable. These systems, however, pose serious privacy threats as speech is a rich source of sensitive acoustic and textual information. Although offline and open-source ASR eliminates the privacy risks, its transcription performance is inferior to that of cloud-based ASR systems, especially for real-world use cases. In this paper, we propose Prεεch, an end-to-end speech transcription system which lies at an intermediate point in the privacy-utility spectrum. It protects the acoustic features of the speakers’ voices and protects the privacy of the textual content at an improved performance relative to offline ASR. Additionally, Prεεch provides several control knobs to allow customizable utility-usability-privacy trade-off. It relies on cloud-based services to transcribe a speech file after applying a series of privacy-preserving operations on the user’s side. We perform a comprehensive evaluation of Prεεch, using diverse real-world datasets, that demonstrates its effectiveness. Prεεch provides transcription at a 2% to 32.25% (mean 17.34%) relative improvement in word error rate over Deep Speech, while fully obfuscating the speakers' voice biometrics and allowing only a differentially private view of the textual content. 
  3. Self-supervised learning representations (SSLR) have resulted in robust features for downstream tasks in many fields. Recently, several SSLRs have shown promising results on automatic speech recognition (ASR) benchmark corpora. However, previous studies have only shown performance for solitary SSLRs as an input feature for ASR models. In this study, we propose to investigate the effectiveness of diverse SSLR combinations using various fusion methods within end-to-end (E2E) ASR models. In addition, we will show there are correlations between these extracted SSLRs. As such, we further propose a feature refinement loss for decorrelation to efficiently combine the set of input features. For evaluation, we show that the proposed “FeaRLESS learning features” perform better than systems without the proposed feature refinement loss for both the WSJ and Fearless Steps Challenge (FSC) corpora. 
  4. To assist in the documentation of Čakavian, an endangered language variety closely related to Croatian, we test four currently available ASR models that are trained with Croatian data and assess their performance in the transcription of Čakavian audio data. We compare the models’ word error rates, analyze the word-level error types, and showcase the most frequent Deletion and Substitution errors. The evaluation results indicate that the best-performing system for transcribing Čakavian was a CTC-based variant of the Conformer model. 
  5. This paper studies the alkali-silica reaction (ASR) in rapid-strength belitic calcium sulfoaluminate (BCSA) cement systems. Theoretically, its low alkalinity and high alumina content should make BCSA less prone to ASR than portland cement (PC), but little experimental evidence has been published, and the theorized mechanisms have not been examined critically. We examine this problem using expansion tests, microstructural analysis, and pore solution analysis. Accelerated expansion tests show increased expansion in BCSA mortars with reactive aggregates, but we argue that the test conditions are unsuitable for the cement. Long-term expansion tests show a significant reduction in expansion in BCSA mortars with reactive aggregates, but later-age measurements still exceed ASTM C1778 limits and microstructural investigations indicate ASR damage. Curiously, BCSA mortars with nonreactive aggregates also expanded significantly, but no ASR damage was observed. BCSA pore solutions had ten times more aluminum than PC and one-tenth as much calcium. While the pH was sufficiently high to initiate ASR, the alkali reserves can be half or less than in PC. Overall, BCSA cement is not immune to ASR, but it is more resistant than PC. This is mostly related to the lower alkalinity of the cement and, to a lesser degree, to the abundance of alumina and shortage of soluble calcium. 