Investigating Speaker Diarization of Endangered Language Data

Levow, Gina-Anne

Citation Details

The task of speaker diarization aims to determine which speakers spoke when in a recording. Such functionality could help to accelerate work in endangered languages by facilitating transcription and semi-automatically extracting useful meta-data to enrich language archives. However, there has been little work on speaker diarization for low-resource or endangered languages. This work explores three neural approaches to speaker diarization applied to data sets drawn from endangered language archives. We find consistent improvements for recent neural x-vector models over earlier approaches. We also assess the factors which impact performance across models and data sets, with a focus on the challenging characteristics of endangered language recordings.

Award ID(s): 1760475

PAR ID: 10425540

Author(s) / Creator(s): Levow, Gina-Anne

Date Published: 2023-03-06

Journal Name: Proceedings of the Sixth Workshop on the Use of Computational Methods in the Study of Endangered Languages

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
