%AMu, Yang
%AHirschi, Kevin
%ALooney, Stephen
%AKang, Okim
%AHansen, John
%D2022
%I
%K
%MOSTI ID: 10358184
%PMedium: X
%TImproving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment
%XCurrent leading mispronunciation detection and diagnosis
(MDD) systems achieve promising performance via end-to-end
phoneme recognition. One challenge of such end-to-end solutions
is the scarcity of human-annotated phonemes on natural
L2 speech. In this work, we leverage unlabeled L2 speech via
a pseudo-labeling (PL) procedure and extend the fine-tuning
approach based on pre-trained self-supervised learning (SSL)
models. Specifically, we use Wav2vec 2.0 as our SSL model,
and fine-tune it using original labeled L2 speech samples plus
the created pseudo-labeled L2 speech samples. Our pseudo labels
are dynamic and are produced on the fly by an ensemble of the online
model, which ensures that our model is robust to
pseudo-label noise. We show that fine-tuning with pseudo labels
achieves a 5.35% reduction in phoneme error rate and a 2.48%
improvement in MDD F1 score over a labeled-samples-only fine-tuning
baseline. The proposed PL method is also shown to
outperform conventional offline PL methods. Compared to the
state-of-the-art MDD systems, our MDD solution produces a
more accurate and consistent phonetic error diagnosis. In addition,
we conduct an open test on a separate UTD-4Accents
dataset, where our system's recognition outputs correlate strongly
with human perception of accentedness and
intelligibility.
Country unknown/Code not available; Journal ID: 1990-9772; OSTI-MSA