End-To-End Real Time Tracking of Children’s Reading with Pointer Network

Sunder, Vishal; Karrolla, Beulah; Fosler-Lussier, Eric

doi:10.1109/ICASSP48485.2024.10446486

In this work, we explore how a real-time reading tracker can be built efficiently for children’s voices. Whereas previously proposed reading trackers relied on cascaded, ASR-based approaches, we propose a fully end-to-end model, which makes tracking less prone to lag. We employ a pointer network that directly learns to predict positions in the ground-truth text, conditioned on the streaming speech. To train this pointer network, we generate ground-truth training signals by forced alignment between the read speech and the text being read in the training set. Comparing different forced alignment models, we find that a neural attention-based model achieves alignment accuracy comparable to that of the Montreal Forced Aligner but, surprisingly, is a better training signal for the pointer network. We report results on one adult speech dataset (TIMIT) and two children’s speech datasets (CMU Kids and Reading Races). Our best model tracks adult speech with 87.8% accuracy, and the much harder, more disfluent children’s speech with 77.1% accuracy on CMU Kids and 65.3% accuracy on Reading Races.
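The training recipe described above (forced-alignment timings turned into position targets for a pointer network over the text) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every detail is an assumption for illustration: word boundaries are taken as frame indices already produced by some forced aligner, the helper names are hypothetical, the pointer score is a plain dot product (the original Pointer Network of Vinyals et al. used additive attention), and `tracking_accuracy` is one plausible frame-level metric that may not match the paper's definition of accuracy.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical aligner output: (word, start_frame, end_frame), sorted by start.
WordSpan = Tuple[str, int, int]


def alignment_to_pointer_targets(
    alignment: Sequence[WordSpan], num_frames: int
) -> List[int]:
    """Per-frame pointer targets: index of the latest word whose start frame
    has been reached, or -1 before the first word begins."""
    targets: List[int] = []
    pos = -1
    for t in range(num_frames):
        # Advance past every word that has started by frame t.
        while pos + 1 < len(alignment) and alignment[pos + 1][1] <= t:
            pos += 1
        targets.append(pos)
    return targets


def pointer_step(
    query: Sequence[float], keys: Sequence[Sequence[float]]
) -> Tuple[int, List[float]]:
    """One pointer step: score each text position against the current speech
    frame, softmax the scores into a distribution over positions, take argmax."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs


def tracking_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of frames where the predicted position equals the reference."""
    if not reference or len(predicted) != len(reference):
        raise ValueError("need equal-length, non-empty sequences")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

With `alignment = [("the", 2, 10), ("cat", 12, 25), ("sat", 27, 40)]` and 30 frames, the targets are -1 for frames 0 to 1, 0 for frames 2 to 11, 1 for frames 12 to 26 and 2 for frames 27 to 29. Keeping the pointer on the last word that has started, even through the silent frames between words, is a design choice of this sketch: a streaming tracker needs some position at every frame.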

Award ID(s): 2008043

PAR ID: 10560473

Publisher / Repository: IEEE

Date Published: 2024-04-14

ISSN: 2379-190X

ISBN: 979-8-3503-4485-1

Page Range / eLocation ID: 11731 to 11735

Location: Seoul, Republic of Korea

Sponsoring Org: National Science Foundation

Free publicly accessible full text: Accepted Manuscript 1.0 (conference paper), https://doi.org/10.1109/ICASSP48485.2024.10446486

More Like this