Joint Language and Speaker Classification in Naturalistic Bilingual Adult-Toddler Interactions

Dutta, Satwik; López-Espejo, Iván; Irvin, Dwight; Hansen, John H. L.

doi:10.21437/odyssey.2024-12

Citation Details

Bilingual children can benefit from dual-language exposure at a young age, which affects their language and literacy development. Speech technology can aid in developing tools that accurately quantify children’s exposure to multiple languages, thereby helping parents, teachers, and early-childhood practitioners better support bilingual children. This study lays the foundation for this goal using the Hoff corpus, which contains naturalistic adult-child bilingual interactions collected at child ages 2½, 3, and 3½ years. Exploiting self-supervised learning (SSL) features from XLSR-53 and HuBERT, we jointly predict the language (English/Spanish) and speaker (adult/child) of each utterance using a multi-task learning approach. Our experiments indicate that a trainable linear combination of embeddings across all Transformer layers of the SSL models is a stronger indicator for both tasks, with a greater benefit for speaker classification. However, language classification for children remains challenging.
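The layer-combination idea in the abstract can be illustrated with a small forward-pass sketch. This is not the authors' implementation: it assumes the layer weights are softmax-normalized, that each layer has already been pooled to one utterance-level vector, and that the two task losses are summed with equal weight. All names (`weighted_layer_sum`, `joint_forward`, the `params` keys) and the toy dimensions are hypothetical, and no training loop is shown.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_embeddings, layer_logits):
    """Combine per-layer utterance embeddings into one vector.

    Assumption: the trainable scalars (layer_logits) are softmax-normalized
    before weighting; the abstract only says "trainable linear combination".
    """
    weights = softmax(layer_logits)
    dim = len(layer_embeddings[0])
    return [
        sum(w * layer[d] for w, layer in zip(weights, layer_embeddings))
        for d in range(dim)
    ]


def linear(x, weight, bias):
    """Dense layer: one output per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])


def joint_forward(layer_embeddings, params, language_label, speaker_label):
    """Shared embedding feeding two heads: language and speaker.

    Assumption: the joint loss is the unweighted sum of both task losses.
    """
    shared = weighted_layer_sum(layer_embeddings, params["layer_logits"])
    language_logits = linear(shared, params["lang_w"], params["lang_b"])
    speaker_logits = linear(shared, params["spk_w"], params["spk_b"])
    loss = cross_entropy(language_logits, language_label) + cross_entropy(
        speaker_logits, speaker_label
    )
    return language_logits, speaker_logits, loss


# Toy usage with made-up sizes (4 layers, 3-dimensional embeddings).
random.seed(0)
num_layers, dim = 4, 3
layers = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_layers)]
params = {
    "layer_logits": [0.0] * num_layers,  # equal layer weights at start
    "lang_w": [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(2)],
    "lang_b": [0.0, 0.0],
    "spk_w": [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(2)],
    "spk_b": [0.0, 0.0],
}
# Labels here are arbitrary: language 1 = Spanish, speaker 0 = adult.
language_logits, speaker_logits, loss = joint_forward(layers, params, 1, 0)
print("joint loss:", loss)
```

In a real system the layer logits and both heads would be updated by gradient descent on the joint loss, so the model can learn which Transformer layers matter most for each task.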

Award ID(s): 2234916

PAR ID: 10610081

Author(s) / Creator(s): Dutta, Satwik; López-Espejo, Iván; Irvin, Dwight; Hansen, John H. L.

Publisher / Repository: ISCA

Date Published: 2024-06-18

Page Range / eLocation ID: 81 to 85

Subject(s) / Keyword(s): Child speech processing Bilingual adult-child speaker diarization language recognition speaker recognition

Sponsoring Org: National Science Foundation

Conference Paper:
https://doi.org/10.21437/odyssey.2024-12
