Playlogue: Dataset and Benchmarks for Analyzing Adult-Child Conversations During Play

Kalanadhabhatta, Manasa; Rastikerdar, Mohammad Mehdi; Rahman, Tauhidur; Grabell, Adam S; Ganesan, Deepak

doi:10.1145/3699775

There has been growing interest in developing ubiquitous technologies to analyze adult-child speech in naturalistic settings such as free play in order to support children's social and academic development, language acquisition, and parent-child interactions. However, these technologies often rely on off-the-shelf speech processing tools that have not been evaluated on child speech or child-directed adult speech, whose unique characteristics might result in significant performance gaps when using models trained on adult speech. This work introduces the Playlogue dataset containing over 33 hours of long-form, naturalistic, play-based adult-child conversations from three different corpora of preschool-aged children. Playlogue enables researchers to train and evaluate speaker diarization and automatic speech recognition models on child-centered speech. We demonstrate the lack of generalizability of existing state-of-the-art models when evaluated on Playlogue, and show how fine-tuning models on adult-child speech mitigates the performance gap to some extent but still leaves considerable room for improvement. We further annotate over 5 hours of the Playlogue dataset with 8668 validated adult and child speech act labels, which can be used to train and evaluate models to provide clinically relevant feedback on parent-child interactions. We investigate the performance of state-of-the-art language models at automatically predicting these speech act labels, achieving significant accuracy with simple chain-of-thought prompting or minimal fine-tuning. We use inhome pilot data to validate the generalizability of models trained on Playlogue, demonstrating its utility in improving speech and language technologies for child-centered conversations. The Playlogue dataset is available for download at https://huggingface.co/datasets/playlogue/playlogue-v1.

More Like this