Title: Challenges remain in Building ASR for Spontaneous Preschool Children Speech in Naturalistic Educational Environments
Monitoring child development in terms of speech/language skills has a long-term impact on children's overall growth. As student diversity continues to expand in US classrooms, there is a growing need to benchmark social-communication engagement, both from a teacher-student perspective and in terms of student-student content. Given various challenges with direct observation, deploying speech technology will assist in extracting meaningful information for teachers. This information will help teachers identify and respond to students in need, immediately impacting their early learning and interest. This study takes a deep dive into exploring various hybrid ASR solutions for low-resource spontaneous speech of preschool children (3-5 years old, with and without developmental delays) who are involved in various activities and interacting with teachers and peers in naturalistic classrooms. Various out-of-domain corpora, covering both wide and limited age ranges and both scripted and spontaneous speech, were considered. Acoustic models based on factorized TDNNs infused with attention, and both N-gram and RNN language models, were considered. Results indicate that young children have significantly different, still-developing articulation skills compared to older children. Out-of-domain transcripts of interactions between young children and adults, however, enhance language model performance. Overall, transcription of such data, including various non-linguistic markers, poses additional challenges.
Award ID(s):
1918032
NSF-PAR ID:
10362772
Journal Name:
ISCA INTERSPEECH-2022
Page Range / eLocation ID:
4322 to 4326
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic vs. controlled lab recordings to measure both quality and quantity of such interactions. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to the diversity of acoustic events/conditions in such daylong audio streams, automated speaker diarization technology would need to be advanced to address this challenging domain for segmenting audio as well as information extraction. This study investigates an alternate Deep Learning-based diarization solution for segmenting classroom interactions of 3-5-year-old children with teachers. In this context, the focus is on speech-type diarization, which classifies speech segments as being either from adults or children, partitioned across multiple classrooms. Our proposed ResNet model achieves a best F1-score of ∼78.0% on data from two classrooms, based on dev and test sets of each classroom. It is utilized with Automatic Speech Recognition-based resegmentation modules to perform child-adult diarization. Additionally, F1-scores are obtained for individual segments with corresponding speaker tags (e.g., adult vs. child), which provide knowledge for educators on child engagement through naturalistic communications. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both security and privacy of all children and adults. The resulting child communication metrics have been used for broad-based feedback for teachers with the help of visualizations.
  2. Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic vs. controlled lab recordings to measure both quality and quantity of such interactions. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to the diversity of acoustic events/conditions in such daylong audio streams, automated speaker diarization technology would need to be advanced to address this challenging domain for segmenting audio as well as information extraction. This study investigates an alternate Deep Learning-based diarization solution for segmenting classroom interactions of 3-5-year-old children with teachers. In this context, the focus is on speech-type diarization, which classifies speech segments as being either from adults or children, partitioned across multiple classrooms. Our proposed ResNet model achieves a best F1-score of ∼71.0% on data from two classrooms, based on dev and test sets of each classroom. Additionally, F1-scores are obtained for individual segments with corresponding speaker tags (e.g., adult vs. child), which provide knowledge for educators on child engagement through naturalistic communications. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both security and privacy of all children and adults. The resulting child communication metrics have been used for broad-based feedback for teachers with the help of visualizations.
  3. Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with both teachers and classmates. Early childhood researchers recognize the importance of analyzing naturalistic vs. controlled lab recordings to measure both quality and quantity of child interactions. Recently, large language model-based speech technologies have performed well on conversational speech recognition. In this regard, we assess the performance of such models on the wide dynamic scenario of early childhood classroom settings. This study investigates an alternate Deep Learning-based Teacher-Student learning solution for recognizing adult speech within preschool interactions. Our proposed adapted model achieves the best F1-score for recognizing the 400 most frequent words on test sets for both classrooms. Additionally, F1-scores for alternate word groups provide a breakdown of performance across relevant language-based word categories. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both security and privacy of all children and adults. The resulting child communication metrics from this study can also be used for broad-based feedback for teachers.
  4. Adult-child interaction is an important component of language development in young children. Teachers responsible for the language acquisition of their students have a vested interest in improving such conversation in their classrooms. Advancements in speech technology and natural language processing can be used as an effective tool by teachers in preschool classrooms to acquire large amounts of conversational data, receive feedback from automated conversational analysis, and amend their teaching methods. Measuring engagement among preschool children and teachers is a challenging task and not well defined. In this study, we focus on developing criteria to measure conversational turn-taking and topic initiation during adult-child interactions in preschool environments. However, counting conversational turns, conversation initiations, or vocabulary alone is not enough to judge the quality of a conversation and track language acquisition. It is necessary to use a combination of the three and include a measurement of the complexity of vocabulary. The next iteration of this problem is to deploy various solutions from speech and language processing technology to automate these measurements. * (2022 ASEE Best Student Paper Award Winner)
  5. The Next Generation Science Standards [1] recognized evidence-based argumentation as one of the essential skills for students to develop throughout their science and engineering education. Argumentation focuses students on the need for quality evidence, which helps to develop their deep understanding of content [2]. Argumentation has been studied extensively in mathematics and science education, and to some extent in engineering education (see, for example, [3], [4], [5], [6]). After a thorough search of the literature, we found few studies that have considered how teachers support collective argumentation during engineering learning activities. The purpose of this program of research was to support teachers in viewing argumentation as an important way to promote critical thinking and to provide teachers with tools to implement argumentation in their lessons integrating coding into science, technology, engineering, and mathematics (which we refer to as integrative STEM). We applied a framework developed for secondary mathematics [7] to understand how teachers support collective argumentation in integrative STEM lessons. This framework used Toulmin’s [8] conceptualization of argumentation, which includes three core components of arguments: a claim (or hypothesis) that is based on data (or evidence) accompanied by a warrant (or reasoning) that relates the data to the claim [9], [8]. To adapt the framework, video data were coded using previously established methods for analyzing argumentation [7]. In this paper, we consider how the framework can be applied to an elementary school teacher’s classroom interactions and present examples of how the teacher implements various questioning strategies to facilitate more productive argumentation and deeper student engagement. We aim to understand the nature of the teacher’s support for argumentation, that is, contributions and actions from the teacher that prompt or respond to parts of arguments.
In particular, we look at examples of how the teacher supports students to move beyond unstructured tinkering (e.g., trial-and-error) to think logically about coding and develop reasoning for the choices that they make in programming. We also look at the components of arguments that students provide, with and without teacher support. Through the use of the framework, we are able to articulate important aspects of collective argumentation that would otherwise be in the background. The framework gives both eyes to see and language to describe how teachers support collective argumentation in integrative STEM classrooms. 