Title: Capturing talk and proximity in the classroom: Advances in measuring features of young children’s friendships
Young children’s friendships fuel essential developmental outcomes (e.g., social-emotional competence) and are thought to provide even greater benefits to children with or at risk for disabilities. Teacher and parent reports and sociometric measures are commonly used to measure friendships, and ecobehavioral assessment has been used to capture their features on a momentary basis. In this proof-of-concept study, we use Ubisense, the Language ENvironmental Analysis (LENA) recorder, and advanced speech processing algorithms to capture features of friendship: child-peer speech and proximity within activity areas. We collected 12,332 1-second speech and location data points. Our preliminary results indicate that the focal child at risk for a disability and each playmate spent time vocalizing near one another across 4 activity areas. Additionally, compared to the Blocks activity area, the children had significantly lower odds of talking while in proximity during Manipulatives and Science. This suggests that the activity areas children occupy may affect their engagement with peers and, in turn, the friendships they develop. The proposed approach is a groundbreaking advance in understanding and supporting children’s friendships.
Award ID(s):
1918012 1918032
NSF-PAR ID:
10286965
Author(s) / Creator(s):
Date Published:
Journal Name:
Early Childhood Research Quarterly
Volume:
57
ISSN:
0885-2006
Page Range / eLocation ID:
102-109
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Understanding and assessing child verbal communication patterns is critical in facilitating effective language development. Typically, speaker diarization is performed to explore children’s verbal engagement. Understanding which activity areas stimulate verbal communication can help promote more efficient language development. In this study, we present a two-stage child vocal engagement prediction system that consists of (1) a near-real-time, noise-robust system that measures the duration of child-to-adult and child-to-child conversations and tracks the number of conversational turn-takings, and (2) a novel child location tracking strategy that determines in which activity areas a child spends the most and least time. The proposed child–adult turn-taking solution relies exclusively on vocal cues observed during the interaction between a child and other children and/or classroom teachers. By employing threshold-optimized speech activity detection using a linear combination of voicing measures (TO-COMBO-SAD), it is possible to achieve effective speech/non-speech segment detection prior to conversation assessment. TO-COMBO-SAD reduces classification error rates for adult-child audio by 21.34% and 27.3% compared to baseline i-Vector and standard Bayesian Information Criterion diarization systems, respectively. In addition, this study presents a unique adult-child location tracking system that helps determine the quantity of child–adult communication in specific activity areas, and which activities stimulate voice communication engagement in a child–adult education space. We observe that our proposed location tracking solution offers unique opportunities to assess speech and language interaction for children and to quantify the location context that would contribute to improving verbal communication.
  2. The use of wh-words, including wh-questions and wh-clauses, can be linguistically, conceptually, and interactively challenging for preschoolers. Young children develop mastery of wh-words as they formulate and hear these words during daily interactions in contexts such as preschool classrooms. Observational approaches limit researchers' ability to comprehensively capture classroom conversations, including wh-words. In the current study, we report the results of the first study using an automated speech recognition (ASR) system coupled with location sensors designed to quantify teachers' wh-words in the literacy activity areas of a preschool classroom. We found that the ASR system is a viable solution to automatically quantify the number of adult wh-words used in preschool classrooms. Our findings demonstrated that the most frequently used adult wh-word type was "what." Classroom adults used more wh-words during time point 1 compared to time point 2. Lastly, a child at risk for developmental delays heard more wh-words per minute than a typically developing child. Future research is warranted to further improve the efforts 
  3. The ability to assess children’s conversational interaction is critical in determining language and cognitive proficiency for typically developing and at-risk children. The earlier an at-risk child is identified, the earlier support can be provided to reduce the social impact of the speech disorder. To date, limited research has been performed on speech recognition for young children in classroom settings. This study addresses speech recognition research with naturalistic children’s speech, where age varies from 2.5 to 5 years. Data augmentation is relatively underexplored for child speech. Therefore, we investigate the effectiveness of data augmentation techniques to improve both language and acoustic models. We explore alternate text augmentation approaches using adult data, Web data, and text generated by recurrent neural networks. We also compare several acoustic augmentation techniques: speed perturbation, tempo perturbation, and adult data. Finally, we comment on child word count rates to assess child speech development.
  4. Although non-profit commercial products such as LENA can provide valuable feedback to parents and early childhood educators about their children’s or students’ daily communication interactions, their cost and technology requirements put them out of reach of many families who could benefit. Over the last two decades, smartphones have become commonly used in most households irrespective of their socio-economic background. In this study, conducted during the COVID-19 pandemic, we aim to compare audio collected on LENA recorders versus smartphones available to families in an unsupervised data collection protocol. Approximately 10 hours of audio evaluated in this study was collected by three families in their homes during parent-child science book reading activities. We compare the two audio capture devices and find similar performance based on their speech signal-to-noise ratio (NIST STNR) and word error rates calculated using automatic speech recognition (ASR) engines. Finally, we discuss implications of this study for expanding this technology to more diverse populations, as well as limitations and future directions.
  5. Autonomous educational social robots can be used to help promote literacy skills in young children. Such robots, which emulate the emotive, perceptual, and empathic abilities of human teachers, are capable of replicating some of the benefits of one-on-one tutoring from human teachers, in part by leveraging individual students’ behavior and task performance data to infer sophisticated models of their knowledge. These student models are then used to provide personalized educational experiences by, for example, determining the optimal sequencing of curricular material. In this paper, we introduce an integrated system for autonomously analyzing and assessing children’s speech and pronunciation in the context of an interactive word game between a social robot and a child. We present a novel game environment and its computational formulation, an integrated pipeline for capturing and analyzing children’s speech in real time, and an autonomous robot that models children’s word pronunciation via Gaussian Process Regression (GPR), augmented with an Active Learning protocol that informs the robot’s behavior. We show that the system is capable of autonomously assessing children’s pronunciation ability, with ground truth determined by a post-experiment evaluation by human raters. We also compare phoneme- and word-level GPR models and discuss trade-offs of each approach in modeling children’s pronunciation. Finally, we describe and analyze a pipeline for automatic analysis of children’s speech and pronunciation, including an evaluation of Speech Ace as a tool for future development of autonomous, speech-based language tutors.