Title: Longitudinal study of voice recognition in children
Speaker recognition as a biometric modality is on the rise in the consumer marketplace for banking, online services, and personal assistant services, with potential for wider application areas. Most current applications involve adults. One of the biggest challenges in speaker recognition for children is the change in voice properties as a child ages. This work proposes a baseline longitudinal dataset collected from the same 30 children, aged 4 to 14 years, over a time frame of 2.5 years, and evaluates speaker recognition performance in children with the available speaker recognition technology.
Award ID(s):
1650503
PAR ID:
10216343
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
GIEdition
ISSN:
1617-5468
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The dilation of the pupil and its variation between a mated pair of irides has been found to be an important factor in the performance of iris recognition systems. Studies on adult irides indicated a significant impact of dilation on iris recognition performance at different ages. However, the results for adults may not necessarily translate to children. This study analyzes dilation as a factor of age and over time in children, using data collected from the same 209 subjects, aged four to 11 years at enrollment, longitudinally over three years at six-month intervals. The performance of iris recognition is also analyzed in the presence of dilation variation.
  2. Bilingual children at a young age can benefit from exposure to dual language, impacting their language and literacy development. Speech technology can aid in developing tools to accurately quantify children’s exposure to multiple languages, thereby helping parents, teachers, and early-childhood practitioners to better support bilingual children. This study lays the foundation towards this goal using the Hoff corpus containing naturalistic adult-child bilingual interactions collected at child ages 2½, 3, and 3½ years. Exploiting self-supervised learning features from XLSR-53 and HuBERT, we jointly predict the language (English/Spanish) and speaker (adult/child) in each utterance using a multi-task learning approach. Our experiments indicate that a trainable linear combination of embeddings across all Transformer layers of the SSL models is a stronger indicator for both tasks with more benefit to speaker classification. However, language classification for children remains challenging. 
  3. This study compares how English-speaking adults and children from the United States adapt their speech when talking to a real person and a smart speaker (Amazon Alexa) in a psycholinguistic experiment. Overall, participants produced more effortful speech when talking to a device (longer duration and higher pitch). These differences also varied by age: children produced even higher pitch in device-directed speech, suggesting a stronger expectation to be misunderstood by the system. In support of this, we see that after a staged recognition error by the device, children increased pitch even more. Furthermore, both adults and children displayed the same degree of variation in their responses for whether “Alexa seems like a real person or not”, further indicating that children’s conceptualization of the system’s competence shaped their register adjustments, rather than an increased anthropomorphism response. This work speaks to models on the mechanisms underlying speech production, and human–computer interaction frameworks, providing support for routinized theories of spoken interaction with technology.
  4. Speech and language development are early indicators of overall analytical and learning ability in children. The preschool classroom is a rich language environment for monitoring and ensuring growth in young children by measuring their vocal interactions with teachers and classmates. Early childhood researchers are naturally interested in analyzing naturalistic vs controlled lab recordings to measure both quality and quantity of such interactions. Unfortunately, present-day speech technologies are not capable of addressing the wide dynamic scenario of early childhood classroom settings. Due to the diversity of acoustic events/conditions in such daylong audio streams, automated speaker diarization technology would need to be advanced to address this challenging domain for segmenting audio as well as information extraction. This study investigates alternate deep learning-based, lightweight, knowledge-distilled diarization solutions for segmenting classroom interactions of 3–5-year-old children with teachers. In this context, the focus is on speech-type diarization, which classifies speech segments as coming from either adults or children, partitioned across multiple classrooms. Our lightest CNN model achieves a best F1-score of ∼76.0% on data from two classrooms, based on dev and test sets of each classroom. It is utilized with automatic speech recognition-based re-segmentation modules to perform child-adult diarization. Additionally, F1-scores are obtained for individual segments with corresponding speaker tags (e.g., adult vs child), which provide knowledge for educators on child engagement through naturalistic communications. The study demonstrates the prospects of addressing educational assessment needs through communication audio stream analysis, while maintaining both security and privacy of all children and adults. The resulting child communication metrics have been used for broad-based feedback for teachers with the help of visualizations.
  5. Biometric recognition allows a person to be identified by comparing feature vectors derived from a person's physiological characteristics. Recognition is dependent on the permanence of the biometric characteristics over long periods of time. There has been limited work evaluating the footprint as a potential biometric. This paper presents a longitudinal study of toe prints in children to understand if this biometric modality could be used reliably as a child grows. Data was collected and analyzed in children ages 4-13 years over five visits, spaced approximately six months apart, giving two years of data. This is the first footprint collection spanning this broad age range in children. Footprints were segmented into separate toe prints to examine whether current fingerprint recognition technology can provide accurate results on toe prints. Data was analyzed using two available fingerprint matchers, Verifinger and Bozorth3 from NIST Biometric Image Software (NBIS). Verifinger provides the best verification match scores using the toe prints, especially when using the hallux, the large toe. The hallux toe on Verifinger provides verification rates of 0% FAR and FRR for images collected on the same day and a FRR of 6.44% at a 1% FAR after two years have passed between collections. Additional longitudinal data is being collected to further these results.
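
Several of the abstracts above report verification performance as a false reject rate (FRR) at a fixed false accept rate (FAR). As a rough illustration of how such a figure can be derived from matcher scores, the sketch below assumes similarity scores (higher means a better match) and picks the lowest threshold whose FAR does not exceed the target. The function name `frr_at_far` and the sample scores are hypothetical and are not taken from any of the listed studies or matchers.

```python
def frr_at_far(genuine, impostor, target_far):
    """Return (threshold, far, frr) for the lowest threshold with FAR <= target_far.

    Assumption: scores are similarities, and a comparison is accepted
    when score >= threshold.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    # Candidate thresholds: every observed score, plus one above the maximum
    # so that a FAR of zero is always reachable.
    candidates = sorted(set(genuine) | set(impostor))
    candidates.append(candidates[-1] + 1.0)
    for t in candidates:
        # FAR: fraction of impostor comparisons that are wrongly accepted.
        far = sum(s >= t for s in impostor) / len(impostor)
        if far <= target_far:
            # FRR: fraction of genuine comparisons that are wrongly rejected.
            frr = sum(s < t for s in genuine) / len(genuine)
            return t, far, frr
    raise RuntimeError("no threshold found; target_far must be >= 0")


# Made-up scores for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
threshold, far, frr = frr_at_far(genuine_scores, impostor_scores, target_far=0.0)
print(f"threshold={threshold}, FAR={far:.2%}, FRR={frr:.2%}")
```

Because FAR can only fall as the threshold rises, the first threshold that meets the target is also the one with the lowest FRR, which is why the scan stops at the first match.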