Title: Guiding Students to Investigate What Google Speech Recognition Knows about Language
Children of all ages interact with speech recognition systems but are largely unaware of how they work. Teaching K-12 students to investigate how these systems employ phonological, syntactic, semantic, and cultural knowledge to resolve ambiguities in the audio signal can provide them a window on complex AI decision-making and also help them appreciate the richness and complexity of human language. We describe a browser-based tool for exploring the Google Web Speech API and a series of experiments students can engage in to measure what the service knows about language and the types of biases it exhibits. Middle school students taking an introductory AI elective were able to use the tool to explore Google’s knowledge of homophones and its ability to exploit context to disambiguate them. Older students could potentially conduct more comprehensive investigations, which we lay out here. This approach to investigating the power and limitations of speech technology through carefully designed experiments can also be applied to other AI application areas, such as face detection, object recognition, machine translation, or question answering.
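The homophone experiments described above can be pictured as a small scoring harness: present sentences containing a target homophone, collect the recognizer's transcripts, and measure how often context led it to the intended spelling. The sketch below is illustrative only; the sentence list, the scoring function, and the simulated transcripts are invented here, and the paper's actual tool is browser-based and queries the Google Web Speech API directly.

```python
# Hypothetical harness for a homophone experiment: given transcripts
# returned by a speech recognizer, check whether sentence context led
# the system to choose the intended homophone spelling.

# Each trial pairs a spoken prompt with the homophone we expect the
# recognizer to pick when it exploits context correctly.
TRIALS = [
    ("I ate eight pancakes", "eight"),
    ("The knight rode through the night", "knight"),
    ("She read the red book aloud", "red"),
]

def score_transcripts(transcripts, trials):
    """Return the fraction of trials whose transcript contains the
    expected homophone spelling as a separate word."""
    hits = 0
    for transcript, (_, expected) in zip(transcripts, trials):
        if expected in transcript.lower().split():
            hits += 1
    return hits / len(trials)

# Simulated recognizer output for the three prompts above.
fake_transcripts = [
    "i ate eight pancakes",
    "the night rode through the night",   # context missed: wrong homophone
    "she read the red book aloud",
]

print(score_transcripts(fake_transcripts, TRIALS))  # 2 of 3 trials correct
```

Students can swap in their own prompt lists (minimal pairs with and without disambiguating context) and compare scores to see how much the recognizer relies on surrounding words.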
Award ID(s): 2048502
PAR ID: 10409044
Author(s) / Creator(s): ;
Date Published:
Journal Name: Thirteenth AAAI Symposium on Educational Advances in Artificial Intelligence
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Neumann, M.; Virtue, P.; Guerzhoy, M. (Ed.)
  2. The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as a child. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature—hidden Markov models (HMMs) and the more recent, and more accurate, neural network systems—and ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for Japanese natives’ difficulty with American English /ɹ/ and /l/, only the neural network system successfully accounts for Japanese natives’ facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often observed correlation between task performance and similarity to human behavior. 
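The phone-discrimination experiments summarized in item 2 are often run as an ABX task over a model's internal representations: stimulus X counts as correctly discriminated if it lies closer to same-category stimulus A than to different-category stimulus B. The sketch below is a toy illustration of that paradigm under invented embeddings, not the actual HMM or neural network representations used in that work.

```python
# Illustrative ABX discrimination test over model representations.
# A model "discriminates" a contrast such as /r/ vs /l/ if a held-out
# token X of one category lies closer to a token A of the same
# category than to a token B of the other. Vectors are made up here.
import math

def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def abx_correct(a, b, x):
    """True if X is closer to A than to B."""
    return distance(x, a) < distance(x, b)

# Toy embeddings standing in for a model's /r/ and /l/ representations.
r_token_1 = [0.9, 0.1]
l_token   = [0.1, 0.9]
r_token_2 = [0.8, 0.2]   # another /r/ token, used as the X stimulus

print(abx_correct(r_token_1, l_token, r_token_2))  # True for these toy vectors
```

A model predicted to behave like a Japanese listener would place /r/ and /l/ tokens close together, driving ABX accuracy toward chance on that contrast.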
  3. Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children’s speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs. 
  4. High-Performance Computing (HPC) is increasingly being used in traditional scientific domains as well as emerging areas like Deep Learning (DL). This has led to a diverse set of professionals who interact with state-of-the-art HPC systems. The deployment of Science Gateways for HPC systems like Open OnDemand has a significant positive impact on these users in migrating their workflows to HPC systems. Although computing capabilities are ubiquitously available (as on-premises or cloud HPC infrastructure), significant effort and expertise are required to use them effectively. This is particularly challenging for domain scientists and other users whose primary expertise lies outside of computer science. In this paper, we seek to minimize the steep learning curve and associated complexities of using state-of-the-art high-performance systems by creating SAI: an AI-Enabled Speech Assistant Interface for Science Gateways in High Performance Computing. We use state-of-the-art AI models for speech and text and fine-tune them for the HPC arena by retraining them on a new HPC dataset we create. We use ontologies and knowledge graphs to capture the complex relationships between various components of the HPC ecosystem. We finally show how one can integrate and deploy SAI in Open OnDemand and evaluate its functionality and performance on real HPC systems. To the best of our knowledge, this is the first effort aimed at designing and developing an AI-powered speech-assisted interface for science gateways in HPC.
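The core loop of a speech assistant like the one in item 4 is routing a recognized utterance to an HPC action. The sketch below shows that routing step with a trivial keyword table; the intent names and Slurm commands are invented for illustration, and SAI itself uses fine-tuned speech/text models and a knowledge graph rather than keyword matching.

```python
# Toy sketch of routing a recognized utterance to an HPC command,
# in the spirit of a speech assistant for science gateways.
# The intent table and commands below are hypothetical examples.
INTENTS = {
    "submit": "sbatch job.sh",
    "status": "squeue --me",
    "cancel": "scancel",
}

def route(utterance):
    """Return the command for the first intent keyword found in the
    utterance, or None if nothing matches."""
    text = utterance.lower()
    for keyword, command in INTENTS.items():
        if keyword in text:
            return command
    return None

print(route("Please check the status of my jobs"))  # squeue --me
```

A production system would replace the keyword scan with an intent classifier and fill command parameters (job script, partition, node count) from the parsed utterance.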
  5. Blikstein, P.; Van Aalst, J.; Kizito, R.; Brennan, K. (Ed.)
    Past research shows that teacher noticing matters for student learning, but little is known about the effects of AI-based tools designed to augment teachers’ attention and sensemaking. In this paper, we investigate three multimodal measures of teacher noticing (i.e., gaze, deep dive into learning analytics in a teacher tool, and visits to individual students), gleaned from a mixed reality teacher awareness tool across ten classrooms. Our analysis suggests that of the three noticing measures, deep dive exhibited the largest association with learning gains when adjusting for students’ prior knowledge and tutor interactions. This finding may indicate that teachers identified students most in need based on the deep dive analytics and offered them support. We discuss how these multimodal measures can make the constraints and effects of teacher noticing in human-AI partnered classrooms visible. 