Title: Measuring Gradient Effects of Alcohol on Speech with Neural Networks’ Posterior Probability of Phonological Features
Alcohol is known to impair fine articulatory control and movements. In drunken speech, incomplete closure of the vocal tract can result in deaffrication of the English affricate sounds /tʃ/ and /ʤ/, spirantization (fricative-like production) of the stop consonants, and palatalization (retraction of place of articulation) of the alveolar fricative /s/ (produced as /ʃ/). Such categorical segmental errors have been well reported. This study employs a phonologically informed neural network approach to estimate degrees of deaffrication of /tʃ/ and /ʤ/, spirantization of /t/ and /d/, and place retraction for /s/ in a corpus of intoxicated English speech. Recurrent neural networks were trained to recognize the relevant phonological features [anterior], [continuant], and [strident] in a control speech corpus. Their posterior probabilities were computed over the segments produced under intoxication. The results revealed both categorical and gradient errors, suggesting that this new approach can reliably quantify fine-grained errors in intoxicated speech.
Keywords: alcohol, deaffrication, palatalization, retraction, neural network
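The core measurement described in the abstract, reading a trained classifier's frame-level posterior probability of a phonological feature over a segment, can be sketched as follows. This is a minimal illustration in NumPy; the array layout, the function name, and the choice to average over frames are assumptions, not the paper's implementation:

```python
import numpy as np

def feature_posterior(frame_probs, start, end):
    """Mean posterior probability of one phonological feature
    (e.g. [continuant]) over the frames of a single segment.

    frame_probs : 1-D array of per-frame posteriors in [0, 1],
                  as output by a frame-level feature classifier.
    start, end  : frame indices delimiting the segment.
    """
    return float(np.mean(frame_probs[start:end]))

# Values near 0 or 1 suggest a categorical realization of the
# feature; intermediate values indicate a gradient (partial) error.
posteriors = np.array([0.1, 0.2, 0.8, 0.9, 0.85])
score = feature_posterior(posteriors, 2, 5)
```

A per-segment score like this is what allows errors to be quantified on a continuum rather than as all-or-nothing substitutions.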
Award ID(s):
2037266
PAR ID:
10444293
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Skarnitzl, Radek
Date Published:
Journal Name:
Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023
Page Range / eLocation ID:
863-867
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  2. The retraction of /s/ in /str/, e.g. street, is a sound change found in certain English dialects. Previous work suggests that /s/-retraction arises from lower spectral frequency /s/ in /str/. The extent to which /s/-retraction differs across English dialects is unclear. This paper presents results from a large-scale, acoustic phonetic study of sibilants in 420 speakers from 6 spontaneous speech corpora (9 dialects) of North American and Scottish English. Spectral Centre of Gravity was modelled from automatic measures of word-initial sibilants. Female speakers show higher-frequency sibilants than males, but more so for /s/ than /ʃ/; /s/ is also higher in American than in Canadian/Scottish dialects; /ʃ/ is surprisingly variable. /s/-retraction, modelled as retraction ratios, is generally greater for /str/ than for /spr skr/, but varies by dialect; females show more retraction in /str/ than males. Dialectal and social factors clearly influence /s/-retraction in English clusters /sp st sk/, /spr skr/, and /str/.
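The acoustic measure used above, spectral Centre of Gravity, is the amplitude-weighted mean frequency of a token's spectrum. A minimal sketch follows; the Hann windowing, the power-spectrum weighting, and the particular retraction-ratio formula are illustrative assumptions, not the study's exact definitions:

```python
import numpy as np

def spectral_cog(signal, sr):
    """Spectral Centre of Gravity (first spectral moment):
    mean frequency weighted by spectral power."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))

def retraction_ratio(cog_str, cog_s, cog_sh):
    """One plausible (hypothetical) retraction index: where a
    speaker's /str/ CoG falls between their plain /s/ and /ʃ/.
    0 = fully /s/-like, 1 = fully /ʃ/-like."""
    return (cog_s - cog_str) / (cog_s - cog_sh)
```

Normalizing /str/ tokens against the same speaker's /s/ and /ʃ/ is one way such a ratio can factor out the overall sex and dialect differences in sibilant frequency reported above.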
  3. An algorithm for detecting sudden jumps in measured F0, which are likely to be inaccurate measures, is introduced. The method computes sample-to-sample differences in F0 and, based on a user-defined threshold, determines whether a difference is larger than naturally produced F0 velocities, thus flagging it as an error. Various parameter settings are evaluated on a corpus of 30 American English speakers producing different intonational patterns, for which F0 tracking errors were manually checked. The paper concludes by recommending settings for the algorithm and ways in which it can be used to facilitate analyses of F0 in speech research.
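The algorithm as described, differencing consecutive F0 samples and flagging any difference that exceeds a user-defined velocity threshold, can be sketched as follows. The default threshold, units, and frame step here are placeholders, not the paper's recommended settings:

```python
import numpy as np

def flag_f0_jumps(f0, frame_step=0.005, max_velocity=400.0):
    """Flag frame-to-frame F0 jumps exceeding a plausible
    production velocity.

    f0           : 1-D array of F0 samples in Hz (0 = unvoiced).
    frame_step   : time between samples in seconds.
    max_velocity : threshold in Hz per second (illustrative value).
    """
    flags = np.zeros(len(f0), dtype=bool)
    for i in range(1, len(f0)):
        # Only compare adjacent frames that are both voiced.
        if f0[i] > 0 and f0[i - 1] > 0:
            velocity = abs(f0[i] - f0[i - 1]) / frame_step
            if velocity > max_velocity:
                flags[i] = True  # likely a tracking glitch, e.g. an octave error
    return flags

# An octave error (200 -> 400 Hz in one 5 ms step) is flagged on
# both the jump up and the jump back down.
track = np.array([200.0, 202.0, 400.0, 201.0, 0.0, 199.0])
flags = flag_f0_jumps(track)
```

Flagged frames can then be hand-checked or excluded before F0 contours are analyzed further.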
  4. Hwang, Gwo-Jen; Xie, Haoran; Wah, Benjamin; Gasevic, Dragan (Ed.)
    Classroom videos are a common source of data for educational researchers studying classroom interactions, as well as a resource for teacher education and professional development. Over the last several decades, emerging technologies have been applied to classroom videos to record, transcribe, and analyze classroom interactions. With the rise of machine learning, we report on the development and validation of neural networks to classify instructional activities using video signals, without analyzing speech or audio features, from a large corpus of nearly 250 h of classroom videos from elementary mathematics and English language arts instruction. Results indicated that the neural networks performed fairly well in detecting instructional activities, at diverse levels of complexity, as compared to human raters. For instance, one neural network achieved over 80% accuracy in detecting four common activity types: whole class activity, small group activity, individual activity, and transition. An issue that was not addressed in this study was whether the fine-grained and agnostic instructional activities detected by the neural networks could scale up to supply information about features of instructional quality. Future applications of these neural networks may enable more efficient cataloguing and analysis of classroom videos at scale and the generation of fine-grained data about the classroom environment to inform potential implications for teaching and learning.
  5. The way listeners perceive speech sounds is largely determined by the language(s) they were exposed to as a child. For example, native speakers of Japanese have a hard time discriminating between American English /ɹ/ and /l/, a phonetic contrast that has no equivalent in Japanese. Such effects are typically attributed to knowledge of sounds in the native language, but quantitative models of how these effects arise from linguistic knowledge are lacking. One possible source for such models is Automatic Speech Recognition (ASR) technology. We implement models based on two types of systems from the ASR literature—hidden Markov models (HMMs) and the more recent, and more accurate, neural network systems—and ask whether, in addition to showing better performance, the neural network systems also provide better models of human perception. We find that while both types of systems can account for Japanese natives’ difficulty with American English /ɹ/ and /l/, only the neural network system successfully accounts for Japanese natives’ facility with Japanese vowel length contrasts. Our work provides a new example, in the domain of speech perception, of an often observed correlation between task performance and similarity to human behavior. 