skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Acoustic voice variation in spontaneous speech
This study replicates and extends the recent findings of Lee, Keating, and Kreiman [J. Acoust. Soc. Am. 146(3), 1568–1579 (2019)] on acoustic voice variation in read speech, which showed remarkably similar acoustic voice spaces for groups of female and male talkers and the individual talkers within these groups. Principal component analysis was applied to acoustic indices of voice quality measured from phone conversations for 99/100 of the same talkers studied previously. The acoustic voice spaces derived from spontaneous speech are highly similar to those based on read speech, except that unlike read speech, variability in fundamental frequency accounted for significant acoustic variability. Implications of these findings for prototype models of speaker recognition and discrimination are considered.  more » « less
Award ID(s):
1704167 2105410
PAR ID:
10367488
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Acoustical Society of America (ASA)
Date Published:
Journal Name:
The Journal of the Acoustical Society of America
Volume:
151
Issue:
5
ISSN:
0001-4966
Page Range / eLocation ID:
p. 3462-3472
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Native talkers are able to enhance acoustic characteristics of their speech in a speaking style known as “clear speech,” which is better understood by listeners than “plain speech.” However, despite substantial research in the area of clear speech, it is less clear whether non-native talkers of various proficiency levels are able to adopt a clear speaking style and if so, whether this style has perceptual benefits for native listeners. In the present study, native English listeners evaluated plain and clear speech produced by three groups: native English talkers, non-native talkers with lower proficiency, and non-native talkers with higher proficiency. Listeners completed a transcription task (i.e., an objective measure of the speech intelligibility). We investigated intelligibility as a function of language background and proficiency and also investigated the acoustic modifications that are associated with these perceptual benefits. The results of the study suggest that both native and non-native talkers modulate their speech when asked to adopt a clear speaking style, but that the size of the acoustic modifications, as well as consequences of this speaking style for perception differ as a function of language background and language proficiency. 
    more » « less
  2. Purpose:This study examined the race identification of Southern American English speakers from two geographically distant regions in North Carolina. The purpose of this work is to explore how talkers' self-identified race, talker dialect region, and acoustic speech variables contribute to listener categorization of talker races. Method:Two groups of listeners heard a series of /h/–vowel–/d/ (/hVd/) words produced by Black and White talkers from East and West North Carolina, respectively. Results:Both Southern (North Carolina) and Midland (Indiana) listeners accurately categorized the race of all speakers with greater-than-chance accuracy; however, Western North Carolina Black talkers were categorized with the lowest accuracy, just above chance. Conclusions:The results suggest that similarities in the speech production patterns of West North Carolina Black and White talkers affect the racial categorization of Black, but not White talkers. The results are discussed with respect to the acoustic spectral features of the voices present in the sample population. 
    more » « less
  3. This study compares human speaker discrimination performance for read speech versus casual conversations and explores differences between unfamiliar voices that are “easy” versus “hard” to “tell together” versus “tell apart.” Thirty listeners were asked whether pairs of short style-matched or -mismatched, text-independent utterances represented the same or different speakers. Listeners performed better when stimuli were style-matched, particularly in read speech−read speech trials (equal error rate, EER, of 6.96% versus 15.12% in conversation–conversation trials). In contrast, the EER was 20.68% for the style-mismatched condition. When styles were matched, listeners' confidence was higher when speakers were the same versus different; however, style variation caused decreases in listeners' confidence for the “same speaker” trials, suggesting a higher dependency of this task on within-speaker variability. The speakers who were “easy” or “hard” to “tell together” were not the same as those who were “easy” or “hard” to “tell apart.” Analysis of speaker acoustic spaces suggested that the difference observed in human approaches to “same speaker” and “different speaker” tasks depends primarily on listeners' different perceptual strategies when dealing with within- versus between-speaker acoustic variability. 
    more » « less
  4. Online data collection allows for access to diverse populations. In the current study, we used online recruitment and data collection methods to obtain a corpus of read speech from adult talkers representing three authentic regional dialects of American English and one novel dialect created for the corpus. The authentic dialects (New England, Northern, and Southern American English) are each represented by 8–10 talkers, ranging in age from 22 to 75 years old. The novel dialect was produced by five Spanish-English bilinguals with training in linguistics, who were asked to produce Spanish /o/ in an otherwise English segmental context. One vowel contrast was selected for each dialect, in which the vowels within the contrast are acoustically more similar in the target dialect than in the other dialects. Each talker produced one familiar short story with 40 tokens of each vowel within the target contrast for their dialect, as well as a set of real words and nonwords that represent both the target vowel contrast for their dialect and the other three vowel contrasts for comparison across dialects. Preliminary acoustic analysis reveals both cross-dialect and within-dialect variability in the target vowel contrasts. The corpus materials are available to the scholarly community. 
    more » « less
  5. null (Ed.)
    Speech alignment is where talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages are speaking with voice-activated, artificially-intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines participants’ age (older adults, 53–81 years old vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual ratings AXB task by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on humanness and gender of the human model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additionally, there were other gender-mediated differences observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (OA vs. YA). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production both in human-human and human-device interaction. 
    more » « less