Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening, and these adjustments transfer to influence listeners’ own speech productions.
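The abstract above describes "cue weighting": how strongly listeners rely on F0 versus VOT when categorizing /b/–/p/. A common way to quantify such weights is to regress binary categorization responses on the standardized cues and compare coefficients. The sketch below is only an illustration of that general idea, not the paper's analysis pipeline; the simulated data, the use of scikit-learn, and the interpretation of coefficients as cue weights are assumptions.

```python
# Hypothetical sketch (not the paper's analysis): estimating F0 and VOT cue
# weights from binary /b/-/p/ categorization responses with logistic regression.
# Simulated data and condition labels are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate_responses(n, f0_weight, vot_weight):
    """Simulate voiceless ("pier") responses from standardized F0 and VOT cues."""
    f0 = rng.normal(size=n)      # standardized onset F0
    vot = rng.normal(size=n)     # standardized VOT
    p_voiceless = 1 / (1 + np.exp(-(vot_weight * vot + f0_weight * f0)))
    resp = rng.binomial(1, p_voiceless)   # 1 = "pier", 0 = "beer"
    return np.column_stack([f0, vot]), resp

def cue_weights(X, y):
    """Fit a logistic model and return the (F0, VOT) coefficients."""
    model = LogisticRegression().fit(X, y)
    return model.coef_[0]

# Canonical condition: F0 contributes alongside VOT.
Xc, yc = simulate_responses(500, f0_weight=1.0, vot_weight=2.5)
# Reverse ("accent") condition: listeners down-weight F0.
Xr, yr = simulate_responses(500, f0_weight=0.2, vot_weight=2.5)

print("canonical F0/VOT weights:", np.round(cue_weights(Xc, yc), 2))
print("reverse   F0/VOT weights:", np.round(cue_weights(Xr, yr), 2))
```

On simulated data like this, the fitted F0 coefficient shrinks in the reverse condition while the VOT coefficient stays roughly constant, which is the pattern the abstract calls down-weighting of F0.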
- Award ID(s): 1827409
- NSF-PAR ID: 10440823
- Date Published:
- Journal Name: Frontiers in Psychology
- Volume: 13
- ISSN: 1664-1078
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This study investigates the integration of word-initial fundamental frequency (F0) and voice onset time (VOT) in stop voicing categorization for adult listeners with normal hearing (NH) and unilateral cochlear implant (CI) recipients using a bimodal hearing configuration [CI + contralateral hearing aid (HA)]. Categorization was assessed for ten adults with NH and ten adult bimodal listeners, using synthesized consonant stimuli interpolating between /ba/ and /pa/ exemplars with five-step VOT and F0 conditions. All participants demonstrated the expected categorization pattern by reporting /ba/ for shorter VOTs and /pa/ for longer VOTs, with NH listeners showing more use of VOT as a voicing cue than CI listeners in general. When VOT was ambiguous between voiced and voiceless stops, NH listeners made more use of F0 as a cue to voicing than CI listeners, and CI listeners showed greater use of initial F0 during voicing identification in their bimodal (CI + HA) condition than in the CI-alone condition. The results demonstrate the adjunctive benefit of acoustic hearing from the non-implanted ear for listening conditions involving spectrotemporally complex stimuli. This finding may lead to the development of a clinically feasible perceptual weighting task that could inform clinicians about bimodal efficacy and the risk-benefit profile associated with bilateral CI recommendation.
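One simple way to operationalize "use of F0 as a voicing cue" in a design like the one above is to compare /pa/ response rates between high-F0 and low-F0 stimuli at the perceptually ambiguous (middle) VOT step. The sketch below illustrates that index only; the numbers are invented and this is not the study's actual statistic.

```python
# Illustrative sketch only: indexing F0 cue use as the difference in /pa/
# response rates between F0 levels at the ambiguous VOT step. Invented data.
import numpy as np

# responses[v, f] = proportion of /pa/ responses at VOT step v and F0 level f
# (5 VOT steps x 2 F0 levels: column 0 = low F0, column 1 = high F0)
responses = np.array([
    [0.05, 0.10],   # shortest VOT
    [0.15, 0.30],
    [0.35, 0.75],   # ambiguous middle step
    [0.80, 0.90],
    [0.95, 0.98],   # longest VOT
])

ambiguous_step = 2
f0_effect = responses[ambiguous_step, 1] - responses[ambiguous_step, 0]

mean_by_vot = responses.mean(axis=1)            # collapse over F0 levels
vot_effect = mean_by_vot[-1] - mean_by_vot[0]   # longest minus shortest VOT

print(f"F0 cue use at ambiguous VOT: {f0_effect:.2f}")
print(f"overall VOT cue use:         {vot_effect:.2f}")
```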
-
Abstract Recent studies have documented substantial variability among typical listeners in how gradiently they categorize speech sounds, and this variability in categorization gradience may link to how listeners weight different cues in the incoming signal. The present study tested the relationship between categorization gradience and cue weighting across two sets of English contrasts, each varying orthogonally in two acoustic dimensions. Participants performed a four‐alternative forced‐choice identification task in a visual world paradigm while their eye movements were monitored. We found that (a) greater categorization gradience derived from behavioral identification responses corresponds to larger secondary cue weights derived from eye movements; (b) the relationship between categorization gradience and secondary cue weighting is observed across cues and contrasts, suggesting that categorization gradience may be a consistent within‐individual property in speech perception; and (c) listeners who showed greater categorization gradience tend to adopt a buffered processing strategy, especially when cues arrive asynchronously in time.
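Categorization gradience of the kind described above is often summarized by the slope of a sigmoid fit to identification responses along a continuum: shallower slopes indicate more gradient categorization. The sketch below shows that generic approach under assumed data and an assumed logistic parameterization; it is not the study's method.

```python
# Hypothetical sketch: deriving a gradience index from identification responses
# by fitting a logistic function along an acoustic continuum. Invented data.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, slope, midpoint):
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

steps = np.arange(1, 8)                                  # 7-step continuum
p_category = np.array([0.02, 0.05, 0.20, 0.55, 0.80, 0.95, 0.99])

(slope, midpoint), _ = curve_fit(logistic, steps, p_category, p0=[1.0, 4.0])
gradience = 1.0 / slope   # larger value = shallower, more gradient responses

print(f"slope={slope:.2f}, boundary at step {midpoint:.2f}, "
      f"gradience index={gradience:.2f}")
```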
-
Abstract Music and language are two fundamental forms of human communication. Many studies examine the development of music‐ and language‐specific knowledge, but few studies compare how listeners know they are listening to music or language. Although we readily differentiate these domains, how we distinguish music and language—and especially speech and song—is not obvious. In two studies, we asked how listeners categorize speech and song. Study 1 used online survey data to illustrate that 4‐ to 17‐year‐olds and adults have verbalizable distinctions for speech and song. At all ages, listeners described speech and song differences based on acoustic features, but compared with older children, 4‐ to 7‐year‐olds more often used volume to describe differences, suggesting that they are still learning to identify the features most useful for differentiating speech from song. Study 2 used a perceptual categorization task to demonstrate that 4–8‐year‐olds and adults readily categorize speech and song, but this ability improves with age, especially for identifying song. Despite generally rating song as more speech‐like, 4‐ and 6‐year‐olds rated ambiguous speech–song stimuli as more song‐like than 8‐year‐olds and adults. Four acoustic features predicted song ratings: F0 instability, utterance duration, harmonicity, and spectral flux. However, 4‐ and 6‐year‐olds’ song ratings were better predicted by F0 instability than by harmonicity and utterance duration. These studies characterize how children develop conceptual and perceptual understandings of speech and song and suggest that children under age 8 are still learning what features are important for categorizing utterances as speech or song.
Research Highlights
Children and adults conceptually and perceptually categorize speech and song from age 4.
Listeners use F0 instability, harmonicity, spectral flux, and utterance duration to determine whether vocal stimuli sound like song.
Acoustic cue weighting changes with age, becoming adult‐like at age 8 for perceptual categorization and at age 12 for conceptual differentiation.
Young children are still learning to categorize speech and song, which leaves open the possibility that music‐ and language‐specific skills are not so domain‐specific.
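Two of the acoustic features named above, spectral flux and F0 instability, can be approximated with short NumPy routines. The definitions below (positive frame-to-frame change in the magnitude spectrum; standard deviation of a voiced F0 track) are common textbook formulations, not the study's exact feature extraction, and the frame sizes and toy signal are assumptions.

```python
# Illustrative sketch only: rough implementations of spectral flux and
# F0 instability. Frame sizes and the toy signal are assumed values.
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Mean positive change in magnitude spectrum between successive frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, hop)]
    window = np.hanning(frame_len)
    mags = np.array([np.abs(np.fft.rfft(f * window)) for f in frames])
    diffs = np.diff(mags, axis=0)
    return np.maximum(diffs, 0.0).sum(axis=1).mean()

def f0_instability(f0_track_hz):
    """Standard deviation of a voiced-frame F0 track (from any pitch tracker)."""
    f0 = np.asarray(f0_track_hz, dtype=float)
    voiced = f0[~np.isnan(f0)]
    return float(np.std(voiced))

# Toy example: a wobbly (vibrato-like) F0 track is less stable than a flat one.
sr = 16000
t = np.arange(sr) / sr
steady_tone = np.sin(2 * np.pi * 200 * t)

print("spectral flux (steady tone):", round(float(spectral_flux(steady_tone)), 2))
print("F0 instability, flat track:  ", f0_instability([200.0] * 50))
print("F0 instability, wobbly track:",
      round(f0_instability(200 + 10 * np.sin(np.linspace(0, 6, 50))), 2))
```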
-
Speech alignment is the phenomenon whereby talkers subconsciously adopt the speech and language patterns of their interlocutor. Nowadays, people of all ages speak with voice-activated, artificially intelligent (voice-AI) digital assistants through phones or smart speakers. This study examines the effects of participants’ age (older adults, 53–81 years old, vs. younger adults, 18–39 years old) and gender (female and male) on degree of speech alignment during shadowing of (female and male) human and voice-AI (Apple’s Siri) productions. Degree of alignment was assessed holistically via a perceptual AXB rating task completed by a separate group of listeners. Results reveal that older and younger adults display distinct patterns of alignment based on the humanness and gender of the model talkers: older adults displayed greater alignment toward the female human and device voices, while younger adults aligned to a greater extent toward the male human voice. Additional gender-mediated differences were observed, all of which interacted with model talker category (voice-AI vs. human) or shadower age category (older vs. younger adults). Taken together, these results suggest a complex interplay of social dynamics in alignment, which can inform models of speech production in both human-human and human-device interaction.
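In a typical AXB alignment rating design like the one mentioned above, raters hear a shadower's pre-exposure token, the model talker's token (X), and the shadower's post-shadowing token, and choose which shadower token sounds more like X; the proportion of post-shadowing choices above chance (0.5) indexes alignment. The sketch below only illustrates that scoring convention with invented responses and is not the study's analysis.

```python
# Illustrative sketch only: scoring an AXB alignment rating task. Invented data.
from collections import Counter

# Each trial records which shadower token a rater judged more similar to the model.
axb_choices = ["post", "post", "pre", "post", "post", "pre", "post", "post"]

counts = Counter(axb_choices)
alignment = counts["post"] / len(axb_choices)

print(f"proportion post-shadowing chosen: {alignment:.2f} (0.5 = no alignment)")
```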