Abstract Perception changes rapidly and implicitly as a function of passive exposure to speech that samples different acoustic distributions. Past research has shown that this statistical learning generalizes across talkers and, to some extent, new items, but these studies involved listeners’ active engagement in processing statistics-bearing stimuli. In this study, we manipulated the relationship between voice onset time (VOT) and fundamental frequency (F0) to establish distributional regularities either aligned with American English or reversed to create a subtle foreign accent. We then tested whether statistical learning across passive exposure to these distributions generalized to new items never experienced in the accent. Experiment 1 showed statistical learning across passive exposure but no generalization of learning when exposure and test items shared the same initial consonant but differed in vowels (bear/pear → beer/pier) or when they differed in initial consonant but shared distributional regularities across VOT and F0 dimensions (deer/tear → beer/pier). Experiment 2 showed generalization to stimuli that shared the statistics-bearing phoneme (bear/pear → beer/pier), but only when the response set included tokens from both exposure and generalization stimuli. Moreover, statistical learning transferred to influence the subtle acoustics of listeners’ own speech productions but did not generalize to influence productions of stimuli not heard in the accent. In sum, passive exposure is thus sufficient to support statistical learning and its generalization, but task demands modulate this dynamic. Moreover, production does not simply mirror perception: generalization in perception was not accompanied by transfer to production.
Transfer of statistical learning from passive speech perception to speech production
Abstract Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners’ own speech productions.
- Award ID(s): 2346989
- PAR ID: 10470948
- Publisher / Repository: Springer Science + Business Media
- Date Published:
- Journal Name: Psychonomic Bulletin & Review
- Volume: 31
- Issue: 3
- ISSN: 1069-9384
- Format(s): Medium: X Size: p. 1193-1205
- Size(s): p. 1193-1205
- Sponsoring Org: National Science Foundation
More Like this
-
Purpose The “bubble noise” technique has recently been introduced as a method to identify the regions in time–frequency maps (i.e., spectrograms) of speech that are especially important for listeners in speech recognition. This technique identifies regions of “importance” that are specific to the speech stimulus and the listener, thus permitting these regions to be compared across different listener groups. For example, in cross-linguistic and second-language (L2) speech perception, this method identifies differences in regions of importance in accomplishing decisions of phoneme category membership. This research note describes the application of bubble noise to the study of language learning for 3 different language pairs: Hindi-English bilinguals' perception of the /v/–/w/ contrast in American English, native English speakers' perception of the tense/lax contrast for Korean fricatives and affricates, and native English speakers' perception of Mandarin lexical tone. Conclusion We demonstrate that this technique provides insight on what information in the speech signal is important for native/first-language listeners compared to nonnative/L2 listeners. Furthermore, the method can be used to examine whether L2 speech perception training is effective in bringing the listener's attention to the important cues.
-
Speech categories are defined by multiple acoustic dimensions and their boundaries are generally fuzzy and ambiguous in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support for exemplar-based models of speech perception and production where socio-indexical features are encoded as a part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
-
In Autosegmental-Metrical models of intonational phonology, different types of pitch accents, phrase accents, and boundary tones concatenate to create a set of phonologically distinct phrase-final nuclear tunes. This study asks if an eight-way distinction in nuclear tune shape in American English, predicted from the combination of two (monotonal) pitch accents, two phrase accents, and two boundary tones, is evident in speech production and in speech perception. F0 trajectories from a large-scale imitative speech production experiment were analyzed using bottom-up (k-means) clustering, neural net classification, GAMM modeling, and modeling of turning point alignment. Listeners’ perception of the same tunes is tested in a perceptual discrimination task and related to the imitation results. Emergent grouping of tunes in the clustering analysis, and related classification accuracy from the neural net, show a merging of some of the predicted distinctions among tunes whereby tune shapes that vary primarily in the scaling of final f0 are not reliably distinguished. Within five emergent clusters, subtler distinctions among tunes are evident in GAMMs and f0 turning point modeling. Clustering of individual participants’ production data shows a range of partitions of the data, with nearly all participants making a primary distinction between a class of High-Rising and Non-High-Rising tunes, and with up to four secondary distinctions among the non-Rising class. Perception results show a similar pattern, with poor pairwise discrimination for tunes that differ primarily, but by a small degree, in final f0, and highly accurate discrimination when just one member of a pair is in the High-Rising tune class. Together, the results suggest a hierarchy of distinctiveness among nuclear tunes, with a robust distinction based on holistic tune shape and poorly differentiated distinctions between tunes with the same holistic shape but small differences in final f0.
The observed distinctions from clustering, classification, and perception analyses align with the tonal specification of a binary pitch accent contrast {H*, L*} and a maximally ternary {H%, M%, L%} boundary tone contrast; the findings do not support distinct tonal specifications for the phrase accent and boundary tone from the AM model.
-
Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners that are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence for such interference comes from tasks that may inadvertently increase the salience of native language competitors. Here we tested for neural evidence of proficiency and native language interference in a naturalistic story listening task. We studied electroencephalography responses of 39 native speakers of Dutch (14 male) to an English short story, spoken by a native speaker of either American English or Dutch. We modeled brain responses with multivariate temporal response functions, using acoustic and language models. We found evidence for activation of Dutch language statistics when listening to English, but only when it was spoken with a Dutch accent. This suggests that a naturalistic, monolingual setting decreases the interference from native language representations, whereas an accent in the listener's own native language may increase native language interference, by increasing the salience of the native language and activating native language phonetic and lexical representations. Brain responses suggest that such interference stems from words from the native language competing with the foreign language in a single word recognition system, rather than being activated in a parallel lexicon. We further found that secondary acoustic representations of speech (after 200 ms latency) decreased with increasing proficiency. This may reflect improved acoustic–phonetic models in more proficient listeners. 
Significance Statement: Behavioral experiments suggest that native language knowledge interferes with foreign language listening, but such effects may be sensitive to task manipulations, as tasks that increase metalinguistic awareness may also increase native language interference. This highlights the need for studying non-native speech processing using naturalistic tasks. We measured neural responses unobtrusively while participants listened for comprehension and characterized the influence of proficiency at multiple levels of representation. We found that salience of the native language, as manipulated through speaker accent, affected activation of native language representations: significant evidence for activation of native language (Dutch) categories was only obtained when the speaker had a Dutch accent, whereas no significant interference was found to a speaker with a native (American) accent.
