skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Title: Lexical Information Guides Retuning of Neural Patterns in Perceptual Learning for Speech
Abstract A listener's interpretation of a given speech sound can vary probabilistically from moment to moment. Previous experience (i.e., the contexts in which one has encountered an ambiguous sound) can further influence the interpretation of speech, a phenomenon known as perceptual learning for speech. This study used multivoxel pattern analysis to query how neural patterns reflect perceptual learning, leveraging archival fMRI data from a lexically guided perceptual learning study conducted by Myers and Mesite [Myers, E. B., & Mesite, L. M. Neural systems underlying perceptual adjustment to non-standard speech tokens. Journal of Memory and Language, 76, 80–93, 2014]. In that study, participants first heard ambiguous /s/–/∫/ blends in either /s/-biased lexical contexts (epi_ode) or /∫/-biased contexts (refre_ing); subsequently, they performed a phonetic categorization task on tokens from an /asi/–/a∫i/ continuum. In the current work, a classifier was trained to distinguish between phonetic categorization trials in which participants heard unambiguous productions of /s/ and those in which they heard unambiguous productions of /∫/. The classifier was able to generalize this training to ambiguous tokens from the middle of the continuum on the basis of individual participants' trial-by-trial perception. We take these findings as evidence that perceptual learning for speech involves neural recalibration, such that the pattern of activation approximates the perceived category. Exploratory analyses showed that left parietal regions (supramarginal and angular gyri) and right temporal regions (superior, middle, and transverse temporal gyri) were most informative for categorization. Overall, our results inform an understanding of how moment-to-moment variability in speech perception is encoded in the brain.  more » « less
Award ID(s):
1735225
PAR ID:
10281105
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Journal of Cognitive Neuroscience
Volume:
32
Issue:
10
ISSN:
0898-929X
Page Range / eLocation ID:
2001 to 2012
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Though the right hemisphere has been implicated in talker processing, it is thought to play a minimal role in phonetic processing, at least relative to the left hemisphere. Recent evidence suggests that the right posterior temporal cortex may support learning of phonetic variation associated with a specific talker. In the current study, listeners heard a male talker and a female talker, one of whom produced an ambiguous fricative in /s/-biased lexical contexts (e.g., epi?ode) and one who produced it in /∫/-biased contexts (e.g., friend?ip). Listeners in a behavioral experiment (Experiment 1) showed evidence of lexically guided perceptual learning, categorizing ambiguous fricatives in line with their previous experience. Listeners in an fMRI experiment (Experiment 2) showed differential phonetic categorization as a function of talker, allowing for an investigation of the neural basis of talker-specific phonetic processing, though they did not exhibit perceptual learning (likely due to characteristics of our in-scanner headphones). Searchlight analyses revealed that the patterns of activation in the right superior temporal sulcus (STS) contained information about who was talking and what phoneme they produced. We take this as evidence that talker information and phonetic information are integrated in the right STS. Functional connectivity analyses suggested that the process of conditioning phonetic identity on talker information depends on the coordinated activity of a left-lateralized phonetic processing system and a right-lateralized talker processing system. Overall, these results clarify the mechanisms through which the right hemisphere supports talker-specific phonetic processing.

     
    more » « less
  2. Identifying neural correlates of conscious perception is a fundamental endeavor of cognitive neuroscience. Most studies so far have focused on visual awareness along with trial-by-trial reports of task relevant stimuli, which can confound neural measures of perceptual awareness with post-perceptual processing. Here, we used a three-phase sine-wave speech paradigm that dissociated between conscious speech perception and task relevance while recording EEG in humans of both sexes. Compared to tokens perceived as noise, physically identical sine-wave speech tokens that were perceived as speech elicited a left-lateralized, near-vertex negativity, which we interpret as a phonological version of a perceptual awareness negativity. This response appeared between 200 and 300 ms after token onset and was not present for frequency-flipped control tokens that were never perceived as speech. In contrast, the P3b elicited by task-irrelevant tokens did not significantly differ when the tokens were perceived as speech versus noise, and was only enhanced for tokens that were both perceived as speechandrelevant to the task. Our results extend the findings from previous studies on visual awareness and speech perception, and suggest that correlates of conscious perception, across types of conscious content, are most likely to be found in mid-latency negative-going brain responses in content-specific sensory areas.

    Significance StatementHow patterns of brain activity give rise to conscious perception is a fundamental question of cognitive neuroscience. Here, we asked whether markers of conscious speech perception can be separated from task-related confounds. We combined sine-wave speech - a degraded speech signal that is heard as noise by naive individuals but can readily be heard as speech after minimal training - with a no-report paradigm that independently manipulated perception (speech versus non-speech) and task (relevant versus irrelevant). Using this paradigm, we were able to identify a marker of speech perception in mid-latency responses over left frontotemporal EEG channels that was independent of task. Our results demonstrate that the “perceptual awareness negativity” is present for a new type of perceptual content (speech).

     
    more » « less
  3. Abstract

    Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In thecanonicalcondition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In thereversecondition, the F0xVOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners’ own speech productions.

     
    more » « less
  4. Humans are born as “universal listeners” without a bias toward any particular language. However, over the first year of life, infants’ perception is shaped by learning native speech categories. Acoustically different sounds—such as the same word produced by different speakers—come to be treated as functionally equivalent. In natural environments, these categories often emerge incidentally without overt categorization or explicit feedback. However, the neural substrates of category learning have been investigated almost exclusively using overt categorization tasks with explicit feedback about categorization decisions. Here, we examined whether the striatum, previously implicated in category learning, contributes to incidental acquisition of sound categories. In the fMRI scanner, participants played a videogame in which sound category exemplars aligned with game actions and events, allowing sound categories to incidentally support successful game play. An experimental group heard nonspeech sound exemplars drawn from coherent category spaces, whereas a control group heard acoustically similar sounds drawn from a less structured space. Although the groups exhibited similar in-game performance, generalization of sound category learning and activation of the posterior striatum were significantly greater in the experimental than control group. Moreover, the experimental group showed brain–behavior relationships related to the generalization of all categories, while in the control group these relationships were restricted to the categories with structured sound distributions. Together, these results demonstrate that the striatum, through its interactions with the left superior temporal sulcus, contributes to incidental acquisition of sound category representations emerging from naturalistic learning environments.

     
    more » « less
  5. Abstract

    This study examines the putative benefits of explicit phonetic instruction, high variability phonetic training, and their effects on adult nonnative speakers’ Mandarin tone productions. Monolingual first language (L1) English speakers (n= 80), intermediate second language (L2) Mandarin learners (n= 40), and L1 Mandarin speakers (n= 40) took part in a multiday Mandarin‐like artificial language learning task. Participants were asked to repeat a syllable–tone combination immediately after hearing it. Half of all participants were exposed to speech from 1 talker (low variability) while the other half heard speech from 4 talkers (high variability). Half of the L1 English participants were given daily explicit instruction on Mandarin tone contours, while the other half were not. Tone accuracy was measured by L1 Mandarin raters (n= 104) who classified productions according to their perceived tonal category. Explicit instruction of tone contours facilitated L1 English participants’ production of rising and falling tone contours. High variability input alone had no main effect on participants’ productions but interacted with explicit instruction to improve participants’ productions of high‐level tone contours. These results motivate an L2 tone production training approach that consists of explicit tone instruction followed by gradual exposure to more variable speech.

     
    more » « less