
Title: Role of the striatum in incidental learning of sound categories

Humans are born as “universal listeners” without a bias toward any particular language. However, over the first year of life, infants’ perception is shaped by learning native speech categories. Acoustically different sounds—such as the same word produced by different speakers—come to be treated as functionally equivalent. In natural environments, these categories often emerge incidentally without overt categorization or explicit feedback. Yet the neural substrates of category learning have been investigated almost exclusively using overt categorization tasks with explicit feedback about categorization decisions. Here, we examined whether the striatum, previously implicated in category learning, contributes to incidental acquisition of sound categories. In the fMRI scanner, participants played a videogame in which sound category exemplars aligned with game actions and events, allowing sound categories to incidentally support successful game play. An experimental group heard nonspeech sound exemplars drawn from coherent category spaces, whereas a control group heard acoustically similar sounds drawn from a less structured space. Although the groups exhibited similar in-game performance, generalization of sound category learning and activation of the posterior striatum were significantly greater in the experimental than control group. Moreover, the experimental group showed brain–behavior relationships related to the generalization of all categories, while in the control group these relationships were restricted to the categories with structured sound distributions. Together, these results demonstrate that the striatum, through its interactions with the left superior temporal sulcus, contributes to incidental acquisition of sound category representations emerging from naturalistic learning environments.

Publication Date:
Journal Name: Proceedings of the National Academy of Sciences
Page Range or eLocation-ID: p. 4671-4680
Sponsoring Org: National Science Foundation
More Like this
  1. Category learning is fundamental to cognition, but little is known about how it proceeds in real-world environments when learners do not have instructions to search for category-relevant information, do not make overt category decisions, and do not experience direct feedback. Prior research demonstrates that listeners can acquire task-irrelevant auditory categories incidentally as they engage in primarily visuomotor tasks. The current study examines the factors that support this incidental category learning. Three experiments systematically manipulated the relationship of four novel auditory categories with a consistent visual feature (color or location) that informed a simple behavioral keypress response regarding the visual feature. In both an in-person experiment and two online replications with extensions, incidental auditory category learning occurred reliably when category exemplars consistently aligned with visuomotor demands of the primary task, but not when they were misaligned. The presence of an additional irrelevant visual feature that was uncorrelated with the primary task demands neither enhanced nor harmed incidental learning. By contrast, incidental learning did not occur when auditory categories were aligned consistently with one visual feature, but the motor response in the primary task was aligned with another, category-unaligned visual feature. Moreover, category learning did not reliably occur across passive observation or when participants made a category-nonspecific, generic motor response. These findings show that incidental learning of categories is strongly mediated by the character of coincident behavior.
  2. The extent to which articulatory information embedded in incoming speech contributes to the formation of new perceptual categories for speech sounds has been a matter of discourse for decades. It has been theorized that the acquisition of new speech sound categories requires a network of sensory and speech motor cortical areas (the “dorsal stream”) to successfully integrate auditory and articulatory information. However, it is possible that these brain regions are not sensitive specifically to articulatory information, but instead are sensitive to the abstract phonological categories being learned. We tested this hypothesis by training participants over the course of several days on an articulable non-native speech contrast and acoustically matched inarticulable nonspeech analogues. After reaching comparable levels of proficiency with the two sets of stimuli, activation was measured in fMRI as participants passively listened to both sound types. Decoding of category membership for the articulable speech contrast alone revealed a series of left and right hemisphere regions outside of the dorsal stream that have previously been implicated in the emergence of non-native speech sound categories, while no regions could successfully decode the inarticulable nonspeech contrast. Although activation patterns in the left inferior frontal gyrus (IFG), the middle temporal gyrus (MTG), and the supplementary motor area (SMA) provided better information for decoding articulable (speech) sounds compared to the inarticulable (sine wave) sounds, the finding that dorsal stream regions do not emerge as good decoders of the articulable contrast alone suggests that other factors, including the strength and structure of the emerging speech categories, are more likely drivers of dorsal stream activation for novel sound learning.
  3. Infants learn the sound categories of their language and adults successfully process the sounds they hear, even though sound categories often overlap in their acoustics. Most researchers agree that listeners use context to disambiguate overlapping categories. However, they differ in their ideas about how context is used. One idea is that listeners normalize out the systematic effects of context from the acoustics of a sound. Another idea is that contextual information may itself be an informative cue to category membership, due to patterns in the types of contexts that particular sounds occur in. We directly contrast these two ways of using context by applying each one to the test case of Japanese vowel length. We find that normalizing out contextual variability from the acoustics does not improve categorization, but using context in a top-down fashion does so substantially. This reveals a limitation of normalization in phonetic acquisition and processing and suggests that approaches that make use of top-down contextual information are promising to pursue.
  4. A wealth of evidence indicates the existence of a consolidation phase, triggered by and following a practice session, wherein new memory traces relevant to task performance are transformed and honed to represent new knowledge. But the role of consolidation is not well understood in category learning and has not been studied at all under incidental category learning conditions. Here, we examined the acquisition, consolidation and retention phases in a visuomotor task wherein auditory category information was available, but not required, to guide detection of an above-threshold visual target across one of four spatial locations. We compared two training conditions: (1) Constant, whereby repeated instances of one exemplar from an auditory category preceded a visual target, predicting its upcoming location; (2) Variable, whereby five distinct category exemplars predicted the visual target. Visual detection speed and accuracy, as well as the performance cost of randomizing the association of auditory category to visual target location, were assessed during online performance, again after a 24-hour delay to assess the expression of delayed gains, and after 10 days to assess retention. Results revealed delayed gains associated with incidental auditory category learning and retention effects for both training conditions. Offline processes can be triggered even for incidental auditory input and lead to category learning; variability of input can enhance the generation of incidental auditory category learning.
  5. Abstract A listener's interpretation of a given speech sound can vary probabilistically from moment to moment. Previous experience (i.e., the contexts in which one has encountered an ambiguous sound) can further influence the interpretation of speech, a phenomenon known as perceptual learning for speech. This study used multivoxel pattern analysis to query how neural patterns reflect perceptual learning, leveraging archival fMRI data from a lexically guided perceptual learning study conducted by Myers and Mesite [Myers, E. B., & Mesite, L. M. Neural systems underlying perceptual adjustment to non-standard speech tokens. Journal of Memory and Language, 76, 80–93, 2014]. In that study, participants first heard ambiguous /s/–/∫/ blends in either /s/-biased lexical contexts (epi_ode) or /∫/-biased contexts (refre_ing); subsequently, they performed a phonetic categorization task on tokens from an /asi/–/a∫i/ continuum. In the current work, a classifier was trained to distinguish between phonetic categorization trials in which participants heard unambiguous productions of /s/ and those in which they heard unambiguous productions of /∫/. The classifier was able to generalize this training to ambiguous tokens from the middle of the continuum on the basis of individual participants' trial-by-trial perception. We take these findings as evidence that perceptual learning for speech involves neural recalibration, such that the pattern of activation approximates the perceived category. Exploratory analyses showed that left parietal regions (supramarginal and angular gyri) and right temporal regions (superior, middle, and transverse temporal gyri) were most informative for categorization. Overall, our results inform an understanding of how moment-to-moment variability in speech perception is encoded in the brain.