This content will become publicly available on August 27, 2026

Title: Talker-specificity beyond the lexicon: Recognition memory for spoken sentences
Over the past 35 years, it has been established that mental representations of language include fine-grained acoustic details stored in episodic memory. The empirical foundations of this fact were established through a series of word recognition experiments showing that participants were better at remembering words repeated by the same talker than words repeated by a different talker (talker-specificity effect). This effect has been widely replicated, but exclusively with isolated, generally monosyllabic, words as the object of study. Whether fine-grained acoustic detail plays a role in the encoding and retrieval of larger structures, such as spoken sentences, has important implications for theories of language understanding in natural communicative contexts. In this study, we extended traditional recognition memory methods to use full spoken sentences rather than individual words as stimuli. Additionally, we manipulated attention at the time of encoding in order to probe the automaticity of fine-grained acoustic encoding. Participants were more accurate for sentences repeated by the same talker than by a different talker. They were also faster and more accurate in the Full Attention than in the Divided Attention condition. The specificity effect was more pronounced for the Divided Attention than the Full Attention group. These findings provide evidence for specificity at the sentence level. They also highlight the implicit, automatic encoding of fine-grained acoustic detail and point to a central role for cognitive resource allocation in shaping memory-based language representations.
Award ID(s):
2314753
PAR ID:
10645872
Author(s) / Creator(s):
Publisher / Repository:
The Psychonomic Society
Date Published:
Journal Name:
Psychonomic Bulletin & Review
ISSN:
1069-9384
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. It is now well established that memory representations of words are acoustically rich. Alongside this development, a related line of work has shown that the robustness of memory encoding varies widely depending on who is speaking. In this dissertation, I explore the cognitive basis of memory asymmetries at a larger linguistic level (spoken sentences), using the mechanism of socially guided attention allocation to explain how listeners dynamically shift cognitive resources based on the social characteristics of speech. This dissertation consists of three empirical studies designed to investigate the factors that pattern asymmetric memory for spoken language. In the first study, I explored specificity effects at the level of the sentence. While previous research on specificity has centralized the lexical item as the unit of study, I showed that talker-specific memory patterns are also robust at a larger linguistic level, making it likely that acoustic detail is fundamental to human speech perception more broadly. In the second study, I introduced a set of diverse talkers and showed that memory patterns vary widely within this group, and that the memorability of individual talkers is somewhat consistent across listeners. In the third study, I showed that memory behaviors do not depend merely on the speech characteristics of the talker or on the content of the sentence, but on the unique relationship between these two. Memory dramatically improved when semantic content of sentences was congruent with widely held social associations with talkers based on their speech, and this effect was particularly pronounced when listeners had a high cognitive load during encoding. These data collectively provide evidence that listeners allocate attentional resources on an ad hoc, socially guided basis. 
Listeners subconsciously draw on fine-grained phonetic information and social associations to dynamically adapt low-level cognitive processes while understanding spoken language and encoding it to memory. This approach positions variation in speech not as an obstacle to perception, but as an information source that humans readily recruit to aid in the seamless understanding of spoken language. 
  2. Objective: Acoustic distortions to the speech signal impair spoken language recognition, but healthy listeners exhibit adaptive plasticity consistent with rapid adjustments in how the distorted speech input maps to speech representations, perhaps through engagement of supervised error-driven learning. This puts adaptive plasticity in speech perception in an interesting position with regard to developmental dyslexia inasmuch as dyslexia impacts speech processing and may involve dysfunction in neurobiological systems hypothesized to be involved in adaptive plasticity. Method: Here, we examined typical young adult listeners (N = 17), and those with dyslexia (N = 16), as they reported the identity of native-language monosyllabic spoken words to which signal processing had been applied to create a systematic acoustic distortion. During training, all participants experienced incremental signal distortion increases to mildly distorted speech along with orthographic and auditory feedback indicating word identity following response across a brief, 250-trial training block. During pretest and posttest phases, no feedback was provided to participants. Results: Word recognition across severely distorted speech was poor at pretest and equivalent across groups. Training led to improved word recognition for the most severely distorted speech at posttest, with evidence that adaptive plasticity generalized to support recognition of new tokens not previously experienced under distortion. However, training-related recognition gains for listeners with dyslexia were significantly less robust than for control listeners. Conclusions: Less efficient adaptive plasticity to speech distortions may impact the ability of individuals with dyslexia to deal with variability arising from sources like acoustic noise and foreign-accented speech.
  3. Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations. 
  4. Learning to process speech in a foreign language involves learning new representations for mapping the auditory signal to linguistic structure. Behavioral experiments suggest that even listeners that are highly proficient in a non-native language experience interference from representations of their native language. However, much of the evidence for such interference comes from tasks that may inadvertently increase the salience of native language competitors. Here we tested for neural evidence of proficiency and native language interference in a naturalistic story listening task. We studied electroencephalography responses of 39 native speakers of Dutch (14 male) to an English short story, spoken by a native speaker of either American English or Dutch. We modeled brain responses with multivariate temporal response functions, using acoustic and language models. We found evidence for activation of Dutch language statistics when listening to English, but only when it was spoken with a Dutch accent. This suggests that a naturalistic, monolingual setting decreases the interference from native language representations, whereas an accent in the listener's own native language may increase native language interference, by increasing the salience of the native language and activating native language phonetic and lexical representations. Brain responses suggest that such interference stems from words from the native language competing with the foreign language in a single word recognition system, rather than being activated in a parallel lexicon. We further found that secondary acoustic representations of speech (after 200 ms latency) decreased with increasing proficiency. This may reflect improved acoustic–phonetic models in more proficient listeners. 
Significance Statement: Behavioral experiments suggest that native language knowledge interferes with foreign language listening, but such effects may be sensitive to task manipulations, as tasks that increase metalinguistic awareness may also increase native language interference. This highlights the need for studying non-native speech processing using naturalistic tasks. We measured neural responses unobtrusively while participants listened for comprehension and characterized the influence of proficiency at multiple levels of representation. We found that salience of the native language, as manipulated through speaker accent, affected activation of native language representations: significant evidence for activation of native language (Dutch) categories was only obtained when the speaker had a Dutch accent, whereas no significant interference was found to a speaker with a native (American) accent.
  5.
    Successful listening in a second language (L2) involves learning to identify the relevant acoustic–phonetic dimensions that differentiate between words in the L2, and then use these cues to access lexical representations during real-time comprehension. This is a particularly challenging goal to achieve when the relevant acoustic–phonetic dimensions in the L2 differ from those in the L1, as is the case for the L2 acquisition of Mandarin, a tonal language, by speakers of non-tonal languages like English. Previous work shows tone in L2 is perceived less categorically (Shen and Froud, 2019) and weighted less in word recognition (Pelzl et al., 2019) than in L1. However, little is known about the link between categorical perception of tone and use of tone in real-time L2 word recognition at the level of the individual learner. This study presents evidence from 30 native and 29 L1-English speakers of Mandarin who completed a real-time spoken word recognition and a tone identification task. Results show that L2 learners differed from native speakers in both the extent to which they perceived tone categorically as well as in their ability to use tonal cues to distinguish between words in real-time comprehension. Critically, learners who reliably distinguished between words differing by tone alone in the word recognition task also showed more categorical perception of tone on the identification task. Moreover, within this group, performance on the two tasks was strongly correlated. This provides the first direct evidence showing that the ability to perceive tone categorically is related to the weighting of tonal cues during spoken word recognition, thus contributing to a better understanding of the link between phonemic and lexical processing, which has been argued to be a key component in the L2 acquisition of tone (Wong and Perrachione, 2007).