Title: Speech features are weighted by selective attention
Listeners typically rely more on one aspect of the speech signal than another when categorizing speech sounds. This is known as feature weighting. We present a rate distortion theory model of feature weighting and use it to ask whether human listeners select feature weights simply by mirroring the feature reliabilities present in their input. We show that listeners appear to use an additional component, selective attention, that is not reflected in the input statistics. This suggests that, in addition to tracking statistics, an internal mechanism governs how listeners weight different aspects of the speech signal.
Award ID(s):
2120834
PAR ID:
10414510
Journal Name:
Proceedings of the Conference on Cognitive Computational Neuroscience
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech perception is complex and demands constant adaptation to the speaker and the environment (e.g., noisy speech or an unfamiliar accent). To adapt, the listener relies on one speech feature more than another. This cognitive mechanism is called selective attention. We present a model that captures the idea of selective attention: we show that this dynamic adaptation process can be captured in a neural architecture by using a multiple-encoder beta variational autoencoder (beta-ME-VAE), which is based on rate distortion theory. This model implements the idea that optimal feature weighting looks different under different listening conditions and provides insight into how listeners can adapt their listening strategy on a moment-to-moment basis, even in listening situations they haven't experienced before.
  2. Recent studies have documented substantial variability among typical listeners in how gradiently they categorize speech sounds, and this variability in categorization gradience may link to how listeners weight different cues in the incoming signal. The present study tested the relationship between categorization gradience and cue weighting across two sets of English contrasts, each varying orthogonally in two acoustic dimensions. Participants performed a four‐alternative forced‐choice identification task in a visual world paradigm while their eye movements were monitored. We found that (a) greater categorization gradience derived from behavioral identification responses corresponds to larger secondary cue weights derived from eye movements; (b) the relationship between categorization gradience and secondary cue weighting is observed across cues and contrasts, suggesting that categorization gradience may be a consistent within‐individual property in speech perception; and (c) listeners who showed greater categorization gradience tend to adopt a buffered processing strategy, especially when cues arrive asynchronously in time.
  3. Communicating with a speaker with a different accent can affect one’s own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/–/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship reversed to create an “accent” with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners’ own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening, and these adjustments transfer to influence listeners’ own speech productions.
  4. For nearly 25 years, researchers have recognized the rich and numerous facets of native perception of non‐native speech, driving a large, and growing, body of work that has shed light on how native listeners understand non‐native speech. The bulk of this work, however, has focused on the talker. That is, most researchers have asked what perception of non‐native speech tells us about the non‐native speaker, or when interacting with non‐native speakers more generally. It is clear that listeners perceive speech not only in terms of the acoustic signal, but also with their own experience and biases driving their perception. It is also clear that native listeners can improve their perception of non‐native speech for both familiar and unfamiliar accents. Therefore, it is imperative that research in non‐native communication also consider an active role for the listener. To truly understand communication between native and non‐native speakers, it is critically important to understand both the properties of non‐native speech and how this speech is perceived. In the present review, we describe non‐native speech and then review previous research, examining the methodological shift from using native listeners as tools to understand properties of non‐native speech to understanding listeners as partners in conversation. We discuss how current models not only limit our understanding of non‐native speech, but also limit what types of questions researchers set out to answer. We demonstrate that while non‐native speakers are capable of shifting their productions to be better understood by listeners, native listeners are also capable of shifting their perception to more accurately perceive non‐native speech. We conclude by setting forth a series of recommendations for future research, emphasizing the contributions of native listeners and non‐native speakers as equally important for communicative success.
  5. Speech categories are defined by multiple acoustic dimensions, and their boundaries are generally fuzzy and ambiguous, in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differed depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and their subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production in which socio-indexical features are encoded as part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.