skip to main content


Title: Identification of Words and Phrases Through a Phonemic-Based Haptic Display: Effects of Inter-Phoneme and Inter-Word Interval Durations
Stand-alone devices for tactile speech reception serve a need as communication aids for persons with profound sensory impairments as well as in applications such as human-computer interfaces and remote communication when the normal auditory and visual channels are compromised or overloaded. The current research is concerned with perceptual evaluations of a phoneme-based tactile speech communication device in which a unique tactile code was assigned to each of the 24 consonants and 15 vowels of English. The tactile phonemic display was conveyed through an array of 24 tactors that stimulated the dorsal and ventral surfaces of the forearm. Experiments examined the recognition of individual words as a function of the inter-phoneme interval (Study 1) and two-word phrases as a function of the inter-word interval (Study 2). Following an average training period of 4.3 hrs on phoneme and word recognition tasks, mean scores for the recognition of individual words in Study 1 ranged from 87.7% correct to 74.3% correct as the inter-phoneme interval decreased from 300 to 0 ms. In Study 2, following an average of 2.5 hours of training on the two-word phrase task, both words in the phrase were identified with an accuracy of 75% correct using an inter-word interval of 1 sec and an inter-phoneme interval of 150 ms. Effective transmission rates achieved on this task were estimated to be on the order of 30 to 35 words/min.  more » « less
Award ID(s):
1954886 1954842
NSF-PAR ID:
10318477
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Applied Perception
Volume:
18
Issue:
3
ISSN:
1544-3558
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Partial speech input is often understood to trigger rapid and automatic activation of successively higher-level representations of words, from sound to meaning. Here we show evidence from magnetoencephalography that this type of incremental processing is limited when words are heard in isolation as compared to continuous speech. This suggests a less unified and automatic word recognition process than is often assumed. We present evidence from isolated words that neural effects of phoneme probability, quantified by phoneme surprisal, are significantly stronger than (statistically null) effects of phoneme-by-phoneme lexical uncertainty, quantified by cohort entropy. In contrast, we find robust effects of both cohort entropy and phoneme surprisal during perception of connected speech, with a significant interaction between the contexts. This dissociation rules out models of word recognition in which phoneme surprisal and cohort entropy are common indicators of a uniform process, even though these closely related information-theoretic measures both arise from the probability distribution of wordforms consistent with the input. We propose that phoneme surprisal effects reflect automatic access of a lower level of representation of the auditory input (e.g., wordforms) while the occurrence of cohort entropy effects is task sensitive, driven by a competition process or a higher-level representation that is engaged late (or not at all) during the processing of single words.

     
    more » « less
  2. Muresan, Smaranda ; Nakov, Preslav ; Villavicencio, Aline (Ed.)
    Phonemes are defined by their relationship to words: changing a phoneme changes the word. Learning a phoneme inventory with little supervision has been a longstanding challenge with important applications to under-resourced speech technology. In this paper, we bridge the gap between the linguistic and statistical definition of phonemes and propose a novel neural discrete representation learning model for self-supervised learning of phoneme inventory with raw speech and word labels. Under mild assumptions, we prove that the phoneme inventory learned by our approach converges to the true one with an exponentially low error rate. Moreover, in experiments on TIMIT and Mboshi benchmarks, our approach consistently learns a better phoneme-level representation and achieves a lower error rate in a zero-resource phoneme recognition task than previous state-of-the-art self-supervised representation learning algorithms. 
    more » « less
  3. N/A (Ed.)
    This study is focused on understanding and quantifying the change in phoneme and prosody information encoded in the Self-Supervised Learning (SSL) model, brought by an accent identification (AID) fine-tuning task. This problem is addressed based on model probing. Specifically, we conduct a systematic layer-wise analysis of the representations of the Transformer layers on a phoneme correlation task, and a novel word-level prosody prediction task. We compare the probing performance of the pre-trained and fine-tuned SSL models. Results show that the AID fine-tuning task steers the top 2 layers to learn richer phoneme and prosody representation. These changes share some similarities with the effects of fine-tuning with an Automatic Speech Recognition task. In addition, we observe strong accent-specific phoneme representations in layer 9. To sum up, this study provides insights into the understanding of SSL features and their interactions with fine-tuning tasks. 
    more » « less
  4. Abstract

    A longstanding debate has surrounded the role of the motor system in speech perception, but progress in this area has been limited by tasks that only examine isolated syllables and conflate decision-making with perception. Using an adaptive task that temporally isolates perception from decision-making, we examined an EEG signature of motor activity (sensorimotor μ/beta suppression) during the perception of auditory phonemes, auditory words, audiovisual words, and environmental sounds while holding difficulty constant at two levels (Easy/Hard). Results revealed left-lateralized sensorimotor μ/beta suppression that was related to perception of speech but not environmental sounds. Audiovisual word and phoneme stimuli showed enhanced left sensorimotor μ/beta suppression for correct relative to incorrect trials, while auditory word stimuli showed enhanced suppression for incorrect trials. Our results demonstrate that motor involvement in perception is left-lateralized, is specific to speech stimuli, and it not simply the result of domain-general processes. These results provide evidence for an interactive network for speech perception in which dorsal stream motor areas are dynamically engaged during the perception of speech depending on the characteristics of the speech signal. Crucially, this motor engagement has different effects on the perceptual outcome depending on the lexicality and modality of the speech stimulus.

     
    more » « less
  5. Abstract Objective: Acoustic distortions to the speech signal impair spoken language recognition, but healthy listeners exhibit adaptive plasticity consistent with rapid adjustments in how the distorted speech input maps to speech representations, perhaps through engagement of supervised error-driven learning. This puts adaptive plasticity in speech perception in an interesting position with regard to developmental dyslexia inasmuch as dyslexia impacts speech processing and may involve dysfunction in neurobiological systems hypothesized to be involved in adaptive plasticity. Method: Here, we examined typical young adult listeners ( N = 17), and those with dyslexia ( N = 16), as they reported the identity of native-language monosyllabic spoken words to which signal processing had been applied to create a systematic acoustic distortion. During training, all participants experienced incremental signal distortion increases to mildly distorted speech along with orthographic and auditory feedback indicating word identity following response across a brief, 250-trial training block. During pretest and posttest phases, no feedback was provided to participants. Results: Word recognition across severely distorted speech was poor at pretest and equivalent across groups. Training led to improved word recognition for the most severely distorted speech at posttest, with evidence that adaptive plasticity generalized to support recognition of new tokens not previously experienced under distortion. However, training-related recognition gains for listeners with dyslexia were significantly less robust than for control listeners. Conclusions: Less efficient adaptive plasticity to speech distortions may impact the ability of individuals with dyslexia to deal with variability arising from sources like acoustic noise and foreign-accented speech. 
    more » « less