Speech conveys both linguistic messages and a wealth of social and identity information about a talker. This information arrives as complex variations across many acoustic dimensions. Ultimately, speech communication depends on experience within a language community to develop shared long-term knowledge of the mapping from acoustic patterns to the category distinctions that support word recognition, emotion evaluation, and talker identification. A great deal of research has focused on the learning involved in acquiring this long-term knowledge to support speech categorization, a focus that may inadvertently give the impression of a mature learning endpoint. Instead, there seems to be no firm line between perception and learning in speech: the contributions of acoustic dimensions are continuously reweighted as a function of regularities evolving in the short-term input. In this way, continuous learning across speech input shapes the very nature of the mapping from sensory input to perceived category. This article presents a case study in understanding how incoming sensory input, and the learning that takes place across it, interacts with existing knowledge to drive predictions that tune the system to support future behavior.
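The reweighting dynamic described in the abstract can be illustrated with a toy simulation. The sketch below is not the article's model: the cue labels, the logistic categorizer, the error-driven update rule, and all numbers are assumptions chosen only to show how a cue's perceptual weight can fall when short-term input makes that cue uninformative.

```python
# Minimal sketch (illustrative, not the article's model): error-driven
# reweighting of two acoustic cues during categorization.
import math
import random

def categorize(cue_primary, cue_secondary, weights):
    """Return P(category A) from a weighted combination of two cues."""
    activation = weights[0] * cue_primary + weights[1] * cue_secondary
    return 1.0 / (1.0 + math.exp(-activation))

def update_weights(weights, cues, category, learning_rate=0.1):
    """Nudge cue weights toward whatever predicts the category in recent input."""
    error = category - categorize(cues[0], cues[1], weights)
    return [w + learning_rate * error * c for w, c in zip(weights, cues)]

random.seed(0)
weights = [1.0, 1.0]  # start with both cues weighted equally

# Phase 1: both cues covary with the category (canonical input).
for _ in range(500):
    category = random.choice([0, 1])
    cues = [category * 2 - 1 + random.gauss(0, 0.5),
            category * 2 - 1 + random.gauss(0, 0.5)]
    weights = update_weights(weights, cues, category)

# Phase 2: the secondary cue becomes uninformative (e.g., atypical input);
# its weight is expected to shrink while the primary cue's weight is maintained.
for _ in range(500):
    category = random.choice([0, 1])
    cues = [category * 2 - 1 + random.gauss(0, 0.5),
            random.gauss(0, 1.0)]
    weights = update_weights(weights, cues, category)

print("cue weights after exposure:", weights)
```

On typical runs, the weight on the now-uninformative cue shrinks relative to the weight on the cue that still signals the category, a simple analogue of short-term dimension reweighting layered on top of long-term category knowledge.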
Using lexical stress, speech rate, rhythm, and pauses to characterize and normalize second language speech intelligibility
- Award ID(s): 2140469
- PAR ID: 10531499
- Publisher / Repository: Acoustical Society of America
- Date Published:
- Page Range / eLocation ID: 060008
- Format(s): Medium: X
- Location: Nashville, Tennessee
- Sponsoring Org: National Science Foundation
More Like this
Decoding human speech from neural signals is essential for brain–computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. However, it remains a highly challenging task, compounded by the scarcity of neural signals recorded with corresponding speech and by the complexity and high dimensionality of the data. Here we present a deep learning-based neural speech decoding framework that includes an electrocorticographic (ECoG) decoder, which translates ECoG signals from the cortex into interpretable speech parameters, and a novel differentiable speech synthesizer, which maps those speech parameters to spectrograms. We also developed a companion speech-to-speech auto-encoder, consisting of a speech encoder and the same speech synthesizer, to generate reference speech parameters that facilitate training of the ECoG decoder. The framework generates natural-sounding speech and is highly reproducible across a cohort of 48 participants. Our experimental results show that the models decode speech with high correlation, even when limited to causal operations, a requirement for real-time neural prostheses. Finally, we successfully decode speech in participants with either left or right hemisphere coverage, which could lead to speech prostheses for patients with deficits resulting from left-hemisphere damage.
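To make the described architecture concrete, here is a minimal PyTorch sketch of how the components could fit together. This is not the authors' implementation: the module classes, layer sizes, the assumed 18-dimensional speech-parameter vector, the 80-band mel spectrogram, and the combined loss are illustrative assumptions.

```python
# Hedged sketch of the described pipeline (illustrative assumptions throughout):
# an ECoG decoder, a shared differentiable synthesizer, and a speech-to-speech
# auto-encoder that supplies reference speech parameters.
import torch
import torch.nn as nn

N_SPEECH_PARAMS = 18   # assumed size of the interpretable speech-parameter vector
N_MELS = 80            # assumed spectrogram resolution

class ECoGDecoder(nn.Module):
    """Maps ECoG feature frames to frame-wise speech parameters (causal GRU)."""
    def __init__(self, n_electrodes):
        super().__init__()
        self.rnn = nn.GRU(n_electrodes, 256, batch_first=True)  # unidirectional, so causal
        self.head = nn.Linear(256, N_SPEECH_PARAMS)

    def forward(self, ecog):                       # ecog: (batch, time, electrodes)
        hidden, _ = self.rnn(ecog)
        return torch.sigmoid(self.head(hidden))    # bounded speech parameters

class SpeechSynthesizer(nn.Module):
    """Differentiable stand-in that maps speech parameters to a mel spectrogram."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_SPEECH_PARAMS, 256), nn.ReLU(), nn.Linear(256, N_MELS))

    def forward(self, params):                     # params: (batch, time, N_SPEECH_PARAMS)
        return self.net(params)

class SpeechEncoder(nn.Module):
    """Speech-to-speech branch: spectrogram -> reference speech parameters."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MELS, 256, batch_first=True)
        self.head = nn.Linear(256, N_SPEECH_PARAMS)

    def forward(self, mel):
        hidden, _ = self.rnn(mel)
        return torch.sigmoid(self.head(hidden))

# The synthesizer is shared: the auto-encoder (SpeechEncoder + synthesizer) can be
# trained on audio alone to produce reference parameters, which then guide the
# ECoG decoder alongside the spectrogram reconstruction objective.
synth = SpeechSynthesizer()
ecog_decoder, speech_encoder = ECoGDecoder(n_electrodes=64), SpeechEncoder()

ecog = torch.randn(2, 100, 64)                # toy batch: 2 trials, 100 frames
mel_target = torch.randn(2, 100, N_MELS)      # toy spectrogram targets

ref_params = speech_encoder(mel_target)       # reference parameters from audio
decoded_mel = synth(ecog_decoder(ecog))       # ECoG -> parameters -> spectrogram
loss = nn.functional.mse_loss(decoded_mel, mel_target) \
     + nn.functional.mse_loss(ecog_decoder(ecog), ref_params)
print(loss.item())
```

The design point carried over from the abstract is the shared differentiable synthesizer: it lets the speech-to-speech auto-encoder learn reference speech parameters from audio alone and lets the ECoG decoder be trained end-to-end against spectrogram targets, while the unidirectional recurrence keeps the decoding path causal. The particular way the two losses are combined here is one plausible choice, not the authors' stated objective.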

