This content will become publicly available on September 1, 2025

Title: DNN-based monaural speech enhancement using alternate analysis windows for phase and magnitude modification
In recent decades, considerable research has been devoted to speech enhancement based on short-time Fourier transform (STFT) analysis. As speech processing technology evolves, the significance of phase information in enhancing speech intelligibility has become more apparent. The Hanning window has typically been employed as the analysis window in the STFT. In this study, we propose using the Chebyshev window for phase analysis and the Hanning window for magnitude analysis. We also introduce a novel cepstral-domain enhancement approach designed to robustly reinforce the harmonic structure of speech. The performance of our model is evaluated using the DNS challenge test set as well as the naturalistic APOLLO Fearless Steps evaluation set. Experimental results demonstrate that the Chebyshev-based phase solution outperforms the Hanning option in phase-aware speech enhancement. Furthermore, the incorporation of quefrency emphasis proves effective in enhancing overall speech quality.
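The alternate-window analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dual_window_stft`, the frame length, the overlap, the sampling rate, and the 100 dB Chebyshev sidelobe attenuation are all assumptions chosen for illustration. It only shows how a magnitude spectrogram from a Hann-window STFT and a phase spectrogram from a Chebyshev-window STFT could be computed on the same frame grid with SciPy.

```python
import numpy as np
from scipy.signal import stft


def dual_window_stft(x, fs=16000, nperseg=512, noverlap=384, cheb_att_db=100.0):
    """Return (magnitude, phase) using two different analysis windows.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    # Magnitude branch: Hann (Hanning) analysis window.
    _, _, z_hann = stft(x, fs=fs, window="hann",
                        nperseg=nperseg, noverlap=noverlap)
    # Phase branch: Dolph-Chebyshev analysis window; the tuple passes the
    # sidelobe attenuation (in dB) through to scipy.signal.get_window.
    _, _, z_cheb = stft(x, fs=fs, window=("chebwin", cheb_att_db),
                        nperseg=nperseg, noverlap=noverlap)
    magnitude = np.abs(z_hann)
    phase = np.angle(z_cheb)
    return magnitude, phase


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.standard_normal(16000)  # one second of synthetic noise
    mag, ph = dual_window_stft(noisy)
    print(mag.shape, ph.shape)
```

Because both branches share the same frame length and hop, the two spectrograms align bin for bin, so a downstream model could modify each separately and recombine them as `mag * np.exp(1j * ph)`.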
Award ID(s):
2016725
PAR ID:
10542791
Author(s) / Creator(s):
;
Publisher / Repository:
ISCA
Date Published:
Edition / Version:
1
Volume:
1
Issue:
2244
Page Range / eLocation ID:
1705 to 1709
Subject(s) / Keyword(s):
Speech Enhancement DNN Phase and Magnitude Chebyshev
Format(s):
Medium: X Size: 8MB
Size(s):
8MB
Location:
Kos Island, Greece
Sponsoring Org:
National Science Foundation
More Like this
  1. Speech enhancement is an essential component in robust automatic speech recognition (ASR) systems. Most speech enhancement methods today are based on neural networks that use feature mapping or mask learning. This paper proposes a novel speech enhancement method that integrates time-domain feature mapping and mask learning into a unified framework using a Generative Adversarial Network (GAN). The proposed framework processes the received waveform and decouples speech and noise signals, which are fed into two short-time Fourier transform (STFT) convolution 1-D layers that map the waveforms to spectrograms in the complex domain. These speech and noise spectrograms are then used to compute the speech mask loss. The proposed method is evaluated using the TIMIT data set for seen and unseen signal-to-noise ratio conditions. It is shown that the proposed method outperforms speech enhancement methods based on a Deep Neural Network (DNN) or a Speech Enhancement Generative Adversarial Network (SEGAN).
  2. Speech enhancement techniques that use a generative adversarial network (GAN) can effectively suppress noise while allowing models to be trained end-to-end. However, such techniques operate directly on time-domain waveforms, which are often high-dimensional and require extensive computation. This paper proposes a novel GAN-based speech enhancement method, referred to as S-ForkGAN, that operates on log-power spectra rather than on time-domain speech waveforms, and uses a forked GAN structure to extract both speech and noise information. By operating on log-power spectra, one can seamlessly include conventional spectral subtraction techniques, and the parameter space typically has a lower dimension. The performance of S-ForkGAN is assessed for automatic speech recognition (ASR) using the TIMIT data set and a wide range of noise conditions. It is shown that S-ForkGAN outperforms existing GAN-based techniques and that it has a lower complexity.
  3. In this paper, we consider the inverse graph filtering process when the original filter is a polynomial of some graph shift on a simple connected graph. High-order Chebyshev polynomial approximation has been widely used to approximate the inverse filter. We propose an iterative Chebyshev polynomial approximation (ICPA) algorithm to implement the inverse filtering procedure, which can eliminate the restoration error even when using a Chebyshev polynomial approximation of lower order. We also provide a detailed convergence analysis for the ICPA algorithm and a distributed implementation of the ICPA algorithm on a spatially distributed network. Numerical results are included to demonstrate the satisfactory performance of the ICPA algorithm in graph signal denoising.
  4. Bizley, Jennifer K. (Ed.)
    Hearing one’s own voice is critical for fluent speech production as it allows for the detection and correction of vocalization errors in real time. This behavior, known as the auditory feedback control of speech, is impaired in various neurological disorders ranging from stuttering to aphasia; however, the underlying neural mechanisms are still poorly understood. Computational models of speech motor control suggest that, during speech production, the brain uses an efference copy of the motor command to generate an internal estimate of the speech output. When actual feedback differs from this internal estimate, an error signal is generated to correct the internal estimate and update necessary motor commands to produce intended speech. We were able to localize the auditory error signal using electrocorticographic recordings from neurosurgical participants during a delayed auditory feedback (DAF) paradigm. In this task, participants heard their voice with a time delay as they produced words and sentences (similar to an echo on a conference call), which is well known to disrupt fluency by causing slow and stutter-like speech in humans. We observed a significant response enhancement in auditory cortex that scaled with the duration of feedback delay, indicating an auditory speech error signal. Immediately following auditory cortex, dorsal precentral gyrus (dPreCG), a region that has not been implicated in auditory feedback processing before, exhibited a markedly similar response enhancement, suggesting a tight coupling between the 2 regions. Critically, response enhancement in dPreCG occurred only during articulation of long utterances due to a continuous mismatch between produced speech and reafferent feedback. These results suggest that dPreCG plays an essential role in processing auditory error signals during speech production to maintain fluency.
  5. In two-phase materials, each phase having a non-local response in time, it has been found that for some driving fields the response somehow untangles at specific times, and allows one to directly infer useful information about the geometry of the material, such as the volume fractions of the phases. Motivated by this, and to obtain an algorithm for designing appropriate driving fields, we find approximate, measure independent, linear relations between the values that Markov functions take at a given set of possibly complex points, not belonging to the interval [-1,1] where the measure is supported. The problem is reduced to simply one of polynomial approximation of a given function on the interval [-1,1] and, to simplify the analysis, Chebyshev approximation is used. This allows one to obtain explicit estimates of the error of the approximation, in terms of the number of points and the minimum distance of the points to the interval [-1,1]. Assuming this minimum distance is bounded below by a number greater than 1/2, the error converges exponentially to zero as the number of points is increased. Approximate linear relations are also obtained that incorporate a set of moments of the measure. In the context of the motivating problem, the analysis also yields bounds on the response at any particular time for any driving field, and allows one to estimate the response at a given frequency using an appropriately designed driving field that effectively is turned on only for a fixed interval of time. The approximation extends directly to Markov-type functions with a positive semidefinite operator valued measure, and this has applications to determining the shape of an inclusion in a body from boundary flux measurements at a specific time, when the time-dependent boundary potentials are suitably tailored. 