Title: Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
Speech enhancement techniques that use a generative adversarial network (GAN) can effectively suppress noise while allowing models to be trained end-to-end. However, such techniques operate directly on time-domain waveforms, which are often high-dimensional and require extensive computation. This paper proposes a novel GAN-based speech enhancement method, referred to as S-ForkGAN, that operates on log-power spectra rather than on time-domain speech waveforms, and uses a forked GAN structure to extract both speech and noise information. By operating on log-power spectra, one can seamlessly include conventional spectral subtraction techniques, and the parameter space typically has a lower dimension. The performance of S-ForkGAN is assessed for automatic speech recognition (ASR) using the TIMIT data set and a wide range of noise conditions. It is shown that S-ForkGAN outperforms existing GAN-based techniques and that it has a lower complexity.
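As a rough illustration of the conventional spectral subtraction that the abstract says can be combined with log-power spectra, the sketch below subtracts a noise estimate from one frame. It is not the S-ForkGAN method itself: the function name, the over-subtraction factor `alpha`, and the spectral `floor` are hypothetical choices made for this example, and the frame is assumed to be already available as natural-log power values.

```python
import math


def log_power_spectral_subtraction(noisy_lps, noise_lps, alpha=1.0, floor=0.01):
    """Apply textbook power spectral subtraction to one frame.

    noisy_lps and noise_lps are per-bin natural-log power spectra
    (assumed inputs). alpha and floor are illustrative parameters,
    not values taken from the paper.
    """
    enhanced = []
    for y_log, n_log in zip(noisy_lps, noise_lps):
        y_pow = math.exp(y_log)  # back to linear power
        n_pow = math.exp(n_log)
        # Subtract the scaled noise power; clamp to a fraction of the noisy
        # power so the result stays positive before taking the log again.
        s_pow = max(y_pow - alpha * n_pow, floor * y_pow)
        enhanced.append(math.log(s_pow))
    return enhanced


# Two bins: the first has more speech than noise, the second is noise-dominated.
frame = [math.log(4.0), math.log(1.0)]
noise = [math.log(1.0), math.log(2.0)]
result = log_power_spectral_subtraction(frame, noise)
```

In the noise-dominated bin the subtraction would go negative, so the floor keeps a small fraction of the noisy power instead. This is a common safeguard in spectral subtraction rather than something the abstract specifies.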
Journal Name:
Proceedings of Interspeech 2019
Page Range / eLocation ID:
3163 to 3167
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Speech enhancement is an essential component in robust automatic speech recognition (ASR) systems. Most speech enhancement methods are nowadays based on neural networks that use feature-mapping or mask-learning. This paper proposes a novel speech enhancement method that integrates time-domain feature mapping and mask learning into a unified framework using a Generative Adversarial Network (GAN). The proposed framework processes the received waveform and decouples speech and noise signals, which are fed into two short-time Fourier transform (STFT) convolution 1-D layers that map the waveforms to spectrograms in the complex domain. These speech and noise spectrograms are then used to compute the speech mask loss. The proposed method is evaluated using the TIMIT data set for seen and unseen signal-to-noise ratio conditions. It is shown that the proposed method outperforms the speech enhancement methods that use Deep Neural Network (DNN) based speech enhancement or a Speech Enhancement Generative Adversarial Network (SEGAN). 
  2. Abstract

    On-chip spectrometers have the potential to offer dramatic size, weight, and power advantages over conventional benchtop instruments for many applications such as spectroscopic sensing, optical network performance monitoring, hyperspectral imaging, and radio-frequency spectrum analysis. Existing on-chip spectrometer designs, however, are limited in spectral channel count and signal-to-noise ratio. Here we demonstrate a transformative on-chip digital Fourier transform spectrometer that acquires high-resolution spectra via time-domain modulation of a reconfigurable Mach-Zehnder interferometer. The device, fabricated and packaged using industry-standard silicon photonics technology, claims the multiplex advantage to dramatically boost the signal-to-noise ratio and unprecedented scalability capable of addressing exponentially increasing numbers of spectral channels. We further explore and implement machine learning regularization techniques to spectrum reconstruction. Using an ‘elastic-D1’ regularized regression method that we develop, we achieved significant noise suppression for both broad (>600 GHz) and narrow (<25 GHz) spectral features, as well as spectral resolution enhancement beyond the classical Rayleigh criterion.

  3. Abstract

    We explore the potential of the adjoint‐state tsunami inversion method for rapid and accurate near‐field tsunami source characterization using S‐net, an array of ocean bottom pressure gauges. Compared to earthquake‐based methods, this method can obtain more accurate predictions for the initial water elevation of the tsunami source, including potential secondary sources, leading to accurate water height and wave run‐up predictions. Unlike finite‐fault tsunami source inversions, the adjoint method achieves high‐resolution results without requiring densely gridded Green's functions, reducing computation time. However, optimal results require a dense instrument network with sufficient azimuthal coverage. S‐net meets these requirements and reduces data collection time, facilitating the inversion and timely issuance of tsunami warnings. Since the method has not yet been applied to dense, near‐field data, we test it on synthetic waveforms of the 2011 Mw 9.0 Tohoku earthquake and tsunami, including triggered secondary sources. The results indicate that with a static source model without noise, using the first 5 min of the waveforms yields a favorable performance with an average accuracy score of 93%, and the largest error of predicted wave amplitudes ranges between −5.6 and 1.9 m. Using the first 20 min, secondary sources were clearly resolved. We also demonstrate the method's applicability using S‐net recordings of the 2016 Mw 6.9 Fukushima earthquake. The findings suggest that lower‐magnitude events require a longer waveform duration for accurate adjoint inversion. Moreover, the estimated stress drop obtained from inverting our obtained tsunami source, assuming uniform slip, aligns with estimations from recent studies.

  4.

    We applied nonlinear thresholding and scale–time gating in the continuous wavelet transform (CWT) domain to denoise, identify and characterize seismic phases contained in gradiometer and phased array waveforms of four seismic events recorded during the 2016 Incorporated Research Institutions of Seismology Wavefields Experiment in northern Oklahoma. A dense, 80-element three-component phased array was subset from the linear array deployments to examine background noise, waveform coherence and seismic wave composition for local explosion and earthquake waveforms. CWT techniques were also used to significantly improve gradiometry analyses for data recorded by the geodetic array subexperiment. We observed as much as two orders of magnitude gain in the data signal-to-noise ratio. We also saw improvement in array beam quality after denoising the seismic data. Using the signal partitioning technique, we were able to extract and identify many phases based on their positions on the scale–time plane. CWT denoising and wavefield decomposition techniques also improved gradiometry analysis results from the 112-element geodetic array (also called the gradiometer) since waves could be separated before the computation of wave attributes. The operations of removing noise and gating out signal phases improved signal coherence across array records and provided clear P wave onsets on horizontal records, which can mitigate phase picking error and resulting event location uncertainty.

  5. Abstract

    Far-ultraviolet (FUV; ∼1200–2000 Å) spectra are fundamental to our understanding of star-forming galaxies, providing a unique window on massive stellar populations, chemical evolution, feedback processes, and reionization. The launch of the James Webb Space Telescope will soon usher in a new era, pushing the UV spectroscopic frontier to higher redshifts than ever before; however, its success hinges on a comprehensive understanding of the massive star populations and gas conditions that power the observed UV spectral features. This requires a level of detail that is only possible with a combination of ample wavelength coverage, signal-to-noise, spectral-resolution, and sample diversity that has not yet been achieved by any FUV spectral database. We present the Cosmic Origins Spectrograph Legacy Spectroscopic Survey (CLASSY) treasury and its first high-level science product, the CLASSY atlas. CLASSY builds on the Hubble Space Telescope (HST) archive to construct the first high-quality (S/N at 1500 Å ≳ 5/resel), high-resolution (R ∼ 15,000) FUV spectral database of 45 nearby (0.002 < z < 0.182) star-forming galaxies. The CLASSY atlas, available to the public via the CLASSY website, is the result of optimally extracting and coadding 170 archival+new spectra from 312 orbits of HST observations. The CLASSY sample covers a broad range of properties including stellar mass (6.2 < log M⋆ (M⊙) < 10.1), star formation rate (−2.0 < log SFR (M⊙ yr−1) < +1.6), direct gas-phase metallicity (7.0 < 12+log(O/H) < 8.8), ionization (0.5 < O32 < 38.0), reddening (0.02 < E(B−V) < 0.67), and nebular density (10 < ne (cm−3) < 1120). CLASSY is biased to UV-bright star-forming galaxies, resulting in a sample that is consistent with the z ∼ 0 mass–metallicity relationship, but is offset to higher star formation rates by roughly 2 dex, similar to z ≳ 2 galaxies. This unique set of properties makes the CLASSY atlas the benchmark training set for star-forming galaxies across cosmic time.
