Title: HEKWS: Privacy-Preserving Convolutional Neural Network-based Keyword Spotting with a Ciphertext Packing Technique
Keyword spotting (KWS) is a key technology in smart devices, yet these devices raise persistent privacy concerns. To address this problem, this paper applies homomorphic encryption (HE) to a previous small-footprint convolutional neural network (CNN)-based KWS algorithm. This enables a trustless system in which a command word can be securely identified by a remote cloud server without exposing client data. To alleviate the burden on a client's edge device, a novel packing technique is proposed that reduces the number of ciphertexts for an input keyword to one. Our HE-based KWS achieves a prediction accuracy of 72% on Google's Speech Commands Dataset with 12 labels, almost identical to the accuracy of a non-HE implementation that uses the same CNN layers and approximates the rectified linear unit in the same manner. On a workstation, processing one keyword takes 19 seconds on average, which can be improved in the future through parallelization, HE parameter optimization, and/or custom hardware accelerators.
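As a rough illustration of the two ideas the abstract mentions, the sketch below uses plain NumPy rather than a real HE library; the frame/feature counts, slot count, and polynomial coefficients are assumptions, not the paper's values. It packs a whole keyword's feature map into a single slot vector, mimicking one CKKS-style ciphertext, and evaluates a low-degree polynomial in place of ReLU, since HE supports only additions and multiplications.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): pack a T x F
# feature map for one keyword into a single slot vector, the way
# CKKS-style schemes pack many values into one ciphertext, and apply a
# low-degree polynomial in place of ReLU.

T, F = 98, 40            # assumed frame/feature counts for a 1 s keyword
SLOTS = 8192             # assumed number of ciphertext slots

features = np.random.randn(T, F)     # stand-in for real speech features

# "Packing": flatten the whole keyword into one slot vector and zero-pad,
# so a single ciphertext carries the entire input.
packed = np.zeros(SLOTS)
packed[:T * F] = features.reshape(-1)

def poly_relu(x, a=0.25, b=0.5, c=0.125):
    # Degree-2 approximation a + b*x + c*x^2 of ReLU on a bounded range;
    # these coefficients are illustrative, not the paper's.
    return a + b * x + c * x * x

activated = poly_relu(packed)
print(packed.shape, activated.shape)   # both (8192,), i.e. one "ciphertext"
```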
Award ID(s):
2105373
PAR ID:
10443626
Author(s) / Creator(s):
Date Published:
Journal Name:
2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP)
Page Range / eLocation ID:
01 to 06
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the context of keyword spotting (KWS), replacing handcrafted speech features with learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS whenever the number of filterbank channels is severely decreased. Reducing the number of channels may cause some drop in KWS performance, but it also brings a substantial reduction in energy consumption, which is key when deploying always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics and provides a higher degree of robustness to noise, especially when dropout is integrated. Thus, switching from the typical 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while achieving a 6.3× reduction in energy consumption.
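A minimal sketch of the idea above, assuming PyTorch and illustrative shapes (257 STFT bins, 8 learned channels, none of which are taken from the paper): the fixed log-Mel matrix is replaced by a trainable filterbank whose weights are learned end to end, with dropout applied to the resulting features.

```python
import torch
import torch.nn as nn

class LearnableFilterbank(nn.Module):
    """Learned filterbank standing in for fixed log-Mel features."""
    def __init__(self, n_fft_bins=257, n_channels=8, p_dropout=0.1):
        super().__init__()
        # One trainable weight per (FFT bin, channel), like a Mel matrix
        # whose shape is fixed but whose values are learned end to end.
        self.weights = nn.Parameter(torch.rand(n_fft_bins, n_channels))
        self.dropout = nn.Dropout(p_dropout)

    def forward(self, power_spec):           # (batch, frames, n_fft_bins)
        fbank = torch.relu(self.weights)     # keep filter gains non-negative
        energies = power_spec @ fbank        # (batch, frames, n_channels)
        return self.dropout(torch.log(energies + 1e-6))

x = torch.rand(4, 98, 257)                   # dummy power spectrogram
print(LearnableFilterbank()(x).shape)        # torch.Size([4, 98, 8])
```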
  2. Fearless Steps (FS) APOLLO is a 50,000+ hr audio resource established by CRSS-UTDallas capturing all communications between NASA-MCC personnel, backroom staff, and astronauts across the manned Apollo missions. Such a massive but unlabeled audio resource, lacking metadata, provides limited benefit for communities outside Speech and Language Technology (SLT). Supplementing this audio with rich metadata, developed using robust automated mechanisms to transcribe and highlight naturalistic communications, can open research opportunities for SLT, speech science, education, and historical archival communities. In this study, we focus on customizing keyword spotting (KWS) and topic detection systems as an initial step toward conversational understanding. Extensive research in automatic speech recognition (ASR), speech activity detection, and speaker diarization using the manually transcribed 125 h FS Challenge corpus has demonstrated the need for robust domain-specific model development. A major challenge in training KWS systems and topic detection models is the availability of word-level annotations. Forced alignment schemes evaluated using state-of-the-art ASR show significant degradation in segmentation performance. This study explores the challenges of extracting accurate keyword segments from existing sentence-level transcriptions and proposes domain-specific KWS-based solutions to detect conversational topics in audio streams.
  3. Homomorphic Encryption (HE)-based secure Neural Network (NN) inference is one of the most promising security solutions for emerging Machine Learning as a Service (MLaaS). In the HE-based MLaaS setting, a client encrypts its sensitive data and uploads it to the server, which processes the encrypted data directly without decryption and returns the encrypted result to the client. The client's data privacy is preserved since only the client holds the private key. Existing HE-enabled Neural Networks (HENNs), however, suffer from heavy computational overhead. State-of-the-art HENNs adopt ciphertext packing techniques to reduce homomorphic multiplications by packing multiple messages into a single ciphertext. Nevertheless, these HENNs require rotations to sum the elements within the same ciphertext. We observed that HENNs pay significant computing overhead for rotations, and each rotation is ∼10× more expensive than a homomorphic multiplication between a ciphertext and a plaintext, so the massive number of rotations has become a primary obstacle to efficient HENNs. In this paper, we propose Falcon, a fast, frequency-domain deep neural network for fast inference on encrypted data. Falcon includes a fast Homomorphic Discrete Fourier Transform (HDFT) that uses block-circulant matrices to homomorphically support spectral operations. We also propose several efficient methods to reduce inference latency, including Homomorphic Spectral Convolution and Homomorphic Spectral Fully Connected operations that combine batched HE with block-circulant matrices. Our experimental results show that Falcon achieves state-of-the-art inference accuracy and reduces inference latency by 45.45%∼85.34% over prior HENNs on MNIST and CIFAR-10.
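A toy, plaintext-only NumPy sketch of the block-circulant idea behind the spectral operations described above (no encryption, and not Falcon's actual code): multiplying by a circulant matrix equals an element-wise product in the DFT domain, which is what lets spectral layers replace a dense matrix-vector product with cheaper element-wise work.

```python
import numpy as np

def circulant(first_col):
    # Build a circulant matrix C with C[i, j] = c[(i - j) mod n].
    n = len(first_col)
    return np.stack([np.roll(first_col, i) for i in range(n)], axis=1)

n = 8
c = np.random.randn(n)          # one block's defining vector
x = np.random.randn(n)

direct = circulant(c) @ x                                     # O(n^2) product
spectral = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real    # O(n log n)

print(np.allclose(direct, spectral))                          # True
```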
  4. High-redshift quasars ionize He ii into He iii around them, heating the intergalactic medium in the process and creating large regions with elevated temperature. In this work, we demonstrate a method based on a convolutional neural network (CNN) to recover the spatial profile of T0, the temperature at the mean cosmic density, in quasar proximity zones. We train the neural network on synthetic spectra drawn from a Cosmic Reionization on Computers simulation. We find that a simple CNN can recover the temperature profile with an accuracy of ≈1400 K in an idealized case of negligible observational uncertainties. We test the robustness of the CNN and find that it is robust against uncertainties in the quasar host halo mass, quasar continuum, and ionizing flux. We also find that the CNN generalizes well with regard to the hardness of the quasar spectra. This shows that, with noiseless spectra, a simple CNN can distinguish gas inside or outside the He iii region created by the quasar. Because the size of the He iii region is closely related to the total quasar lifetime, this method has great potential for constraining quasar lifetimes on ∼Myr time-scales. However, noise poses a significant problem and can degrade the accuracy to ≈2340 K even for very high signal-to-noise (≳50) spectra. Future studies are needed to reduce the error associated with noise in order to constrain the lifetimes of reionization-epoch quasars with currently available data.
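A rough sketch of the setup described above, assuming PyTorch and made-up sizes (1024 spectral pixels, 32 radial temperature bins; the abstract does not give the actual architecture): a 1-D CNN regressor mapping a proximity-zone spectrum to a T0 profile.

```python
import torch
import torch.nn as nn

class T0ProfileCNN(nn.Module):
    """Toy 1-D CNN that maps a spectrum to a radial temperature profile."""
    def __init__(self, n_pixels=1024, n_profile_bins=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU(), nn.MaxPool1d(4),
        )
        self.head = nn.Linear(32 * (n_pixels // 16), n_profile_bins)

    def forward(self, spectrum):              # (batch, 1, n_pixels)
        h = self.conv(spectrum).flatten(1)
        return self.head(h)                   # predicted T0 per radial bin

model = T0ProfileCNN()
print(model(torch.rand(2, 1, 1024)).shape)    # torch.Size([2, 32])
```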
  5. Network quantization is one of the most hardware-friendly techniques for enabling the deployment of convolutional neural networks (CNNs) on low-power mobile devices. Recent network quantization techniques quantize each weight kernel in a convolutional layer independently for higher inference accuracy, since the weight kernels in a layer exhibit different variances and hence different amounts of redundancy. The quantization bitwidth or bit number (QBN) directly determines the inference accuracy, latency, energy, and hardware overhead. To effectively reduce redundancy and accelerate CNN inference, different weight kernels should be quantized with different QBNs. However, prior works use only one QBN to quantize each convolutional layer or the entire CNN, because the design space of searching for a QBN for each weight kernel is too large. Hand-crafting heuristics for the kernel-wise QBN search is so involved that even domain experts obtain only sub-optimal results, and even deep reinforcement learning (DRL) DDPG-based agents struggle to find a kernel-wise QBN configuration that achieves reasonable inference accuracy. In this paper, we propose AutoQ, a hierarchical-DRL-based kernel-wise network quantization technique that automatically searches for a QBN for each weight kernel and chooses another QBN for each activation layer. Compared to models quantized by state-of-the-art DRL-based schemes, the same models quantized by AutoQ reduce inference latency by 54.06% and inference energy consumption by 50.69% on average, while achieving the same inference accuracy.
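An illustrative sketch of the kernel-wise quantization being searched over, not AutoQ's search itself: symmetric uniform quantization in PyTorch where each output-channel kernel of a convolutional layer gets its own bitwidth (QBN). The hand-picked bitwidth list here simply stands in for the per-kernel QBNs a DRL agent would choose.

```python
import torch

def quantize_kernelwise(weight, qbns):
    # weight: (out_channels, in_channels, kH, kW); qbns: one bitwidth per kernel.
    out = torch.empty_like(weight)
    for k, bits in enumerate(qbns):
        w = weight[k]
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max() / qmax                  # per-kernel step size
        out[k] = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return out

w = torch.randn(4, 3, 3, 3)                  # a toy conv layer's weights
qbns = [8, 4, 6, 3]                          # e.g. bitwidths picked per kernel
print(quantize_kernelwise(w, qbns).shape)    # torch.Size([4, 3, 3, 3])
```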