Despite remarkable improvements, automatic speech recognition is susceptible to adversarial perturbations. Compared to standard machine learning architectures, these attacks are significantly more challenging, especially since the inputs to a speech recognition system are time series that contain both acoustic and linguistic properties of speech. Extracting all recognition-relevant information requires more complex pipelines and an ensemble of specialized components. Consequently, an attacker needs to consider the entire pipeline.
In this paper, we present VENOMAVE, the first training- time poisoning attack against speech recognition. Similar to the predominantly studied evasion attacks, we pursue the same goal: leading the system to an incorrect and attacker-chosen transcription of a target audio waveform. In contrast to evasion attacks, however, we assume that the attacker can only manipulate a small part of the training data without altering the target audio waveform at runtime. We evaluate our attack on two datasets: TIDIGITS and Speech Commands. When poisoning less than 0.17% of the dataset, VENOMAVE achieves attack success rates of more than 80.0%, without access to the victim’s network architecture or hyperparameters. In a more realistic scenario, when the target audio waveform is played over the air in different rooms, VENOMAVE maintains a success rate of up to 73.3%. Finally, VENOMAVE achieves an attack transferability rate of 36.4% between two different model architectures.
more »
« less
Metamorph: Injecting Inaudible Commands into Over-the-air Voice Controlled Systems
This paper presents Metamorph, a system that generates imperceptible audio that can survive over-the-air trans- mission to attack the neural network of a speech recognition system. The key challenge stems from how to ensure the added perturbation of the original audio in advance at the sender side is immune to unknown signal distortions during the transmission process. Our empirical study reveals that signal distortion is mainly due to device and channel frequency selectivity but with different characteristics. This brings a chance to capture and further pre-code this impact to generate adversarial examples that are robust to the over-the-air transmission. We leverage this opportunity in Metamorph and obtain an initial perturbation that captures the core distortion’s impact from only a small set of prior measurements, and then take advantage of a domain adaptation algorithm to refine the perturbation to further im- prove the attack distance and reliability. Moreover, we consider also reducing human perceptibility of the added perturbation. Evaluation achieves a high attack success rate (90%) over the attack distance of up to 6 m. Within a moderate distance, e.g., 3 m, Metamorph maintains this high success rate, yet can be further adapted to largely improve the audio quality, confirmed by a human perceptibility study.
more »
« less
- Award ID(s):
- 1617161
- NSF-PAR ID:
- 10212193
- Date Published:
- Journal Name:
- Network and Distributed Systems Security (NDSS) Symposium
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The secure functioning of automotive systems is vital to the safety of their passengers and other roadway users. One of the critical functions for safety is the controller area network (CAN), which interconnects the safety-critical electronic control units (ECUs) in the majority of ground vehicles. Unfortunately CAN is known to be vulnerable to several attacks. One such attack is the bus-off attack, which can be used to cause a victim ECU to disconnect itself from the CAN bus and, subsequently, for an attacker to masquerade as that ECU. A limitation of the bus-off attack is that it requires the attacker to achieve tight synchronization between the transmission of the victim and the attacker’s injected message. In this paper, we introduce a schedule-based attack framework for the CAN bus-off attack that uses the real-time schedule of the CAN bus to predict more attack opportunities than previously known. We describe a ranking method for an attacker to select and optimize its attack injections with respect to criteria such as attack success rate, bus perturbation, or attack latency. The results show that vulnerabilities of the CAN bus can be enhanced by schedulebased attacks.more » « less
-
The state-of-the-art performance of deep learning algorithms has led to a considerable increase in the utilization of machine learning in security-sensitive and critical applications. However, it has recently been shown that a small and carefully crafted perturbation in the input space can completely fool a deep model. In this study, we explore the extent to which face recognition systems are vulnerable to geometrically-perturbed adversarial faces. We propose a fast landmark manipulation method for generating adversarial faces, which is approximately 200 times faster than the previous geometric attacks and obtains 99.86% success rate on the state-of-the-art face recognition models. To further force the generated samples to be natural, we introduce a second attack constrained on the semantic structure of the face which has the half speed of the first attack with the success rate of 99.96%. Both attacks are extremely robust against the state-of-the-art defense methods with the success rate of equal or greater than 53.59%. Code is available at https://github.com/alldbi/FLM.more » « less
-
Automatic Speech Recognition (ASR) systems convert speech into text and can be placed into two broad categories: traditional and fully end-to-end. Both types have been shown to be vulnerable to adversarial audio examples that sound benign to the human ear but force the ASR to produce malicious transcriptions. Of these attacks, only the "psychoacoustic" attacks can create examples with relatively imperceptible perturbations, as they leverage the knowledge of the human auditory system. Unfortunately, existing psychoacoustic attacks can only be applied against traditional models, and are obsolete against the newer, fully end-to-end ASRs. In this paper, we propose an equalization-based psychoacoustic attack that can exploit both traditional and fully end-to-end ASRs. We successfully demonstrate our attack against real-world ASRs that include DeepSpeech and Wav2Letter. Moreover, we employ a user study to verify that our method creates low audible distortion. Specifically, 80 of the 100 participants voted in favor of all our attack audio samples as less noisier than the existing state-of-the-art attack. Through this, we demonstrate both types of existing ASR pipelines can be exploited with minimum degradation to attack audio quality.more » « less
-
Adversarial machine learning research has recently demonstrated the feasibility to confuse automatic speech recognition (ASR) models by introducing acoustically imperceptible perturbations to audio samples. To help researchers and practitioners gain better understanding of the impact of such attacks, and to provide them with tools to help them more easily evaluate and craft strong defenses for their models, we present Adagio, the first tool designed to allow interactive experimentation with adversarial attacks and defenses on an ASR model in real time, both visually and aurally. Adagio incorporates AMR and MP3 audio compression techniques as defenses, which users can interactively apply to attacked audio samples. We show that these techniques, which are based on psychoacoustic principles, effectively eliminate targeted attacks, reducing the attack success rate from 92.5% to 0%. We will demonstrate Adagio and invite the audience to try it on the Mozilla Common Voice dataset. Code related to this paper is available at: https://github.com/nilakshdas/ADAGIO.more » « less