Title: Metamorph: Injecting Inaudible Commands into Over-the-air Voice Controlled Systems
This paper presents Metamorph, a system that generates imperceptible audio that can survive over-the-air transmission to attack the neural network of a speech recognition system. The key challenge is ensuring that the perturbation added to the original audio in advance, at the sender side, is immune to unknown signal distortions during transmission. Our empirical study reveals that signal distortion is mainly due to device and channel frequency selectivity, each with different characteristics. This creates an opportunity to capture and pre-code this impact so as to generate adversarial examples that are robust to over-the-air transmission. Metamorph leverages this opportunity: it obtains an initial perturbation that captures the core distortion's impact from only a small set of prior measurements, and then applies a domain adaptation algorithm to refine the perturbation, further improving attack distance and reliability. We also consider reducing the human perceptibility of the added perturbation. Evaluation shows a high attack success rate (90%) at distances of up to 6 m. Within a moderate distance, e.g., 3 m, Metamorph maintains this high success rate and can be further adapted to largely improve audio quality, as confirmed by a human perceptibility study.
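The pre-coding idea described in the abstract can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's implementation: it estimates a device-plus-channel frequency response from one probe measurement and pre-equalizes a perturbation so that it arrives at the receiver roughly as intended. All function names, the toy channel, and the single-probe simplification are ours; the paper averages over a small set of prior measurements and refines with domain adaptation.

```python
# Hypothetical sketch of frequency-selective pre-coding for an
# over-the-air adversarial audio perturbation (illustrative only).
import numpy as np

def estimate_channel_response(sent: np.ndarray, received: np.ndarray) -> np.ndarray:
    """Estimate the device+channel frequency response H(f) from one
    probe measurement: H = FFT(received) / FFT(sent)."""
    S, R = np.fft.rfft(sent), np.fft.rfft(received)
    return R / (S + 1e-12)                    # guard against division by zero

def precode_perturbation(delta: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Pre-equalize the perturbation so that, after traversing the
    channel, it arrives approximately as the intended delta."""
    D = np.fft.rfft(delta)
    return np.fft.irfft(D / (H + 1e-12), n=len(delta))

# Usage with a toy channel; in practice H would be averaged over a
# small set of prior probe measurements.
rng = np.random.default_rng(0)
probe = rng.standard_normal(16000)                      # 1 s probe at 16 kHz
recv = np.convolve(probe, [0.6, 0.3, 0.1])[:16000]      # toy channel response
H = estimate_channel_response(probe, recv)
delta = 0.01 * rng.standard_normal(16000)               # adversarial perturbation
delta_tx = precode_perturbation(delta, H)               # what the sender emits
```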
Award ID(s):
1617161
PAR ID:
10212193
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Network and Distributed Systems Security (NDSS) Symposium
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Despite remarkable improvements, automatic speech recognition is susceptible to adversarial perturbations. Compared to standard machine learning architectures, these attacks are significantly more challenging, especially since the inputs to a speech recognition system are time series that contain both acoustic and linguistic properties of speech. Extracting all recognition-relevant information requires more complex pipelines and an ensemble of specialized components. Consequently, an attacker needs to consider the entire pipeline. In this paper, we present VENOMAVE, the first training-time poisoning attack against speech recognition. Similar to the predominantly studied evasion attacks, we pursue the same goal: leading the system to an incorrect and attacker-chosen transcription of a target audio waveform. In contrast to evasion attacks, however, we assume that the attacker can only manipulate a small part of the training data without altering the target audio waveform at runtime. We evaluate our attack on two datasets: TIDIGITS and Speech Commands. When poisoning less than 0.17% of the dataset, VENOMAVE achieves attack success rates of more than 80.0%, without access to the victim's network architecture or hyperparameters. In a more realistic scenario, when the target audio waveform is played over the air in different rooms, VENOMAVE maintains a success rate of up to 73.3%. Finally, VENOMAVE achieves an attack transferability rate of 36.4% between two different model architectures.
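As a rough illustration of this threat model (not the paper's speech-recognition pipeline), the toy sketch below injects a handful of mislabeled near-duplicates of a target point into a training set and shows the retrained model adopting the attacker-chosen label. The data, features, and classifier choice are all illustrative stand-ins.

```python
# Toy feature-space sketch of a training-time poisoning attack in the
# spirit of VENOMAVE (illustrative only; not the paper's method).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 16))           # clean training features
y = (X[:, 0] > 0).astype(int)                 # clean labels
target = np.ones(16)                          # target input (true class: 1)
attacker_label = 0                            # attacker-chosen output

# Poison under 1% of the data: near-duplicates of the target, mislabeled.
poison = target + 0.05 * rng.standard_normal((8, 16))
X_poisoned = np.vstack([X, poison])
y_poisoned = np.concatenate([y, np.full(8, attacker_label)])

# The victim retrains on the poisoned set; the target is unmodified at
# runtime, yet the model now outputs the attacker's label for it.
victim = KNeighborsClassifier(n_neighbors=5).fit(X_poisoned, y_poisoned)
print(victim.predict(target.reshape(1, -1)))  # -> [0]
```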
  2. The types of human activities occupants are engaged in within indoor spaces significantly contribute to the spread of airborne diseases through emitting aerosol particles. Today, ubiquitous computing technologies can inform users of common atmospheric pollutants affecting indoor air quality. However, users remain uninformed of the rate of aerosol generated directly from human respiratory activities, a fundamental parameter impacting the risk of airborne transmission. In this paper, we present AeroSense, a novel privacy-preserving approach that uses audio sensing to accurately predict the rate of aerosol generated by detecting the kinds of human respiratory activities and determining their loudness. Our system adopts privacy-first as a key design choice; thus, it only extracts audio features that cannot be reconstructed into human-audible signals, using two omnidirectional microphone arrays. We employ a combination of binary classifiers using the Random Forest algorithm to detect simultaneous occurrences of activities, with an average recall of 85%. The system determines the level of each detected activity by estimating the distance between the microphone and the activity source; this level estimation technique yields an average error of 7.74%. Additionally, we developed a lightweight mask detection classifier to detect mask-wearing, which yields a recall score of 75%. These intermediary outputs are the critical predictors AeroSense needs to estimate the amount of aerosol generated by an active human source. Our aerosol predictor is a Random Forest regression model, which achieves an MSE of 2.34 and an R² of 0.73. We demonstrate the accuracy of AeroSense by validating our results in a cleanroom setup using advanced microbiological technology, and we present results on its efficacy in natural settings through controlled and in-the-wild experiments. The ability to estimate aerosol emissions from detected human activities is part of a more extensive indoor air system integration, which can capture the rate of aerosol dissipation and inform users of airborne transmission risks in real time.
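The two-stage design described above (per-activity binary detectors feeding a regressor) can be sketched as follows. This is a minimal, hypothetical outline with synthetic features and labels; the real system's privacy-preserving audio features, level estimation, and training data are far more involved.

```python
# Minimal sketch of the two-stage pipeline: per-activity binary Random
# Forest classifiers, then a Random Forest regressor mapping detections
# plus a loudness estimate to an aerosol generation rate (all synthetic).
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 32))            # stand-in for non-invertible audio features
activities = ["breathing", "speaking", "coughing"]

# One binary classifier per respiratory activity (multi-label detection).
detectors = {}
for i, name in enumerate(activities):
    labels = (X[:, i] > 0).astype(int)        # synthetic ground truth
    detectors[name] = RandomForestClassifier(n_estimators=100).fit(X, labels)

# Regressor: detections + loudness level -> aerosol rate (synthetic target).
det = np.column_stack([detectors[a].predict(X) for a in activities])
loudness = np.abs(X[:, :1])                   # stand-in for the level estimate
Z = np.hstack([det, loudness])
aerosol = det @ np.array([0.5, 2.0, 5.0]) + loudness.ravel()  # made-up rates
regressor = RandomForestRegressor(n_estimators=200).fit(Z, aerosol)
```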
  3. Reconfigurable intelligent surface (RIS) technology can enhance the performance of wireless systems, but an adversary can use such technology to deteriorate communication links. This paper explores an RIS-based attack on multi-user wireless systems that require channel reciprocity for time-division duplexing (TDD). We demonstrate that deploying an RIS with a non-diagonal phase shift matrix can compromise channel reciprocity and lead to poor TDD performance. The attack can be achieved without transmission of signal energy, without channel state information (CSI), and without synchronization with the legitimate system, and thus it is difficult to detect and counteract. We provide an extensive set of simulation studies on the impact of such an attack on the achievable sum rate of the legitimate system, and we design a heuristic algorithm for optimizing the attack in cases where some partial knowledge of the CSI is available. Our results demonstrate that this channel reciprocity attack can significantly degrade the performance of the legitimate system.
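The reciprocity argument can be checked numerically. Assuming the usual cascaded model, where the effective downlink channel through the RIS is fᵀΦg and the uplink is gᵀΦf = fᵀΦᵀg, the two coincide exactly when Φ is symmetric (as any diagonal Φ is), so a non-symmetric, non-diagonal Φ breaks TDD reciprocity. The sketch below uses our own notation, not the paper's.

```python
# Numerical check: a diagonal RIS phase matrix preserves reciprocity,
# a non-diagonal (non-symmetric) one breaks it.
import numpy as np

rng = np.random.default_rng(3)
N = 16                                                      # RIS elements
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)    # BS -> RIS channel
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)    # RIS -> user channel

# Diagonal phase-shift matrix: symmetric, hence reciprocal.
phi_diag = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, N)))

# Non-diagonal unit-modulus matrix built from a cyclic shift: each row
# has one unit-modulus entry, but the matrix is not symmetric.
perm = np.roll(np.arange(N), 1)
phi_nd = np.eye(N)[perm] * np.exp(1j * rng.uniform(0, 2 * np.pi, N))

for name, phi in [("diagonal", phi_diag), ("non-diagonal", phi_nd)]:
    h_dl = f @ phi @ g                        # effective downlink channel
    h_ul = g @ phi @ f                        # effective uplink channel
    print(name, "reciprocal:", np.isclose(h_dl, h_ul))
```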
  4. The secure functioning of automotive systems is vital to the safety of their passengers and other roadway users. One of the critical functions for safety is the controller area network (CAN), which interconnects the safety-critical electronic control units (ECUs) in the majority of ground vehicles. Unfortunately, CAN is known to be vulnerable to several attacks. One such attack is the bus-off attack, which can be used to cause a victim ECU to disconnect itself from the CAN bus and, subsequently, for an attacker to masquerade as that ECU. A limitation of the bus-off attack is that it requires the attacker to achieve tight synchronization between the transmission of the victim and the attacker's injected message. In this paper, we introduce a schedule-based attack framework for the CAN bus-off attack that uses the real-time schedule of the CAN bus to predict more attack opportunities than previously known. We describe a ranking method for an attacker to select and optimize its attack injections with respect to criteria such as attack success rate, bus perturbation, or attack latency. The results show that schedule-based attacks can amplify the known vulnerabilities of the CAN bus.
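The scheduling idea lends itself to a short illustration: if the victim ECU's frame is periodic, an attacker can enumerate its future transmission instants over one hyperperiod and rank them as synchronization targets. The sketch below is a hypothetical simplification with made-up message IDs, periods, and offsets; the paper's framework also models bus arbitration and ranks opportunities by success rate, perturbation, and latency.

```python
# Illustrative sketch: predict victim transmission times on a CAN bus
# from a known periodic schedule (toy parameters, not from the paper).
from math import gcd

def lcm(a: int, b: int) -> int:
    return a // gcd(a, b) * b

# (message id, period in ms, offset in ms); the victim is id 0x101.
schedule = [(0x101, 10, 2), (0x202, 20, 5), (0x303, 50, 0)]
victim_id = 0x101

# Hyperperiod of the message set: schedule repeats with this period.
hyper = 1
for _, period, _ in schedule:
    hyper = lcm(hyper, period)

# Predicted victim transmission instants within one hyperperiod; each
# is a candidate moment to synchronize an injected frame with.
v_period, v_offset = next((p, o) for i, p, o in schedule if i == victim_id)
opportunities = list(range(v_offset, hyper, v_period))
print(f"hyperperiod={hyper} ms, attack opportunities at t={opportunities}")
```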
  5. The state-of-the-art performance of deep learning algorithms has led to a considerable increase in the use of machine learning in security-sensitive and critical applications. However, it has recently been shown that a small and carefully crafted perturbation in the input space can completely fool a deep model. In this study, we explore the extent to which face recognition systems are vulnerable to geometrically-perturbed adversarial faces. We propose a fast landmark manipulation method for generating adversarial faces, which is approximately 200 times faster than previous geometric attacks and achieves a 99.86% success rate on state-of-the-art face recognition models. To further force the generated samples to be natural, we introduce a second attack constrained by the semantic structure of the face, which runs at half the speed of the first attack while achieving a success rate of 99.96%. Both attacks are extremely robust against state-of-the-art defense methods, with success rates of at least 53.59%. Code is available at https://github.com/alldbi/FLM.
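To make the geometric attack surface concrete, the sketch below perturbs a set of stand-in landmark coordinates by a small bounded offset and warps the image accordingly with a piecewise-affine transform. This only shows the warping machinery under our own assumptions (a coarse grid instead of detected facial landmarks, random instead of gradient-guided displacements); the actual FLM attack chooses displacements via gradients of the face-recognition loss.

```python
# Hypothetical sketch of a landmark-displacement warp (illustrative
# only; the real attack optimizes the displacements adversarially).
import numpy as np
from skimage import data
from skimage.transform import PiecewiseAffineTransform, warp

image = data.astronaut()                      # stand-in for a face image
h, w = image.shape[:2]

# Stand-in "landmarks": a coarse grid; a real pipeline would use a
# facial landmark detector (e.g., 68-point landmarks).
xs, ys = np.meshgrid(np.linspace(0, w - 1, 8), np.linspace(0, h - 1, 8))
src = np.column_stack([xs.ravel(), ys.ravel()])

rng = np.random.default_rng(4)
eps = 4.0                                     # max landmark shift in pixels
dst = src + rng.uniform(-eps, eps, src.shape) # bounded geometric perturbation

# warp() treats the transform as an output->input mapping, so estimate
# the transform that maps the displaced landmarks back to the originals.
tform = PiecewiseAffineTransform()
tform.estimate(dst, src)
adv = warp(image, tform)                      # geometrically-perturbed image
```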