skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, January 16 until 2:00 AM ET on Friday, January 17 due to maintenance. We apologize for the inconvenience.


Title: EchoSafe: Sonar-based Verifiable Interaction with Intelligent Digital Agents
Voice controlled interactive smart speakers, such as Google Home, Amazon Echo, and Apple HomePod are becoming commonplace in today's homes. These devices listen continually for the user commands, that are triggered by special keywords, such as "Alexa" and "Hey Siri". Recent research has shown that these devices are vulnerable to attacks through malicious voice commands from nearby devices. The commands can be sent easily during unoccupied periods, so that the user may be unaware of such attacks. We present EchoSafe, a user-friendly sonar-based defense against these attacks. When the user sends a critical command to the smart speaker, EchoSafe sends an audio pulse followed by post processing to determine if the user is present in the room. We can detect the user's presence during critical commands with 93.13% accuracy, and our solution can be extended to defend against other attack scenarios, as well.  more » « less
Award ID(s):
1705135
PAR ID:
10074622
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 1st ACM Workshop on the Internet of Safe Things
Page Range / eLocation ID:
38 to 43
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract Voice-activated commands have become a key feature of popular devices such as smartphones, home assistants, and wearables. For convenience, many people configure their devices to be ‘always on’ and listening for voice commands from the user using a trigger phrase such as “Hey Siri,” “Okay Google,” or “Alexa.” However, false positives for these triggers often result in privacy violations with conversations being inadvertently uploaded to the cloud. In addition, malware that can record one’s conversations remains a signifi-cant threat to privacy. Unlike with cameras, which people can physically obscure and be assured of their privacy, people do not have a way of knowing whether their microphone is indeed off and are left with no tangible defenses against voice based attacks. We envision a general-purpose physical defense that uses a speaker to inject specialized obfuscating ‘babble noise’ into the microphones of devices to protect against automated and human based attacks. We present a comprehensive study of how specially crafted, personalized ‘babble’ noise (‘MyBabble’) can be effective at moderate signal-to-noise ratios and can provide a viable defense against microphone based eavesdropping attacks. 
    more » « less
  2. Automatic Speech Recognition (ASR) systems are widely used in various online transcription services and personal digital assistants. Emerging lines of research have demonstrated that ASR systems are vulnerable to hidden voice commands, i.e., audio that can be recognized by ASRs but not by humans. Such attacks, however, often either highly depend on white-box knowledge of a specific machine learning model or require special hardware to construct the adversarial audio. This paper proposes a new model-agnostic and easily-constructed attack, called CommanderGabble, which uses fast speech to camouflage voice commands. Both humans and ASR systems often misinterpret fast speech, and such misinterpretation can be exploited to launch hidden voice command attacks. Specifically, by carefully manipulating the phonetic structure of a target voice command, ASRs can be caused to derive a hidden meaning from the manipulated, high-speed version. We implement the discovered attacks both over-the-wire and over-the-air, and conduct a suite of experiments to demonstrate their efficacy against 7 practical ASR systems. Our experimental results show that the over-the-wire attacks can disguise as many as 96 out of 100 tested voice commands into adversarial ones, and that the over-the-air attacks are consistently successful for all 18 chosen commands in multiple real-world scenarios. 
    more » « less
  3. Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately voice interfaces also create new opportunities for exploitation. Specifically any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces. 
    more » « less
  4. Voice biometrics is drawing increasing attention to user authentication on smart devices. However, voice biometrics is vulnerable to replay attacks, where adversaries try to spoof voice authentication systems using pre-recorded voice samples collected from genuine users. To this end, we propose VoiceGesture, a liveness detection solution for voice authentication on smart devices such as smartphones and smart speakers. With audio hardware advances on smart devices, VoiceGesture leverages built-in speaker and microphone pairs on smart devices as Doppler Radar to sense articulatory gestures for liveness detection during voice authentication. The experiments with 21 participants and different smart devices show that VoiceGesture achieves over 99% and around 98% detection accuracy for text-dependent and text-independent liveness detection, respectively. Moreover, VoiceGesture is robust to different device placements, low audio sampling frequency, and supports medium range liveness detection on smart speakers in various use scenarios, including smart homes and smart vehicles. 
    more » « less
  5. It is estimated that by the year 2024, the total number of systems equipped with voice assistant software will exceed 8.4 billion devices globally. While these devices provide convenience to consumers, they suffer from a myriad of security issues. This paper highlights the serious privacy threats exposed by information leakage in a smart assistant's encrypted network traffic metadata. To investigate this issue, we have collected a new dataset composed of dynamic and static commands posed to an Amazon Echo Dot using data collection and cleaning scripts we developed. Furthermore, we propose the Smart Home Assistant Malicious Ensemble model (SHAME) as the new state-of-the-art Voice Command Fingerprinting classifier. When evaluated against several datasets, our attack correctly classifies encrypted voice commands with up to 99.81% accuracy on Google Home traffic and 95.2% accuracy on Amazon Echo Dot traffic. These findings show that security measures must be taken to stop internet service providers, nation-states, and network eavesdroppers from monitoring our intimate conversations. 
    more » « less