Title: EchoSafe: Sonar-based Verifiable Interaction with Intelligent Digital Agents
Voice-controlled interactive smart speakers, such as Google Home, Amazon Echo, and Apple HomePod, are becoming commonplace in today's homes. These devices listen continually for user commands, which are triggered by special keywords such as "Alexa" and "Hey Siri". Recent research has shown that these devices are vulnerable to attacks through malicious voice commands issued from nearby devices. Such commands can easily be sent while the home is unoccupied, so the user may be unaware of the attack. We present EchoSafe, a user-friendly, sonar-based defense against these attacks. When the user issues a critical command to the smart speaker, EchoSafe emits an audio pulse and post-processes the recorded echoes to determine whether the user is present in the room. We detect the user's presence during critical commands with 93.13% accuracy, and our solution can also be extended to defend against other attack scenarios.
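The abstract describes the presence check only at a high level (emit a pulse, then post-process the recording). The following is a minimal sketch of one common way to implement sonar presence detection, not EchoSafe's actual pipeline: it assumes a linear chirp pulse, a matched-filter (cross-correlation) echo profile, and a comparison against a stored empty-room baseline. The function names, the 18 to 22 kHz band, the 2 ms pulse length, and the 0.1 decision threshold are all hypothetical, and `simulate` stands in for a real microphone capture.

```python
import math

# Hypothetical parameters: the abstract does not specify EchoSafe's pulse design.
FS = 48_000                  # sample rate in Hz
F0, F1 = 18_000.0, 22_000.0  # chirp start/end frequencies in Hz
PULSE_SECONDS = 0.002        # 2 ms pulse


def make_chirp(fs: int, f0: float, f1: float, duration: float) -> list[float]:
    """Linear frequency sweep from f0 to f1 Hz, used as the sonar pulse."""
    n = int(fs * duration)
    rate = (f1 - f0) / duration  # sweep rate in Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * rate * t * t))
            for t in (i / fs for i in range(n))]


def echo_profile(recording: list[float], pulse: list[float]) -> list[float]:
    """Matched filter: |cross-correlation| of the recording with the pulse.
    Each peak marks the arrival delay (in samples) of the pulse or an echo."""
    m = len(pulse)
    return [abs(sum(recording[lag + i] * pulse[i] for i in range(m)))
            for lag in range(len(recording) - m + 1)]


def user_present(profile: list[float], baseline: list[float],
                 threshold: float = 0.1) -> bool:
    """Report presence when the echo profile differs from an empty-room
    baseline by more than `threshold` (energy of the difference relative
    to the baseline's energy). The threshold value is hypothetical."""
    diff = sum((p - b) ** 2 for p, b in zip(profile, baseline))
    ref = sum(b * b for b in baseline) or 1e-12
    return diff / ref > threshold


def simulate(pulse: list[float], n: int,
             echoes: list[tuple[int, float]]) -> list[float]:
    """Synthetic recording: a copy of the pulse arrives at each
    (delay_in_samples, gain) pair. Stands in for a real capture."""
    out = [0.0] * n
    for delay, gain in echoes:
        for i, s in enumerate(pulse):
            if delay + i < n:
                out[delay + i] += gain * s
    return out


if __name__ == "__main__":
    pulse = make_chirp(FS, F0, F1, PULSE_SECONDS)
    # Empty room: direct path plus one wall reflection.
    empty = simulate(pulse, 600, [(0, 1.0), (300, 0.3)])
    # Same room with a person adding a reflection at a shorter delay.
    occupied = simulate(pulse, 600, [(0, 1.0), (300, 0.3), (150, 0.5)])
    baseline = echo_profile(empty, pulse)
    print(user_present(echo_profile(empty, pulse), baseline))
    print(user_present(echo_profile(occupied, pulse), baseline))
```

Comparing against a stored empty-room baseline is the simplest possible design choice; a deployed system would also have to cope with moved furniture, ambient noise, and speaker/microphone latency, none of which this sketch models.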
Award ID(s):
1705135
NSF-PAR ID:
10074622
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 1st ACM Workshop on the Internet of Safe Things
Page Range / eLocation ID:
38 to 43
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Voice-activated commands have become a key feature of popular devices such as smartphones, home assistants, and wearables. For convenience, many people configure their devices to be ‘always on’ and listening for voice commands from the user via a trigger phrase such as “Hey Siri,” “Okay Google,” or “Alexa.” However, false positives for these triggers often result in privacy violations, with conversations being inadvertently uploaded to the cloud. In addition, malware that can record one’s conversations remains a significant threat to privacy. Unlike cameras, which people can physically cover to be assured of their privacy, microphones give people no way of knowing whether they are truly off, leaving them with no tangible defense against voice-based attacks. We envision a general-purpose physical defense that uses a speaker to inject specialized obfuscating ‘babble noise’ into the microphones of devices to protect against automated and human-based attacks. We present a comprehensive study of how specially crafted, personalized ‘babble’ noise (‘MyBabble’) can be effective at moderate signal-to-noise ratios and can provide a viable defense against microphone-based eavesdropping attacks.
  2. Reliably identifying and authenticating smartphones is critical in our daily life, since they are increasingly used to manage sensitive data such as private messages and financial data. Recent research on hardware fingerprinting shows that each smartphone, regardless of manufacturer or make, possesses a variety of hardware fingerprints that are unique, robust, and physically unclonable. There is growing interest in designing and implementing hardware-rooted smartphone authentication, which authenticates smartphones by verifying the hardware fingerprints of their built-in sensors. Unfortunately, previous fingerprinting methods either involve large registration overhead or suffer from fingerprint forgery attacks, rendering them infeasible for authentication systems. In this paper, we propose ABC, a real-time smartphone Authentication protocol utilizing the photo-response non-uniformity (PRNU) of the Built-in Camera. In contrast to previous works that require tens of images to build reliable PRNU features for conventional cameras, we are the first to observe that a single image can uniquely identify a smartphone, owing to the unique PRNU of its image sensor. This discovery makes the use of PRNU practical for smartphone authentication. While most existing hardware fingerprints are vulnerable to forgery attacks, ABC defeats such attacks by verifying a smartphone’s PRNU identity through a challenge-response protocol over a visible light communication channel. A user captures two time-variant QR codes and sends the two images to a server, which verifies the identity through fingerprint and image-content matching. The time-variant QR codes also defeat replay attacks. Our experiments with 16,000 images from 40 smartphones show that ABC can efficiently authenticate user devices with an error rate below 0.5%.
  3. Voice interfaces are increasingly being integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately, voice interfaces also create new opportunities for exploitation. Specifically, any sound-emitting device within range of a system implementing a voice interface (e.g., a smart television or an Internet-connected appliance) can potentially cause that system to perform operations against its owner's wishes (e.g., unlocking doors or making unauthorized purchases). We address this problem by developing a technique that recognizes fundamental differences between audio produced by humans and audio produced by electronic speakers. We identify sub-bass over-excitation, the presence of significant low-frequency signals that lie outside the range of human voices but are inherent to the design of modern speakers, as a strong differentiator between these two sources. Having identified this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces.
  4. Voice biometrics is drawing increasing attention for user authentication on smart devices. However, voice biometrics is vulnerable to replay attacks, in which adversaries try to spoof voice authentication systems using pre-recorded voice samples collected from genuine users. To address this, we propose VoiceGesture, a liveness detection solution for voice authentication on smart devices such as smartphones and smart speakers. Leveraging advances in audio hardware on smart devices, VoiceGesture uses their built-in speaker and microphone pairs as a Doppler radar to sense articulatory gestures for liveness detection during voice authentication. Experiments with 21 participants and different smart devices show that VoiceGesture achieves over 99% and around 98% detection accuracy for text-dependent and text-independent liveness detection, respectively. Moreover, VoiceGesture is robust to different device placements and low audio sampling frequencies, and it supports medium-range liveness detection on smart speakers in various use scenarios, including smart homes and smart vehicles.
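The differentiator named in item 3 above, sub-bass over-excitation, comes down to how much of a recording's energy sits at frequencies below the range of human voices. A toy version of that feature follows. It is an illustration of the idea, not the cited work's method: the naive DFT, the 80 Hz cutoff, the 0.1 decision threshold, and the names `band_energy_ratio` and `looks_like_speaker` are all hypothetical, since the abstract does not state the actual bands, features, or thresholds.

```python
import cmath
import math


def band_energy_ratio(frame: list[float], fs: int,
                      cutoff_hz: float = 80.0) -> float:
    """Fraction of spectral power below cutoff_hz (hypothetical cutoff).
    Uses a naive O(n^2) DFT over positive frequencies, skipping DC."""
    n = len(frame)
    low = total = 0.0
    for k in range(1, n // 2):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2
        total += power
        if k * fs / n < cutoff_hz:  # bin k is centred at k * fs / n Hz
            low += power
    return low / total if total else 0.0


def looks_like_speaker(frame: list[float], fs: int, cutoff_hz: float = 80.0,
                       threshold: float = 0.1) -> bool:
    """Flag a frame as electronically produced when its sub-bass share
    exceeds a hypothetical threshold."""
    return band_energy_ratio(frame, fs, cutoff_hz) > threshold


if __name__ == "__main__":
    fs, n = 4000, 400
    # Toy "voice": a 200 Hz tone. Toy "speaker": same tone plus 50 Hz energy.
    voice = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]
    speaker = [v + 0.8 * math.sin(2 * math.pi * 50 * t / fs)
               for t, v in enumerate(voice)]
    print(looks_like_speaker(voice, fs), looks_like_speaker(speaker, fs))
```

A practical detector would compute windowed FFT frames over real recordings and calibrate the cutoff and threshold on labelled data; the naive DFT here only keeps the example dependency-free.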