

This content will become publicly available on August 9, 2024

Title: Spying through Your Voice Assistants: Realistic Voice Command Fingerprinting
Voice assistants are becoming increasingly pervasive due to the convenience and automation they provide through the voice interface. However, such convenience often comes with unforeseen security and privacy risks. For example, encrypted traffic from voice assistants can leak sensitive information about their users' habits and lifestyles. In this paper, we present a taxonomy of fingerprinting voice commands on the most popular voice assistant platforms (Google, Alexa, and Siri). We also provide a deeper understanding of the feasibility of fingerprinting third-party applications and streaming services over the voice interface. Our analysis not only improves the state-of-the-art technique but also studies a more realistic setup for fingerprinting voice activities over encrypted traffic. Our proposed technique considers a passive network eavesdropper observing encrypted traffic from various devices within a home: it first detects the invocation/activation of a voice assistant and then infers which specific voice command was issued. Using an end-to-end system design, we show that it is possible to detect when a voice assistant is activated with 99% accuracy and then utilize the subsequent traffic pattern to infer more fine-grained user activities with around 77-80% accuracy.
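A minimal sketch of the two-stage idea described above, not the paper's actual model: stage 1 flags voice-assistant activation from an outgoing traffic burst, and stage 2 classifies the command with nearest-neighbour matching over simple per-trace features. The packet sizes, the byte threshold, and the tiny "labelled" traces are all illustrative assumptions.

```python
def features(trace):
    """Summarise a trace of signed packet sizes (+outgoing, -incoming)."""
    out = [p for p in trace if p > 0]
    inc = [-p for p in trace if p < 0]
    return (sum(out), sum(inc), len(out), len(inc))

def is_activation(trace, burst_bytes=2000):
    """Stage 1: a sustained outgoing burst suggests the wake word fired."""
    return sum(p for p in trace if p > 0) >= burst_bytes

def classify(trace, labelled):
    """Stage 2: 1-nearest-neighbour over the summary features."""
    f = features(trace)
    dist = lambda g: sum((a - b) ** 2 for a, b in zip(f, g))
    return min(labelled, key=lambda item: dist(features(item[1])))[0]

# Hypothetical labelled traces for two voice commands.
labelled = [
    ("weather", [900, -4000, 300, -3500]),
    ("music",   [1200, -15000, 800, -14000]),
]

probe = [2100, -14500, 700, -13800]
if is_activation(probe):
    print(classify(probe, labelled))  # prints "music": nearest labelled trace
```

A real attack would use far richer features (timing, burst structure, direction sequences) and a trained classifier, but the pipeline shape is the same: detect activation first, then fingerprint the command.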
Award ID(s): 1849997
NSF-PAR ID: 10442721
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: 32nd USENIX Security Symposium (USENIX Security 23)
Page Range / eLocation ID: 2419--2436
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. It is estimated that by the year 2024, the total number of systems equipped with voice assistant software will exceed 8.4 billion devices globally. While these devices provide convenience to consumers, they suffer from a myriad of security issues. This paper highlights the serious privacy threats exposed by information leakage in a smart assistant's encrypted network traffic metadata. To investigate this issue, we have collected a new dataset composed of dynamic and static commands posed to an Amazon Echo Dot using data collection and cleaning scripts we developed. Furthermore, we propose the Smart Home Assistant Malicious Ensemble model (SHAME) as the new state-of-the-art Voice Command Fingerprinting classifier. When evaluated against several datasets, our attack correctly classifies encrypted voice commands with up to 99.81% accuracy on Google Home traffic and 95.2% accuracy on Amazon Echo Dot traffic. These findings show that security measures must be taken to stop internet service providers, nation-states, and network eavesdroppers from monitoring our intimate conversations. 
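The SHAME model above is an ensemble; as a toy illustration of the general ensemble idea (not the paper's architecture), several base classifiers can each vote on a label and the majority wins. The three rule-based "models" and the single feature (total incoming bytes of a trace) are purely hypothetical.

```python
from collections import Counter

# Three hypothetical base classifiers, each thresholding the same
# feature (total incoming bytes) differently.
def model_small(total_in):
    return "timer" if total_in < 5_000 else "news"

def model_mid(total_in):
    return "timer" if total_in < 8_000 else "news"

def model_large(total_in):
    return "timer" if total_in < 20_000 else "news"

def ensemble_predict(models, total_in):
    """Each base model votes; the most common label wins."""
    votes = Counter(m(total_in) for m in models)
    return votes.most_common(1)[0][0]

models = [model_small, model_mid, model_large]
print(ensemble_predict(models, 6_500))  # prints "timer": two of three vote for it
```

Real voice-command fingerprinting ensembles combine strong learned models (e.g., deep networks over packet sequences) rather than hand-written thresholds, but the voting mechanics are the same.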
  2. In recent years, we have seen rapid growth in the use and adoption of Internet of Things (IoT) devices. However, some IoT devices are sensitive in nature, and simply knowing what devices a user owns can have security and privacy implications. Researchers have, therefore, looked at fingerprinting IoT devices and their activities from encrypted network traffic. In this paper, we analyze the feasibility of fingerprinting IoT devices and evaluate the robustness of such a fingerprinting approach across multiple independent datasets collected under different settings. We show that not only is it possible to effectively fingerprint 188 IoT devices (with over 97% accuracy), but also to do so even with multiple instances of the same make-and-model device. We also analyze the extent to which temporal, spatial and data-collection-methodology differences impact fingerprinting accuracy. Our analysis sheds light on features that are more robust against varying conditions. Lastly, we comprehensively analyze the performance of our approach under an open-world setting and propose ways in which an adversary can enhance their odds of inferring additional information about unseen devices (e.g., similar devices manufactured by the same company).
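The open-world setting mentioned above can be sketched as follows: match a trace's features against known device profiles, but reject the match and report "unknown" when even the best distance exceeds a threshold. The profiles, the two features (mean packet size, packets per minute), and the threshold are illustrative assumptions, not the paper's.

```python
# Hypothetical per-device profiles: (mean packet size, packets per minute).
PROFILES = {
    "plug":   (400.0, 12.0),
    "camera": (900.0, 300.0),
}

def identify(feature, threshold=150.0):
    """Closed-world matching plus an open-world rejection threshold."""
    name, best = None, float("inf")
    for device, profile in PROFILES.items():
        d = sum((a - b) ** 2 for a, b in zip(feature, profile)) ** 0.5
        if d < best:
            name, best = device, d
    return name if best <= threshold else "unknown"

print(identify((410.0, 15.0)))   # close to the plug profile -> "plug"
print(identify((2000.0, 50.0)))  # far from every profile -> "unknown"
```

The threshold trades precision against recall: too loose and unseen devices get mislabelled as known ones, too tight and known devices are rejected.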
  3. People interacting with voice assistants are often frustrated by voice assistants' frequent errors and inability to respond to backchannel cues. We introduce an open-source video dataset of 21 participants' interactions with a voice assistant, and explore the possibility of using this dataset to enable automatic error recognition to inform self-repair. The dataset includes clipped and labeled videos of participants' faces during free-form interactions with the voice assistant from the smart speaker's perspective. To validate our dataset, we emulated a machine learning classifier by asking crowdsourced workers to recognize voice assistant errors from watching soundless video clips of participants' reactions. We found trends suggesting it is possible to determine the voice assistant's performance from a participant's facial reaction alone. This work posits elicited datasets of interactive responses as a key step towards improving error recognition for repair for voice assistants in a wide variety of applications.
  4. Most privacy-conscious users utilize HTTPS and an anonymity network such as Tor to mask source and destination IP addresses. It has been shown that encrypted and anonymized network traffic traces can still leak information through a type of attack called a website fingerprinting (WF) attack. The adversary records the network traffic and is only able to observe the number of incoming and outgoing messages, the size of each message, and the time difference between messages. Previous work has shown website fingerprinting to achieve over 90% accuracy when Tor is used as the anonymity network. Thus, an Internet Service Provider can successfully identify the websites its users are visiting. One main concern about website fingerprinting is its practicality. The common assumption in most previous work is that a victim is visiting one website at a time and has access to the complete network trace of that website. However, this is not realistic. We propose two new algorithms to deal with situations when the victim visits one website after another (continuous visits) and visits another website in the middle of visiting one website (overlapping visits). We show that our algorithm gives an accuracy of 80% (compared to 63% in a previous work [24]) in finding the split point, i.e., the point in a trace where the second website's traffic begins. Using our proposed "splitting" algorithm, websites can be predicted with an accuracy of 70%. When two website visits are overlapping, the website fingerprinting accuracy falls dramatically. Using our proposed "sectioning" algorithm, the accuracy for predicting the website in overlapping visits improves from 22.80% to 70%. When part of the network trace is missing (either the beginning or the end), the accuracy when using our sectioning algorithm increases from 20% to over 60%.
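A toy stand-in for the "splitting" idea: when two website visits are recorded back-to-back, one naive heuristic is to guess the split point at the largest gap in packet timestamps. The paper's algorithm is more involved; the timestamps below are made up.

```python
def split_point(timestamps):
    """Return the index where the second visit is assumed to start,
    i.e. the packet that follows the widest inter-packet gap."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    widest = max(range(len(gaps)), key=gaps.__getitem__)
    return widest + 1

# First visit's packets end around t=0.9s; the second visit starts at t=6.0s.
trace = [0.0, 0.2, 0.5, 0.9, 6.0, 6.1, 6.4]
print(split_point(trace))  # prints 4: index of the second visit's first packet
```

This heuristic fails exactly where the abstract says fingerprinting gets hard: when the two visits overlap, no single timestamp gap separates them, which is what motivates the separate "sectioning" approach.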
  5. Deceptive design patterns (sometimes called “dark patterns”) are user interface design elements that may trick, deceive, or mislead users into behaviors that often benefit the party implementing the design over the end user. Prior work has taxonomized, investigated, and measured the prevalence of such patterns primarily in visual user interfaces (e.g., on websites). However, as the ubiquity of voice assistants and other voice-assisted technologies increases, we must anticipate how deceptive designs will be (and indeed, are already) deployed in voice interactions. This paper makes two contributions towards characterizing and surfacing deceptive design patterns in voice interfaces. First, we make a conceptual contribution, identifying key characteristics of voice interfaces that may enable deceptive design patterns, and surfacing existing and theoretical examples of such patterns. Second, we present the findings from a scenario-based user survey with 93 participants, in which we investigate participants’ perceptions of voice interfaces that we consider to be both deceptive and non-deceptive. 