skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A Continuous Articulatory Gesture Based Liveness Detection for Voice Authentication on Smart Devices
Voice biometrics is drawing increasing attention to user authentication on smart devices. However, voice biometrics is vulnerable to replay attacks, where adversaries try to spoof voice authentication systems using pre-recorded voice samples collected from genuine users. To this end, we propose VoiceGesture, a liveness detection solution for voice authentication on smart devices such as smartphones and smart speakers. With audio hardware advances on smart devices, VoiceGesture leverages built-in speaker and microphone pairs on smart devices as Doppler Radar to sense articulatory gestures for liveness detection during voice authentication. The experiments with 21 participants and different smart devices show that VoiceGesture achieves over 99% and around 98% detection accuracy for text-dependent and text-independent liveness detection, respectively. Moreover, VoiceGesture is robust to different device placements, low audio sampling frequency, and supports medium range liveness detection on smart speakers in various use scenarios, including smart homes and smart vehicles.  more » « less
Award ID(s):
2131143 1801630
PAR ID:
10360837
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Internet of Things Journal
ISSN:
2372-2541
Page Range / eLocation ID:
1 to 14
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In recent years, biometrics (e.g., fingerprint or face recognition) has replaced traditional passwords and PINs as a widely used method for user authentication, particularly in personal or mobile devices. Differing from state-of-the-art biometrics, heart biometrics offer the advantages of liveness detection, which provides strong tolerance to spoofing attacks. To date, several authentication methods primarily focusing on electrocardiogram (ECG) have demonstrated remarkable success; however, the degree of exploration with other cardiac signals is still limited. To this end, we discuss the challenges in various cardiac domains and propose future prospectives for developing effective heart biometrics systems in real-world applications. 
    more » « less
  2. The popularity of smart home devices has led to an increase in security incidents happening in smart homes. A key measure to avoid such incidents is to authenticate users before they can interact with smart devices. However, current methods often require additional hardware. This article proposes STATION, a gesture-based authentication system, an effective gesture-based authentication method built on top of the voice interfaces already available in these smart home devices, without adding new hardware. STATION uses a gesture processing pipeline that identifies Doppler-existing frames and detects the direction of arrival of Reflection to authenticate users in low SNR environments and at longer distances. Furthermore, regarding the nature of gesture-based authentication, this system also supports detecting user liveness, preventing replay and synthesis attacks from remote attackers. The evaluation of STATION shows high accuracy with a false acceptance rate (FAR) of 0.08% and false rejection rate (FRR) of 3.10% for users within 1.5 m of the device. 
    more » « less
  3. Abstract Internet-connected voice-controlled speakers, also known as smart speakers , are increasingly popular due to their convenience for everyday tasks such as asking about the weather forecast or playing music. However, such convenience comes with privacy risks: smart speakers need to constantly listen in order to activate when the “wake word” is spoken, and are known to transmit audio from their environment and record it on cloud servers. In particular, this paper focuses on the privacy risk from smart speaker misactivations , i.e. , when they activate, transmit, and/or record audio from their environment when the wake word is not spoken. To enable repeatable, scalable experiments for exposing smart speakers to conversations that do not contain wake words, we turn to playing audio from popular TV shows from diverse genres. After playing two rounds of 134 hours of content from 12 TV shows near popular smart speakers in both the US and in the UK, we observed cases of 0.95 misactivations per hour, or 1.43 times for every 10,000 words spoken, with some devices having 10% of their misactivation durations lasting at least 10 seconds. We characterize the sources of such misactivations and their implications for consumers, and discuss potential mitigations. 
    more » « less
  4. Fake audio detection is expected to become an important research area in the field of smart speakers such as Google Home, Amazon Echo and chatbots developed for these platforms. This paper presents replay attack vulnerability of voice-driven interfaces and proposes a countermeasure to detect replay attack on these platforms. This paper introduces a novel framework to model replay attack distortion, and then use a non-learning-based method for replay attack detection on smart speakers. The reply attack distortion is modeled as a higher-order nonlinearity in the replay attack audio. Higher-order spectral analysis (HOSA) is used to capture characteristics distortions in the replay audio. The replay attack recordings are successfully injected into the Google Home device via Amazon Alexa using the drop-in conferencing feature. Effectiveness of the proposed HOSA-based scheme is evaluated using original recorded speech as well as corresponding played back recording to the Google Home via the Amazon Alexa using the drop-in conferencing feature. 
    more » « less
  5. Voice interfaces are increasingly becoming integrated into a variety of Internet of Things (IoT) devices. Such systems can dramatically simplify interactions between users and devices with limited displays. Unfortunately voice interfaces also create new opportunities for exploitation. Specifically any sound-emitting device within range of the system implementing the voice interface (e.g., a smart television, an Internet-connected appliance, etc) can potentially cause these systems to perform operations against the desires of their owners (e.g., unlock doors, make unauthorized purchases, etc). We address this problem by developing a technique to recognize fundamental differences in audio created by humans and electronic speakers. We identify sub-bass over-excitation, or the presence of significant low frequency signals that are outside of the range of human voices but inherent to the design of modern speakers, as a strong differentiator between these two sources. After identifying this phenomenon, we demonstrate its use in preventing adversarial requests, replayed audio, and hidden commands with a 100%/1.72% TPR/FPR in quiet environments. In so doing, we demonstrate that commands injected via nearby audio devices can be effectively removed by voice interfaces. 
    more » « less