skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.


This content will become publicly available on November 13, 2024

Title: Jawthenticate: Microphone-free Speech-based Authentication using Jaw Motion and Facial Vibrations
In this paper, we present Jawthenticate, an earable system that authenticates a user using audible or inaudible speech without us- ing a microphone. This system can overcome the shortcomings of traditional voice-based authentication systems like unreliability in noisy conditions and spoofing using microphone-based replay attacks. Jawthenticate derives distinctive speech-related features from the jaw motion and associated facial vibrations. This combi- nation of features makes Jawthenticate resilient to vocal imitations as well as camera-based spoofing. We use these features to train a two-class SVM classifier for each user. Our system is invariant to the content and language of speech. In a study conducted with 41 subjects, who speak different native languages, Jawthenticate achieves a Balanced Accuracy (BAC) of 97.07%, True Positive Rate (TPR) of 97.75%, and True Negative Rate (TNR) of 96.4% with just 3 seconds of speech data.  more » « less
Award ID(s):
2238553
NSF-PAR ID:
10490370
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Format(s):
Medium: X
Location:
Istanbul, Turkiye
Sponsoring Org:
National Science Foundation
More Like this
  1. Background Comprehensive exams such as the Dean-Woodcock Neuropsychological Assessment System, the Global Deterioration Scale, and the Boston Diagnostic Aphasia Examination are the gold standard for doctors and clinicians in the preliminary assessment and monitoring of neurocognitive function in conditions such as neurodegenerative diseases and acquired brain injuries (ABIs). In recent years, there has been an increased focus on implementing these exams on mobile devices to benefit from their configurable built-in sensors, in addition to scoring, interpretation, and storage capabilities. As smartphones become more accepted in health care among both users and clinicians, the ability to use device information (eg, device position, screen interactions, and app usage) for subject monitoring also increases. Sensor-based assessments (eg, functional gait using a mobile device’s accelerometer and/or gyroscope or collection of speech samples using recordings from the device’s microphone) include the potential for enhanced information for diagnoses of neurological conditions; mapping the development of these conditions over time; and monitoring efficient, evidence-based rehabilitation programs. Objective This paper provides an overview of neurocognitive conditions and relevant functions of interest, analysis of recent results using smartphone and/or tablet built-in sensor information for the assessment of these different neurocognitive conditions, and how human-device interactions and the assessment and monitoring of these neurocognitive functions can be enhanced for both the patient and health care provider. Methods This survey presents a review of current mobile technological capabilities to enhance the assessment of various neurocognitive conditions, including both neurodegenerative diseases and ABIs. It explores how device features can be configured for assessments as well as the enhanced capability and data monitoring that will arise due to the addition of these features. It also recognizes the challenges that will be apparent with the transfer of these current assessments to mobile devices. Results Built-in sensor information on mobile devices is found to provide information that can enhance neurocognitive assessment and monitoring across all functional categories. Configurations of positional sensors (eg, accelerometer, gyroscope, and GPS), media sensors (eg, microphone and camera), inherent sensors (eg, device timer), and participatory user-device interactions (eg, screen interactions, metadata input, app usage, and device lock and unlock) are all helpful for assessing these functions for the purposes of training, monitoring, diagnosis, or rehabilitation. Conclusions This survey discusses some of the many opportunities and challenges of implementing configured built-in sensors on mobile devices to enhance assessments and monitoring of neurocognitive functions as well as disease progression across neurodegenerative and acquired neurological conditions. 
    more » « less
  2. Smartphones are the most commonly used computing platform for accessing sensitive and important information placed on the Internet. Authenticating the smartphone's identity in addition to the user's identity is a widely adopted security augmentation method since conventional user authentication methods, such as password entry, often fail to provide strong protection by itself. In this paper, we propose a sensor-based device fingerprinting technique for identifying and authenticating individual mobile devices. Our technique, called MicPrint, exploits the unique characteristics of embedded microphones in mobile devices due to manufacturing variations in order to uniquely identify each device. Unlike conventional sensor-based device fingerprinting that are prone to spoofing attack via malware, MicPrint is fundamentally spoof-resistant since it uses acoustic features that are prominent only when the user blocks the microphone hole. This simple user intervention acts as implicit permission to fingerprint the sensor and can effectively prevent unauthorized fingerprinting using malware. We implement MicPrint on Google Pixel 1 and Samsung Nexus to evaluate the accuracy of device identification. We also evaluate its security against simple raw data attacks and sophisticated impersonation attacks. The results show that after several incremental training cycles under various environmental noises, MicPrint can achieve high accuracy and reliability for both smartphone models. 
    more » « less
  3. This paper presents iSpyU, a system that shows the feasibility of recognition of natural speech content played on a phone during conference calls (Skype, Zoom, etc) using a fusion of motion sensors such as accelerometer and gyroscope. While microphones require permissions from the user to be accessible by an app developer, the motion sensors are zero-permission sensors, thus accessible by a developer without alerting the user. This allows a malicious app to potentially eavesdrop on sensitive speech content played by the user's phone. In designing the attack, iSpyU tackles a number of technical challenges including: (i) Low sampling rate of motion sensors (500 Hz in comparison to 44 kHz for a microphone). (ii) Lack of availability of large-scale training datasets to train models for Automatic Speech Recognition (ASR) with motion sensors. iSpyU systematically addresses these challenges by a combination of techniques in synthetic training data generation, ASR modeling, and domain adaptation. Extensive measurement studies on modern smartphones show a word level accuracy of 53.3 - 59.9% over a dictionary of 2000-10000 words, and a character level accuracy of 70.0 - 74.8%. We believe such levels of accuracy poses a significant threat when viewed from a privacy perspective. 
    more » « less
  4. The prevalence of voice spoofing attacks in today’s digital world has become a critical security concern. Attackers employ various techniques, such as voice conversion (VC) and text-to-speech (TTS), to generate synthetic speech that imitates the victim’s voice and gain access to sensitive information. The recent advances in synthetic speech generation pose a significant threat to modern security systems, while traditional voice authentication methods are incapable of detecting them effectively. To address this issue, a novel solution for logical access (LA)-based synthetic speech detection is proposed in this paper. SpoTNet is an attention-based spoofing transformer network that includes crafted front-end spoofing features and deep attentive features retrieved using the developed logical spoofing transformer encoder (LSTE). The derived attentive features were then processed by the proposed multi-layer spoofing classifier to classify speech samples as bona fide or synthetic. In synthetic speeches produced by the TTS algorithm, the spectral characteristics of the synthetic speech are altered to match the target speaker’s formant frequencies, while in VC attacks, the temporal alignment of the speech segments is manipulated to preserve the target speaker’s prosodic features. By highlighting these observations, this paper targets the prosodic and phonetic-based crafted features, i.e., the Mel-spectrogram, spectral contrast, and spectral envelope, presenting an effective preprocessing pipeline proven to be effective in synthetic speech detection. The proposed solution achieved state-of-the-art performance against eight recent feature fusion methods with lower EER of 0.95% on the ASVspoof-LA dataset, demonstrating its potential to advance the field of speaker identification and improve speaker recognition systems. 
    more » « less
  5. Earables (ear wearables) are rapidly emerging as a new platform encompassing a diverse range of personal applications. The traditional authentication methods hence become less applicable and inconvenient for earables due to their limited input interface. Nevertheless, earables often feature rich around-the-head sensing capability that can be leveraged to capture new types of biometrics. In this work, we propose ToothSonic that leverages the toothprint-induced sonic effect produced by a user performing teeth gestures for earable authentication. In particular, we design representative teeth gestures that can produce effective sonic waves carrying the information of the toothprint. To reliably capture the acoustic toothprint, it leverages the occlusion effect of the ear canal and the inward-facing microphone of the earables. It then extracts multi-level acoustic features to reflect the intrinsic toothprint information for authentication. The key advantages of ToothSonic are that it is suitable for earables and is resistant to various spoofing attacks as the acoustic toothprint is captured via the user's private teeth-ear channel that modulates and encrypts the sonic waves. Our experiment studies with 25 participants show that ToothSonic achieves up to 95% accuracy with only one of the users' tooth gestures. 
    more » « less