skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Scribe: Simultaneous Voice and Handwriting Interface
This paper presents the design and implementation of Scribe, a comprehensive voice processing and handwriting interface for voice assistants. Distinct from prior works, Scribe is a precise tracking interface that can co-exist with the voice interface on low sampling rate voice assistants. Scribe can be used for 3D free-form drawing, writing, and motion tracking for gaming. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, enabling voice assistants to use ultrasound as a ranging signal while using the regular microphone system of voice assistants as a receiver. We also design a new optimization algorithm that only requires a single frequency for time difference of arrival. Scribe prototype achieves 73 μm of median error for 1D ranging and 1.4 mm of median error in 3D tracking of an acoustic beacon using the microphone array used in voice assistants. Our implementation of an in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication only increases from 6.26% to 8.28%.  more » « less
Award ID(s):
2238433
PAR ID:
10514753
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
7
Issue:
4
ISSN:
2474-9567
Page Range / eLocation ID:
1 to 31
Subject(s) / Keyword(s):
Handwriting tracking voice assistants cross frequency continuous wave sonar
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    The ability for a smart speaker to localize a user based on his/her voice opens the door to many new applications. In this paper, we present a novel system, MAVL, to localize human voice. It consists of three major components: (i) We first develop a novel multi-resolution analysis to estimate the Angle-of-Arrival (AoA) of time-varying low-frequency coherent voice signals coming from multiple propagation paths; (ii) We then automatically estimate the room structure by emitting acoustic signals and developing an improved 3D MUSIC algorithm; (iii) We finally re-trace the paths using the estimated AoA and room structure to localize the voice. We implement a prototype system using a single speaker and a uniform circular microphone array. Our results show that it achieves median errors of 1.49o and 3.33o for the top two AoAs estimation and achieves median localization errors of 0.31m in line-of-sight (LoS) cases and 0.47m in non-line-of-sight (NLoS) cases. 
    more » « less
  2. null (Ed.)
    The ability for a smart speaker to localize a user based on his/her voice opens the door to many new applications. In this paper, we present a novel system, MAVL, to localize human voice. It consists of three major components: (i) We first develop a novel multi-resolution analysis to estimate the AoA of time-varying low-frequency coherent voice signals coming from multiple propagation paths; (ii) We then automatically estimate the room structure by emitting acoustic signals and developing an improved 3D MUSIC algorithm; (iii) We finally re-trace the paths using the estimated AoA and room structure to localize the voice. We implement a prototype system using a single speaker and a uniform circular microphone array. Our results show that it achieves median errors of 1.49 degree and 3.33 degree for the top two AoAs estimation and achieves median localization errors of 0.31m in line-of-sight (LoS) cases and 0.47m in non-line-of-sight (NLoS) cases. 
    more » « less
  3. null (Ed.)
    Voice assistants such as Amazon Echo (Alexa) and Google Home use microphone arrays to estimate the angle of arrival (AoA) of the human voice. This paper focuses on adding user localization as a new capability to voice assistants. For any voice command, we desire Alexa to be able to localize the user inside the home. The core challenge is two-fold: (1) accurately estimating the AoAs of multipath echoes without the knowledge of the source signal, and (2) tracing back these AoAs to reverse triangulate the user's location.We develop VoLoc, a system that proposes an iterative align-and-cancel algorithm for improved multipath AoA estimation, followed by an error-minimization technique to estimate the geometry of a nearby wall reflection. The AoAs and geometric parameters of the nearby wall are then fused to reveal the user's location. Under modest assumptions, we report localization accuracy of 0.44 m across different rooms, clutter, and user/microphone locations. VoLoc runs in near real-time but needs to hear around 15 voice commands before becoming operational. 
    more » « less
  4. Voice assistants are becoming increasingly pervasive due to the convenience and automation they provide through the voice interface. However, such convenience often comes with unforeseen security and privacy risks. For example, encrypted traffic from voice assistants can leak sensitive information about their users' habits and lifestyles. In this paper, we present a taxonomy of fingerprinting voice commands on the most popular voice assistant platforms (Google, Alexa, and Siri). We also provide a deeper understanding of the feasibility of fingerprinting third-party applications and streaming services over the voice interface. Our analysis not only improves the state-of-the-art technique but also studies a more realistic setup for fingerprinting voice activities over encrypted traffic.Our proposed technique considers a passive network eavesdropper observing encrypted traffic from various devices within a home and, therefore, first detects the invocation/activation of voice assistants followed by what specific voice command is issued. Using an end-to-end system design, we show that it is possible to detect when a voice assistant is activated with 99% accuracy and then utilize the subsequent traffic pattern to infer more fine-grained user activities with around 77-80% accuracy. 
    more » « less
  5. Ear wearables (earables) are emerging platforms that are broadly adopted in various applications. There is an increasing demand for robust earables authentication because of the growing amount of sensitive information and the IoT devices that the earable could access. Traditional authentication methods become less feasible due to the limited input interface of earables. Nevertheless, the rich head-related sensing capabilities of earables can be exploited to capture human biometrics. In this paper, we propose EarSlide, an earable biometric authentication system utilizing the advanced sensing capacities of earables and the distinctive features of acoustic fingerprints when users slide their fingers on the face. It utilizes the inward-facing microphone of the earables and the face-ear channel of the ear canal to reliably capture the acoustic fingerprint. In particular, we study the theory of friction sound and categorize the characteristics of the acoustic fingerprints into three representative classes, pattern-class, ridge-groove-class, and coupling-class. Different from traditional fingerprint authentication only utilizes 2D patterns, we incorporate the 3D information in acoustic fingerprint and indirectly sense the fingerprint for authentication. We then design representative sliding gestures that carry rich information about the acoustic fingerprint while being easy to perform. It then extracts multi-class acoustic fingerprint features to reflect the inherent acoustic fingerprint characteristic for authentication. We also adopt an adaptable authentication model and a user behavior mitigation strategy to effectively authenticate legit users from adversaries. The key advantages of EarSlide are that it is resistant to spoofing attacks and its wide acceptability. Our evaluation of EarSlide in diverse real-world environments with intervals over one year shows that EarSlide achieves an average balanced accuracy rate of 98.37% with only one sliding gesture. 
    more » « less