
Title: Auditory Eyesight: Demystifying μs-Precision Keystroke Tracking Attacks on Unconstrained Keyboard Inputs
In various scenarios, from system login to writing emails, documents, and forms, keyboard inputs carry alluring data such as passwords, addresses, and IDs. Because natural typing commonly includes non-alphabetic characters, punctuation, and typos, users' inputs rarely consist of only constrained, purely alphabetic keys or words. This work studies how to reveal unconstrained keyboard inputs using auditory interfaces. Audio interfaces lack the capability of light sensors such as cameras to identify compactly located keys. Our analysis shows that effectively distinguishing the keys can require localizing keystroke sounds with a precision close to the range of microseconds. This work (1) explores the limits of audio interfaces in distinguishing keystrokes, (2) proposes a μs-level customized signal processing and analysis-based keystroke tracking approach that accounts for the mechanical physics and imperfect measurement of keystroke sounds, (3) develops the first acoustic side-channel attack study on unconstrained keyboard inputs that are not purely alphabetic keys/words and do not necessarily follow known sequences in a given dictionary or training dataset, and (4) reveals the threat of non-line-of-sight keystroke sound tracking. Our results indicate that, without relying on vision sensors, attacks using limited-resolution audio interfaces can reveal unconstrained keyboard inputs with a fairly sharp and bendable "auditory eyesight."
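As a rough illustration of the timing precision involved (a back-of-the-envelope Python sketch with an assumed microphone placement and the common ~19 mm key pitch; this is not the paper's attack pipeline), the difference in sound arrival time at a single microphone between two adjacent keys follows directly from the geometry and the speed of sound:

```python
# Back-of-the-envelope sketch (not the paper's method): estimate how finely
# keystroke sound arrival times must be resolved to tell adjacent keys apart.
# The key pitch and microphone placement below are assumed, illustrative values.
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def arrival_time(key_xy, mic_xy):
    """Propagation delay (seconds) from a key position to a microphone."""
    dx = key_xy[0] - mic_xy[0]
    dy = key_xy[1] - mic_xy[1]
    return math.hypot(dx, dy) / SPEED_OF_SOUND

# Two horizontally adjacent keys, 19 mm apart (typical key pitch), and a
# microphone about half a meter away at an oblique angle.
key_a = (0.000, 0.0)
key_b = (0.019, 0.0)
mic = (0.30, 0.40)

delta_t = abs(arrival_time(key_a, mic) - arrival_time(key_b, mic))
print(f"Arrival-time difference between adjacent keys: {delta_t * 1e6:.1f} microseconds")
```

With this layout the difference comes out around 30 μs, and placements where the microphone sits closer to broadside of the key row shrink it further, which is what pushes the required resolution toward single microseconds.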
Award ID(s):
1946231, 2229752, 2231682, 2117785
PAR ID:
10454030
Author(s) / Creator(s):
Publisher / Repository:
USENIX Association
Date Published:
Journal Name:
32nd USENIX Security Symposium (USENIX Security 23)
ISBN:
978-1-939133-37-3
Page Range / eLocation ID:
175–192
Format(s):
Medium: X
Location:
Anaheim, CA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Smart IoT speakers, while connected over a network, currently only produce sounds that come directly from the individual devices. We envision a future where smart speakers collaboratively produce a fabric of spatial audio, capable of perceptually placing sound in a range of locations in physical space. This could provide audio cues in homes, offices, and public spaces that are flexibly linked to various positions. The perception of spatialized audio relies on binaural cues, especially the time difference and the level difference of incident sound at a user's left and right ears. Traditional stereo speakers cannot create this spatialization perception for a user when playing binaural audio due to auditory crosstalk, as each ear hears a combination of both speaker outputs. We present Xblock, a novel time-domain pose-adaptive crosstalk cancellation technique that creates a spatial audio perception over a pair of speakers using knowledge of the user's head pose and speaker positions. We build a prototype smart speaker IoT system empowered by Xblock, explore the effectiveness of Xblock through signal analysis, and discuss planned perceptual user studies and future work.
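As a quick illustration of the binaural time cue mentioned above (a minimal free-field Python sketch with an assumed head radius, pose, and source position; it is not the Xblock algorithm), the interaural time difference can be estimated from the path-length difference between a source and the listener's two ears:

```python
# Minimal sketch (assumed values, not the Xblock implementation): estimate the
# interaural time difference (ITD) for a sound source using a simple free-field
# path-length model that ignores head shadowing.
import math

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, approximate average adult head radius

def ear_positions(head_xy, yaw_rad):
    """Left/right ear positions for a head at head_xy facing along yaw_rad."""
    # Ears sit on the axis perpendicular to the facing direction.
    ex, ey = -math.sin(yaw_rad), math.cos(yaw_rad)
    left = (head_xy[0] + HEAD_RADIUS * ex, head_xy[1] + HEAD_RADIUS * ey)
    right = (head_xy[0] - HEAD_RADIUS * ex, head_xy[1] - HEAD_RADIUS * ey)
    return left, right

def itd(source_xy, head_xy, yaw_rad):
    """ITD in seconds; positive means the sound reaches the left ear first."""
    left, right = ear_positions(head_xy, yaw_rad)
    return (math.dist(source_xy, right) - math.dist(source_xy, left)) / SPEED_OF_SOUND

# A source about 2 m away, roughly 30 degrees to the listener's left (assumed layout).
print(f"ITD: {itd((-1.0, 1.73), (0.0, 0.0), math.pi / 2) * 1e6:.0f} microseconds")
```

For a source roughly 30° off the midline this yields an ITD of a few hundred microseconds, which is the scale of cue a pose-adaptive crosstalk canceller has to preserve at each ear.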
  2. In many situations, it may be impractical or impossible to enter text by selecting precise locations on a physical or touchscreen keyboard. We present an ambiguous keyboard with four character groups that has potential applications for eyes-free text entry, as well as text entry using a single switch or a brain-computer interface. We develop a procedure for optimizing these character groupings based on a disambiguation algorithm that leverages a long-span language model. We produce both alphabetically constrained and unconstrained character groups in an offline optimization experiment and compare them in a longitudinal user study. Our results did not show a significant difference between the constrained and unconstrained character groups after four hours of practice. As expected, participants had significantly more errors with the unconstrained groups in the first session, suggesting a higher barrier to learning the technique. We therefore recommend the alphabetically constrained character groups, with which participants were able to achieve an average entry rate of 12.0 words per minute with a 2.03% character error rate using a single hand and with no visual feedback.
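To make the disambiguation step concrete (a toy Python sketch; the four groups and unigram counts are hypothetical and stand in for the optimized groupings and long-span language model described above), each typed word collapses to a sequence of group indices that several words can share, and the language model ranks the candidates:

```python
# Toy sketch of ambiguous-keyboard disambiguation. The character groups and
# unigram counts below are illustrative assumptions, not values from the paper.
from collections import defaultdict

# Four hypothetical character groups; typing a word produces only the
# sequence of group indices, which many words may share.
GROUPS = ["abcdef", "ghijkl", "mnopqr", "stuvwxyz"]
CHAR_TO_GROUP = {c: i for i, g in enumerate(GROUPS) for c in g}

# Tiny vocabulary with unigram counts acting as the language model.
VOCAB = {"the": 500, "tie": 40, "vie": 5, "cat": 120, "fat": 60}

def code_of(word):
    """Map a word to its ambiguous group-index sequence, e.g. 'the' -> (3, 1, 0)."""
    return tuple(CHAR_TO_GROUP[c] for c in word)

# Index the vocabulary by code so candidate lookup is direct.
CODE_INDEX = defaultdict(list)
for w, count in VOCAB.items():
    CODE_INDEX[code_of(w)].append((w, count))

def disambiguate(code):
    """Return candidate words for a group sequence, best-scoring first."""
    return [w for w, _ in sorted(CODE_INDEX[tuple(code)], key=lambda x: -x[1])]

print(disambiguate(code_of("the")))  # 'the', 'tie', 'vie' share the same code
```

Here "the", "tie", and "vie" map to the same group sequence and the count-based ranking surfaces "the" first; a long-span language model would perform this ranking with sentence-level context rather than unigram counts.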
  3. Most bat species have highly developed audio-vocal systems, which allow them to adjust the features of echolocation calls that are optimized for different sonar tasks, such as detecting, localizing, discriminating, and tracking targets. Furthermore, bats can also produce a wide array of social calls to communicate with conspecifics. The acoustic properties of some social calls differ only subtly from echolocation calls, yet bats have the ability to distinguish them and reliably produce appropriate behavioral responses. Little is known about the underlying neural processes that enable the correct classification of bat social communication sounds. One approach to this question is to identify the brain regions that are involved in the processing of sounds that carry behavioral relevance. Here, we present preliminary data on neuronal activation, as measured by c-fos expression, in big brown bats (Eptesicus fuscus) exposed to social calls or echolocation calls, or kept in silence. We focused our investigation on five relevant brain areas: three within the canonical auditory pathway (auditory cortex, inferior colliculus, and medial geniculate body) and two that are involved in the processing of emotive stimulus content (amygdala and nucleus accumbens). In this manuscript we report c-fos staining of the areas of interest after exposure to conspecific calls. We discuss future work designed to overcome experimental limitations and explore whether c-fos staining reveals anatomical segregation of neurons activated by echolocation and social call categories.
  4. Accuracy and speed are pivotal when it comes to typing. Mixed reality headsets offer users the groundbreaking ability to project virtual objects into the physical world. However, when typing on a virtual keyboard in mixed reality space, users lose the tactile feedback that comes with a physical keyboard, making typing much more difficult. Our goal was to explore the capability of users to type using all ten fingers on a virtual keyboard in mixed reality. We measured user performance when typing with index fingers versus all ten fingers. We also examined the use of eye-tracking to disable all keys the user was not looking at, and the effect it had on improving speed and accuracy. Our findings so far indicate that, while eye-tracking seems to help accuracy, it is not enough to bring ten-finger typing up to the same level of performance as index-finger typing.
  5. Little is known about how neural representations of natural sounds differ across species. For example, speech and music play a unique role in human hearing, yet it is unclear how auditory representations of speech and music differ between humans and other animals. Using functional ultrasound imaging, we measured responses in ferrets to a set of natural and spectrotemporally matched synthetic sounds previously tested in humans. Ferrets showed lower-level frequency and modulation tuning similar to that observed in humans. But while humans showed substantially larger responses to natural vs. synthetic speech and music in non-primary regions, ferret responses to natural and synthetic sounds were closely matched throughout primary and non-primary auditory cortex, even when tested with ferret vocalizations. This finding reveals that auditory representations in humans and ferrets diverge sharply at late stages of cortical processing, potentially driven by higher-order processing demands in speech and music.