This content will become publicly available on November 21, 2025

Title: ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body
We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura around the body. The acoustic signals are reflected based on the position and motion of various body parts, captured by the microphones, and analyzed by a customized self-supervised deep learning framework to infer the performed activities on a remote device such as a mobile phone or cloud server. ActSonic was evaluated in user studies with 19 participants across 19 households to track its efficacy in everyday activity recognition. Without requiring any training data from new users (leave-one-participant-out evaluation), ActSonic detected 27 activities, achieving an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.
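The abstract describes the sensing principle but publishes no code. As a rough illustration of active acoustic sensing (emit an inaudible chirp, cross-correlate the microphone signal against it to obtain an echo profile whose peaks track reflecting surfaces), a minimal numpy sketch might look as follows; the sample rate, frequency band, and chirp length are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

# All values below are illustrative assumptions, not ActSonic's parameters.
FS = 48_000              # sample rate (Hz)
F0, F1 = 18_000, 21_000  # hypothetical inaudible band (Hz)
DUR = 0.010              # chirp length (s)

def make_chirp(fs=FS, f0=F0, f1=F1, dur=DUR):
    """Linear chirp sweeping an inaudible band."""
    t = np.arange(int(fs * dur)) / fs
    phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) / dur * t ** 2)
    return np.sin(phase)

def echo_profile(received, chirp):
    """Cross-correlate the mic signal with the emitted chirp; peaks in the
    profile correspond to reflections at different round-trip delays."""
    return np.abs(np.correlate(received, chirp, mode="valid"))

tx = make_chirp()
delay = 100                             # simulated round-trip delay (samples)
rx = np.zeros(2 * len(tx))
rx[delay:delay + len(tx)] += 0.5 * tx   # one attenuated reflection
profile = echo_profile(rx, tx)
print(int(np.argmax(profile)))          # strongest echo at the simulated delay
```

In the real system, sequences of such echo profiles from both hinges would feed the deep learning model; this sketch only shows how a single profile localizes one reflection.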
Award ID(s):
2239569
PAR ID:
10583907
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
8
Issue:
4
ISSN:
2474-9567
Page Range / eLocation ID:
1 to 32
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we introduce PoseSonic, an intelligent acoustic sensing solution for smartglasses that estimates upper-body poses. Our system requires only two pairs of microphones and speakers on the hinges of the eyeglasses to emit FMCW-encoded inaudible acoustic signals and receive the reflected signals for body pose estimation. Using a customized deep learning model, PoseSonic estimates the 3D positions of 9 body joints, including the shoulders, elbows, wrists, hips, and nose. We adopt a cross-modal supervision strategy to train our model, using synchronized RGB video frames as ground truth. We conducted in-lab and semi-in-the-wild user studies with 22 participants to evaluate PoseSonic; our user-independent model achieved a mean per-joint position error of 6.17 cm in the lab setting and 14.12 cm in the semi-in-the-wild setting when predicting the 9 body joint positions in 3D. Further studies show that performance was not significantly affected by different surroundings, by remounting of the device, or by real-world environmental noise. Finally, we discuss the opportunities, challenges, and limitations of deploying PoseSonic in real-world applications.
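PoseSonic's signals are FMCW-encoded. A standard way to recover echo delay from an FMCW sweep is dechirping: multiply the received signal by the transmitted sweep and read off the beat frequency, which is proportional to the delay. The sketch below illustrates that general principle only; all parameter values are assumptions, not PoseSonic's:

```python
import numpy as np

# Illustrative FMCW parameters (assumptions, not PoseSonic's actual values).
FS = 50_000                      # sample rate (Hz)
F0, BW, T = 18_000, 4_000, 0.01  # start frequency, sweep bandwidth, sweep time

t = np.arange(int(FS * T)) / FS
tx = np.cos(2 * np.pi * (F0 * t + 0.5 * BW / T * t ** 2))   # transmitted sweep

def beat_delay(tx, rx):
    """Dechirp: mix the received signal with the transmitted sweep and locate
    the beat frequency f_b; for a linear FMCW sweep, f_b = (BW / T) * tau,
    so the echo delay is tau = f_b * T / BW."""
    mixed = tx * rx
    spec = np.abs(np.fft.rfft(mixed))
    spec[0] = 0.0                              # ignore the DC bin
    f_beat = np.argmax(spec) * FS / len(mixed)
    return f_beat * T / BW

tau = 2e-3   # simulated 2 ms round-trip delay
rx = np.cos(2 * np.pi * (F0 * (t - tau) + 0.5 * BW / T * (t - tau) ** 2))
print(round(beat_delay(tx, rx) * 1e3, 2))      # recovered delay in ms
```

The beat tone is spectrally concentrated while the unwanted sum-frequency chirp is spread over many bins, which is why a simple argmax over the spectrum suffices here.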
  2. The vibrational response of an elastic panel to incident acoustic waves is determined by the direction-of-arrival (DOA) of the waves relative to the spatial structure of the panel's bending modes. By monitoring the relative modal excitations of a panel immersed in a sound field, the DOA of the source may be inferred. In reverberant environments, early acoustic reflections and the late diffuse acoustic field may obscure the DOA of incoming sound waves. Panel microphones may be especially susceptible to the effects of reverberation due to their large surface areas and long-decaying impulse responses. An investigation into the effect of reverberation on the accuracy of DOA estimation with panel microphones was made by recording wake-word utterances in eight spaces with reverberation times (RT60s) ranging from 0.27 to 3.00 s. The responses were used to train neural networks to estimate the DOA. Within ±5°, DOA estimation reliability was measured at 95.00% in the least reverberant space, decreasing to 78.33% in the most reverberant space, suggesting an inverse relationship between RT60 and DOA accuracy. Experimental results suggest that a system for estimating DOA with panel microphones can generalize to new acoustic environments by cross-training the system with data from multiple spaces with different RT60s. 
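The panel-microphone idea above can be caricatured in a few lines: if each bending mode has its own angular sensitivity, the vector of modal excitations forms a direction-dependent signature that can be matched against a dictionary of candidate DOAs. The modal response model below is invented purely for illustration (the paper uses measured panel responses and trained neural networks):

```python
import numpy as np

rng = np.random.default_rng(0)
angles = np.arange(-60, 61, 5)   # hypothetical grid of candidate DOAs (degrees)

def modal_response(theta_deg, n_modes=4):
    """Toy model: each bending mode couples to an incident wave with its own
    angular sensitivity, so the vector of modal amplitudes is a
    direction-dependent signature. (Invented for illustration only.)"""
    theta = np.radians(theta_deg)
    m = np.arange(1, n_modes + 1)
    return np.sin(m * theta + m)

dictionary = np.stack([modal_response(a) for a in angles])

def estimate_doa(observed):
    """Nearest-neighbor match of an observed modal-excitation vector
    against the dictionary of candidate directions."""
    errs = np.linalg.norm(dictionary - observed, axis=1)
    return int(angles[np.argmin(errs)])

true_doa = 25
obs = modal_response(true_doa) + 0.02 * rng.standard_normal(4)
print(estimate_doa(obs))
```

Reverberation, in this caricature, acts as extra noise on the observed modal vector, which is consistent with the reported drop in accuracy at higher RT60s.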
  3. We derive a radiative transfer equation that accounts for coupling from surface waves to body waves and vice versa. The model is the acoustic wave equation in a two-dimensional waveguide with a reflecting boundary. The waveguide has a thin, weakly randomly heterogeneous layer near the top surface and a thick homogeneous layer beneath it. Two types of modes propagate along the axis of the waveguide: those that are almost trapped in the thin layer, which model surface waves, and those that penetrate deep into the waveguide, which model body waves. The remaining modes are evanescent. We introduce a mathematical theory of mode coupling induced by scattering in the thin layer and derive a radiative transfer equation that quantifies the mean mode power exchange. We study the solution of this equation in the asymptotic limit of infinite waveguide width. The main result is a quantification of the rate of convergence of the mean mode powers toward equipartition.
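Coupled mode-power (radiative transfer) equations of the kind referred to here have, schematically, a standard form; the sketch below is the generic textbook shape, not the paper's exact equation:

```latex
% Generic coupled mode-power equations along the waveguide axis z:
% P_j(z)      = mean power of propagating mode j,
% \Gamma_{jl} = scattering-induced coupling rate between modes j and l.
\frac{\mathrm{d}P_j}{\mathrm{d}z}
  = \sum_{l \ne j} \Gamma_{jl}\,\bigl[P_l(z) - P_j(z)\bigr],
  \qquad \Gamma_{jl} = \Gamma_{lj} \ge 0.
% With symmetric coupling, the total power \sum_j P_j is conserved, and the
% stationary state is equipartition: P_j \to \bar{P} for all j as z \to \infty,
% which is the limit whose convergence rate the paper quantifies.
```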
  4. Smart home cameras present new challenges for understanding behaviors and relationships surrounding always-on, domestic recording systems. We designed a series of discursive activities involving 16 individuals from ten households over six weeks in their everyday settings. These activities functioned as speculative probes, prompting participants to reflect on themes of privacy and power while filming with cameras in their households. Our research design foregrounded critical-playful enactments that allowed participants to speculate on potential relationships with cameras in the home beyond everyday use. We present four key dynamics between participants and home cameras by examining their relationships to the camera's eye, filming, their data, and the camera's societal contexts. We contribute discussions about the mundane, information privacy, and post-hoc reflection on one's camera footage. Overall, our findings reveal the camera as a strange yet banal entity in the home, interrogating how participants compose and handle their own and others' video data.
  5. Annotated IMU sensor data from smart devices and wearables are essential for developing supervised models for fine-grained human activity recognition, although generating sufficient annotated data for diverse human activities in different environments is challenging. Existing approaches primarily use human-in-the-loop techniques such as active learning; however, these are tedious, costly, and time-consuming. Leveraging the acoustic data available from microphones embedded in the data-collection devices, this paper proposes LASO, a multimodal approach for automated data annotation from acoustic and locomotive information. LASO runs on the edge device itself, ensuring that only the annotated IMU data is collected while the acoustic data is discarded on-device, thus preserving the audio privacy of the user. In the absence of any pre-existing labeling information, such auto-annotation is challenging, as the IMU data needs to be sessionized for activities of different time scales in a completely unsupervised manner. We use a change-point detection technique while synchronizing the locomotive information from the IMU data with the acoustic data, and then use pre-trained audio-based activity recognition models to label the IMU data while handling acoustic noise. LASO efficiently annotates IMU data without any explicit human intervention, with mean accuracies of 0.93 (±0.04) and 0.78 (±0.05) on two real-life datasets from workshop and kitchen environments, respectively.
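LASO's sessionization step rests on unsupervised change-point detection over the IMU stream. The abstract does not specify the detector, so the sketch below uses a generic sliding-window mean-shift test as a stand-in; the window size, threshold, and synthetic signal are all illustrative assumptions:

```python
import numpy as np

def sessionize(signal, win=50, thresh=2.0):
    """Unsupervised change-point detection by sliding-window mean shift:
    flag indices where adjacent windows differ in mean by more than
    `thresh` pooled standard deviations. A generic stand-in for the
    change-point detector LASO uses; all constants are illustrative."""
    points = []
    for i in range(win, len(signal) - win):
        left, right = signal[i - win:i], signal[i:i + win]
        pooled = np.sqrt((left.var() + right.var()) / 2) + 1e-9
        if abs(left.mean() - right.mean()) / pooled > thresh:
            if not points or i - points[-1] > win:   # merge nearby hits
                points.append(i)
    return points

# Synthetic accelerometer magnitude: rest, vigorous activity, rest.
rng = np.random.default_rng(1)
sig = np.concatenate([
    rng.normal(1.0, 0.05, 300),   # stationary (~1 g)
    rng.normal(1.8, 0.20, 300),   # activity burst
    rng.normal(1.0, 0.05, 300),
])
points = sessionize(sig)
print(points)   # session boundaries near samples 300 and 600
```

The detected boundaries delimit candidate activity sessions, which an audio-based recognizer would then label, as the abstract describes.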