In this paper, we introduce PoseSonic, an intelligent acoustic sensing solution for smartglasses that estimates upper body poses. Our system requires only two pairs of microphones and speakers on the hinges of the eyeglasses to emit FMCW-encoded inaudible acoustic signals and receive the reflected signals for body pose estimation. Using a customized deep learning model, PoseSonic estimates the 3D positions of 9 body joints, including the shoulders, elbows, wrists, hips, and nose. We adopt a cross-modal supervision strategy to train our model using synchronized RGB video frames as ground truth. We conducted in-lab and semi-in-the-wild user studies with 22 participants to evaluate PoseSonic, and our user-independent model achieved a mean per joint position error of 6.17 cm in the lab setting and 14.12 cm in the semi-in-the-wild setting when predicting the 9 body joint positions in 3D. Our further studies show that performance was not significantly affected by different surroundings, by remounting the device, or by real-world environmental noise. Finally, we discuss the opportunities, challenges, and limitations of deploying PoseSonic in real-world applications.
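The mean per joint position error quoted above is, in its usual definition, the Euclidean distance between each predicted and ground-truth joint position, averaged over joints. A minimal sketch under that assumption follows; the function name `mpjpe` and the toy coordinates are illustrative and are not taken from the paper.

```python
import math


def mpjpe(predicted, ground_truth):
    """Average Euclidean distance between matching 3D joints.

    Both arguments are equal-length sequences of (x, y, z) tuples,
    expressed in the same unit (for example, centimetres).
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    # Sum the per-joint distances, then divide by the number of joints.
    total = sum(math.dist(p, t) for p, t in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical two-joint example: one exact match, one joint 5 units off.
example_error = mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])
```

In an evaluation like the one described, this value would typically be averaged again over all frames and participants; that aggregation step is not shown here.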
ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body
We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura around the body. The acoustic signals are reflected based on the position and motion of various body parts, captured by the microphones, and analyzed by a customized self-supervised deep learning framework to infer the performed activities on a remote device such as a mobile phone or cloud server. ActSonic was evaluated in user studies with 19 participants across 19 households to track its efficacy in everyday activity recognition. Without requiring any training data from new users (leave-one-participant-out evaluation), ActSonic detected 27 activities, achieving an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.
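Leave-one-participant-out evaluation holds out all data from one participant for testing and trains on everyone else, repeating once per participant. The sketch below illustrates only the splitting step; the function name, the `(participant_id, sample)` data layout, and the sample labels are assumptions for illustration, not details of the ActSonic pipeline.

```python
def leave_one_participant_out(samples):
    """Yield (held_out_id, train, test) for each participant in turn.

    `samples` is a list of (participant_id, sample) pairs.
    """
    participant_ids = sorted({pid for pid, _ in samples})
    for held_out in participant_ids:
        # Training data excludes every sample from the held-out participant.
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Hypothetical data: two participants, three recorded samples.
toy_samples = [("p1", "eating"), ("p2", "drinking"), ("p1", "toothbrushing")]
toy_splits = list(leave_one_participant_out(toy_samples))
```

Per-split scores such as F1 would then be averaged across the held-out participants.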
- Award ID(s):
- 2239569
- PAR ID:
- 10583907
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Volume:
- 8
- Issue:
- 4
- ISSN:
- 2474-9567
- Page Range / eLocation ID:
- 1 to 32
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
The vibrational response of an elastic panel to incident acoustic waves is determined by the direction-of-arrival (DOA) of the waves relative to the spatial structure of the panel's bending modes. By monitoring the relative modal excitations of a panel immersed in a sound field, the DOA of the source may be inferred. In reverberant environments, early acoustic reflections and the late diffuse acoustic field may obscure the DOA of incoming sound waves. Panel microphones may be especially susceptible to the effects of reverberation due to their large surface areas and long-decaying impulse responses. An investigation into the effect of reverberation on the accuracy of DOA estimation with panel microphones was made by recording wake-word utterances in eight spaces with reverberation times (RT60s) ranging from 0.27 to 3.00 s. The responses were used to train neural networks to estimate the DOA. Within ±5°, DOA estimation reliability was measured at 95.00% in the least reverberant space, decreasing to 78.33% in the most reverberant space, suggesting an inverse relationship between RT60 and DOA accuracy. Experimental results suggest that a system for estimating DOA with panel microphones can generalize to new acoustic environments by cross-training the system with data from multiple spaces with different RT60s.
-
Augmented reality (AR) is emerging as the next ubiquitous wearable technology and is expected to significantly transform various industries in the near future. There has been tremendous investment in developing AR eyeglasses in recent years, including about $45 billion investment by Meta since 2021. Despite such efforts, the existing displays are very bulky in form factor and there has not yet been a socially acceptable eyeglasses-style AR display. Such wearable display eyeglasses promise to unlock enormous potential in diverse applications such as medicine, education, navigation, and many more; but until eyeglass-style AR glasses are realized, those possibilities remain only a dream. My research addresses this problem and makes progress “towards everyday-use augmented reality eyeglasses” through computational imaging, displays, and perception. My dissertation (Chakravarthula, 2021) made advances in three key and seemingly distinct areas: first, digital holography and advanced algorithms for compact, high-quality, true 3-D holographic displays; second, hardware and software for robust and comprehensive 3-D eye tracking via Purkinje Images; and third, automatic focus adjusting AR display eyeglasses for well-focused virtual and real imagery, toward potentially achieving 20/20 vision for users of all ages.
-
We derive a radiative transfer equation that accounts for coupling from surface waves to body waves and the other way around. The model is the acoustic wave equation in a two-dimensional waveguide with reflecting boundary. The waveguide has a thin, weakly randomly heterogeneous layer near the top surface, and a thick homogeneous layer beneath it. There are two types of modes that propagate along the axis of the waveguide: those that are almost trapped in the thin layer, and thus model surface waves, and those that penetrate deep in the waveguide, and thus model body waves. The remaining modes are evanescent waves. We introduce a mathematical theory of mode coupling induced by scattering in the thin layer, and derive a radiative transfer equation which quantifies the mean mode power exchange. We study the solution of this equation in the asymptotic limit of infinite width of the waveguide. The main result is a quantification of the rate of convergence of the mean mode powers toward equipartition.
-
Smart home cameras present new challenges for understanding behaviors and relationships surrounding always-on, domestic recording systems. We designed a series of discursive activities involving 16 individuals from ten households for six weeks in their everyday settings. These activities functioned as speculative probes prompting participants to reflect on themes of privacy and power through filming with cameras in their households. Our research design foregrounded critical-playful enactments that allowed participants to speculate potentials for relationships with cameras in the home beyond everyday use. We present four key dynamics with participants and home cameras by examining their relationships to: the camera’s eye, filming, their data, and camera’s societal contexts. We contribute discussions about the mundane, information privacy, and post-hoc reflection with one’s camera footage. Overall, our findings reveal the camera as a strange, yet banal entity in the home—interrogating how participants compose and handle their own and others’ video data.