The direction of arrival (DOA) of an acoustic source is a signal characteristic used by smart audio devices to enable signal enhancement algorithms. Though DOA estimates are traditionally made using a multi-microphone array, we propose that the resonant modes of a surface excited by acoustic waves contain sufficient spatial information that DOA may be estimated using a single structural vibration sensor. In this work, sensors are affixed to an acrylic panel and used to record acoustic noise signals at various angles of incidence. From these recordings, feature vectors containing the sums of the energies in the panel’s isolated modal regions are extracted and used to train deep neural networks to estimate DOA. Experimental results show that when all 13 of the acrylic panel’s isolated modal bands are utilized, the DOA of incident acoustic waves for a broadband noise signal may be estimated by a single structural sensor to within ±5° with a reliability of 98.4%. The size of the feature set may be reduced by eliminating the resonant modes that do not couple strongly to the spatial structure of the incident acoustic wave. Reducing the feature set to the 7 modal bands that provide the most spatial information yields a reliability of 89.7% for DOA estimates within ±5° using a single sensor.
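As a rough illustration of the feature pipeline described above, the sketch below sums spectral energy inside a set of modal bands and feeds the resulting vectors to a small feed-forward classifier. The band edges, sample rate, window length, and network size are illustrative assumptions, not values from the paper (which uses the 13 measured modal bands of the acrylic panel).

```python
# Minimal sketch: modal-band energy features from a vibration-sensor signal.
# Band edges and sample rate below are hypothetical placeholders.
import numpy as np
from scipy.signal import welch
from sklearn.neural_network import MLPClassifier

FS = 16000                                          # assumed sample rate (Hz)
MODAL_BANDS = [(80, 120), (150, 210), (240, 320)]   # hypothetical band edges (Hz)

def modal_band_energies(x, fs=FS, bands=MODAL_BANDS):
    """Return the log of the summed power spectral density in each modal band."""
    f, pxx = welch(x, fs=fs, nperseg=2048)
    energies = [pxx[(f >= lo) & (f < hi)].sum() for lo, hi in bands]
    return np.log(np.asarray(energies) + 1e-12)     # log-energies stabilize training

# Train a small DNN to map band-energy vectors to DOA classes (e.g., 5° bins).
# `recordings` and `doa_labels` stand in for the measured dataset.
# model = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500)
# model.fit(np.stack([modal_band_energies(r) for r in recordings]), doa_labels)
```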
Estimating direction of arrival in reverberant environments for wake-word detection using a single structural vibration sensor
The vibrational response of an elastic panel to incident acoustic waves is determined by the direction of arrival (DOA) of the waves relative to the spatial structure of the panel's bending modes. By monitoring the relative modal excitations of a panel immersed in a sound field, the DOA of the source may be inferred. In reverberant environments, early acoustic reflections and the late diffuse acoustic field may obscure the DOA of incoming sound waves. Panel microphones may be especially susceptible to the effects of reverberation due to their large surface areas and long-decaying impulse responses. The effect of reverberation on the accuracy of DOA estimation with panel microphones was investigated by recording wake-word utterances in eight spaces with reverberation times (RT60s) ranging from 0.27 to 3.00 s. The responses were used to train neural networks to estimate the DOA. Within ±5°, DOA estimation reliability was measured at 95.00% in the least reverberant space, decreasing to 78.33% in the most reverberant space, suggesting an inverse relationship between RT60 and DOA accuracy. Experimental results suggest that a system for estimating DOA with panel microphones can generalize to new acoustic environments by cross-training the system with data from multiple spaces with different RT60s.
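A hedged sketch of the cross-training procedure suggested in the closing sentence: pool feature/label pairs from several rooms with different RT60s for training, and hold one room out to test generalization. The data layout, room names, and helper names are assumptions for illustration.

```python
# Illustrative leave-one-room-out split for cross-training a DOA estimator.
import numpy as np

def cross_room_split(room_data, held_out):
    """room_data: dict mapping room name -> (features, doa_labels)."""
    X_train = np.concatenate([X for room, (X, y) in room_data.items() if room != held_out])
    y_train = np.concatenate([y for room, (X, y) in room_data.items() if room != held_out])
    X_test, y_test = room_data[held_out]
    return X_train, y_train, X_test, y_test

# Example: train on seven rooms, test on the unseen, most reverberant one.
# X_tr, y_tr, X_te, y_te = cross_room_split(room_data, held_out="rt60_3.00s")
```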
- Award ID(s): 2104758
- PAR ID: 10549302
- Publisher / Repository: AIP
- Date Published:
- Journal Name: The Journal of the Acoustical Society of America
- Volume: 156
- Issue: 4
- ISSN: 0001-4966
- Page Range / eLocation ID: 2619 to 2629
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Acoustic direction of arrival estimation methods allow positional information about sound sources to be transmitted over a network using minimal bandwidth. For these purposes, methods that prioritize low computational overhead and consistent accuracy under non-ideal conditions are preferred. The estimation method introduced in this paper uses a set of steered beams to estimate directional energy at sparsely distributed orientations around a spherical microphone array. By iteratively adjusting beam orientations based on the orientation of maximum energy, an accurate orientation estimate of a sound source may be produced at minimal computational cost. Incorporating conditions based on temporal smoothing and diffuse energy estimation further refines this process. Testing under simulated conditions indicates favorable accuracy under reverberation and favorable source discrimination compared with several other contemporary localization methods. Outcomes include an average localization error of less than 10° under 2 s of reverberation time (T60) and the potential to separate up to four sound sources under the same conditions. Results from testing in a laboratory environment demonstrate potential for integration into real-time frameworks.
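The iterative coarse-to-fine search described in this abstract might look like the sketch below: evaluate beam energy on a sparse grid of directions, re-center a narrower grid on the maximum, and repeat. `beam_energy` stands in for the spherical array's beamformer output, and the grid sizes are illustrative; the temporal-smoothing and diffuse-energy conditions are omitted.

```python
# Coarse-to-fine DOA search over azimuth with a steerable beamformer.
import numpy as np

def refine_doa(beam_energy, n_dirs=12, span_deg=360.0, iters=4):
    """beam_energy(angle_deg) -> scalar energy of a beam steered to that angle."""
    center = 0.0
    for _ in range(iters):
        angles = center + np.linspace(-span_deg / 2, span_deg / 2, n_dirs)
        energies = np.array([beam_energy(a % 360.0) for a in angles])
        center = angles[np.argmax(energies)]    # re-center on the strongest beam
        span_deg /= 4.0                         # narrow the search span each pass
    return center % 360.0
```

Four passes of 12 beams each cost only 48 beamformer evaluations while resolving azimuth to well under a degree, which is the kind of low computational overhead the abstract emphasizes.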
We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura around the body. The acoustic signals are reflected based on the position and motion of various body parts, captured by the microphones, and analyzed by a customized self-supervised deep learning framework to infer the performed activities on a remote device such as a mobile phone or cloud server. ActSonic was evaluated in user studies with 19 participants across 19 households to track its efficacy in everyday activity recognition. Without requiring any training data from new users (leave-one-participant-out evaluation), ActSonic detected 27 activities, achieving an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.
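The leave-one-participant-out protocol mentioned above follows a standard pattern, sketched below with placeholder `train_fn`/`predict_fn` hooks; this is not the authors' code.

```python
# Leave-one-participant-out evaluation: each user is tested on a model
# trained only on the other users' data, so no new-user data is required.
from statistics import mean
from sklearn.metrics import f1_score

def lopo_f1(participants, train_fn, predict_fn):
    scores = []
    for held_out in participants:
        model = train_fn([p for p in participants if p != held_out])
        y_true, y_pred = predict_fn(model, held_out)
        scores.append(f1_score(y_true, y_pred, average="macro"))
    return mean(scores)
```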
The human auditory system can localize multiple sound sources using time, intensity, and frequency cues in the sound received by the two ears. Being able to spatially segregate the sources aids perception in challenging conditions when multiple sounds coexist. This study used model simulations to explore an algorithm for localizing multiple sources in azimuth with binaural (i.e., two) microphones. The algorithm relies on the “sparseness” property of everyday signals in the time-frequency domain: sound coming from different locations carries unique spatial features and forms distinct clusters. Based on an interaural normalization procedure, the model generated spiral patterns for sound sources in the frontal hemifield. The model itself was created using broadband noise for better accuracy, because speech typically has sporadic energy at high frequencies. The model at an arbitrary frequency can be used to predict locations of speech and music occurring alone or concurrently, and a classification algorithm was applied to measure the localization error. Under anechoic conditions, averaged azimuth errors increased from 4.5° to 19°, with RMS errors ranging from 6.4° to 26.7°, as model frequency increased from 300 to 3000 Hz. The low-frequency model performance using short speech sounds was notably better than that of the generalized cross-correlation model. Two types of room reverberation were then introduced to simulate difficult listening conditions. Model performance under reverberation was more resilient at low frequencies than at high frequencies. Overall, our study presents a spiral model for rapidly predicting the horizontal locations of concurrent sounds that is suitable for real-world scenarios.
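One plausible reading of the time-frequency sparseness step is sketched below: compute interaural level and phase differences per STFT bin, then cluster the bins so each cluster corresponds to one source direction. The STFT parameters, clustering method, and two-feature space are assumptions; the paper's interaural normalization and spiral mapping are not reproduced here.

```python
# Cluster time-frequency bins by interaural cues to separate concurrent sources.
import numpy as np
from scipy.signal import stft
from sklearn.cluster import KMeans

def binaural_bin_clusters(left, right, fs, n_sources=2):
    _, _, L = stft(left, fs=fs, nperseg=512)     # left-ear spectrogram
    _, _, R = stft(right, fs=fs, nperseg=512)    # right-ear spectrogram
    ild = 20 * (np.log10(np.abs(L) + 1e-12)
                - np.log10(np.abs(R) + 1e-12))   # interaural level difference (dB)
    ipd = np.angle(L * np.conj(R))               # interaural phase difference (rad)
    features = np.stack([ild.ravel(), ipd.ravel()], axis=1)
    return KMeans(n_clusters=n_sources, n_init=10).fit_predict(features)
```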
We address the challenge of making spatial audio datasets by proposing a shared mechanized recording space that can run custom acoustic experiments: a Mechatronic Acoustic Research System (MARS). To accommodate a wide variety of experiments, we implement an extensible architecture for wireless multi-robot coordination that enables synchronized robot motion for dynamic scenes with moving speakers and microphones. Using a virtual control interface, we can remotely design automated experiments to collect large-scale audio data. The collected data are shown to be consistent across repeated runs, demonstrating the reliability of MARS. We discuss the potential for MARS to make audio data collection accessible to researchers without dedicated acoustic research spaces.
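The abstract does not detail the synchronization protocol, but one common approach to synchronized multi-robot motion over a wireless network is to broadcast a shared future start time, as in this assumed illustration:

```python
# Assumed illustration: start all robots at an agreed wall-clock instant so
# trajectories stay synchronized despite network jitter.
import time
import threading

def run_at(start_time, action):
    delay = start_time - time.time()
    if delay > 0:
        time.sleep(delay)              # wait until the shared instant
    action()

start = time.time() + 2.0              # instant broadcast to every robot
for move in (lambda: print("robot A: begin trajectory"),
             lambda: print("robot B: begin trajectory")):
    threading.Thread(target=run_at, args=(start, move)).start()
```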