The following paper presents a cross-disciplinary snapshot of 21st-century research in sonification and leverages the review to identify a new immersive exocentric approach to studying the human capacity to perceive spatial aural cues. The paper further defines immersive exocentric sonification, highlights its unique affordances, and presents an argument for its potential to fundamentally change the way we understand and study the human capacity for location-aware audio pattern recognition. Finally, the paper describes an example of an externally funded research project that aims to address this newly identified research gap.
Studies in spatial aural perception: establishing foundations for immersive sonification
The Spatial Audio Data Immersive Experience (SADIE) project aims to identify new foundational relationships pertaining to human spatial aural perception, and to validate existing relationships. Our infrastructure consists of an intuitive interaction interface, an immersive exocentric sonification environment, and a layer-based amplitude-panning algorithm. Here we highlight the system's unique capabilities and provide findings from an initial externally funded study that focuses on the assessment of human aural spatial perception capacity. When compared to the existing body of literature focusing on egocentric spatial perception, our data show that an immersive exocentric environment enhances spatial perception, and that the physical implementation using high-density loudspeaker arrays enables significantly improved spatial perception accuracy relative to the egocentric and virtual binaural approaches. The preliminary observations suggest that human spatial aural perception capacity in real-world-like immersive exocentric environments that allow for head and body movement is significantly greater than in egocentric scenarios where head and body movement is restricted. Therefore, in the design of immersive auditory displays, the use of immersive exocentric environments is advised. Further, our data identify a significant gap between physical and virtual human spatial aural perception accuracy, which suggests that further development of virtual aural immersion may be necessary before such an approach can be considered a viable alternative.
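The abstract does not detail the layer-based amplitude-panning algorithm itself. Purely as an illustration of one common layer-based scheme, the Python sketch below cross-fades a source between the two loudspeaker rings that bracket its elevation using constant-power gains, and pans pairwise between adjacent loudspeakers within each ring. The function names, degree-based conventions, and two-ring layout in the example are assumptions, not the SADIE implementation.

```python
import numpy as np

def constant_power(frac):
    """Constant-power gain pair for a 0..1 position between two channels."""
    frac = float(np.clip(frac, 0.0, 1.0))
    return np.cos(frac * np.pi / 2), np.sin(frac * np.pi / 2)

def ring_gains(azimuth_deg, ring_azimuths_deg):
    """Gains for one ring: pan between the two speakers bracketing the source azimuth."""
    az = np.asarray(ring_azimuths_deg, dtype=float)
    gains = np.zeros_like(az)
    diffs = (az - azimuth_deg + 180.0) % 360.0 - 180.0      # signed angular distance
    left = np.where(diffs <= 0, diffs, -np.inf).argmax()    # nearest speaker at or before the source
    right = np.where(diffs > 0, diffs, np.inf).argmin()     # nearest speaker after the source
    span = (az[right] - az[left]) % 360.0
    if span == 0.0:
        span = 360.0
    frac = ((azimuth_deg - az[left]) % 360.0) / span
    gains[left], gains[right] = constant_power(frac)
    return gains

def layer_pan(azimuth_deg, elevation_deg, layers):
    """
    layers: list of (ring_elevation_deg, ring_azimuths_deg), sorted by elevation.
    The source is cross-faded between the two rings bracketing its elevation and
    panned pairwise within each ring; returns one gain array per ring.
    """
    elevs = [e for e, _ in layers]
    hi = min(int(np.searchsorted(elevs, elevation_deg)), len(layers) - 1)
    lo = max(hi - 1, 0)
    if lo == hi:
        g_lo, g_hi = 1.0, 0.0
    else:
        g_lo, g_hi = constant_power((elevation_deg - elevs[lo]) / (elevs[hi] - elevs[lo]))
    out = [np.zeros(len(ring_az)) for _, ring_az in layers]
    out[lo] += g_lo * ring_gains(azimuth_deg, layers[lo][1])
    out[hi] += g_hi * ring_gains(azimuth_deg, layers[hi][1])
    return out

# Example: a source at 30 deg azimuth, 20 deg elevation, two hypothetical 4-speaker rings
layers = [(0.0, [0, 90, 180, 270]), (45.0, [45, 135, 225, 315])]
for ring, g in enumerate(layer_pan(30.0, 20.0, layers)):
    print(f"ring {ring}: {np.round(g, 2)}")
```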
- Award ID(s): 1748667
- PAR ID: 10108418
- Date Published:
- Journal Name: International Conference on Auditory Display
- Page Range / eLocation ID: 28-35
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Stationarity perception refers to the ability to accurately perceive the surrounding visual environment as world-fixed during self-motion. Perception of stationarity depends on mechanisms that evaluate the congruence between retinal/oculomotor signals and head movement signals. In a series of psychophysical experiments, we systematically varied the congruence between retinal/oculomotor and head movement signals to find the range of visual gains that is compatible with perception of a stationary environment. On each trial, human subjects wearing a head-mounted display execute a yaw head movement and report whether the visual gain was perceived to be too slow or too fast. A psychometric fit to the data across trials reveals the visual gain most compatible with stationarity (a measure of accuracy) and the sensitivity to visual gain manipulation (a measure of precision). Across experiments, we varied 1) the spatial frequency of the visual stimulus, 2) the retinal location of the visual stimulus (central vs. peripheral), and 3) fixation behavior (scene-fixed vs. head-fixed). Stationarity perception is most precise and accurate during scene-fixed fixation. Effects of spatial frequency and retinal stimulus location become evident during head-fixed fixation, when retinal image motion is increased. Virtual reality sickness assessed using the Simulator Sickness Questionnaire covaries with perceptual performance. Decreased accuracy is associated with an increase in the nausea subscore, while decreased precision is associated with an increase in the oculomotor and disorientation subscores.
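The abstract describes fitting a psychometric function so that its midpoint gives the gain most compatible with stationarity (accuracy) and its spread gives sensitivity to the gain manipulation (precision). A minimal sketch of that kind of fit, assuming a cumulative-Gaussian psychometric function and invented response proportions, might look like the following; the study's actual fitting procedure is not specified.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical data: visual gains tested and the proportion of "too fast"
# responses observed at each gain (values invented for illustration only).
gains = np.array([0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4])
p_fast = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.98])

def psychometric(g, mu, sigma):
    """Cumulative-Gaussian psychometric function: probability of a 'too fast' response vs. gain."""
    return norm.cdf(g, loc=mu, scale=sigma)

(mu, sigma), _ = curve_fit(psychometric, gains, p_fast, p0=[1.0, 0.2])
print(f"gain most compatible with stationarity (accuracy, PSE): {mu:.3f}")
print(f"spread of the fit (inverse precision, sigma):           {sigma:.3f}")
```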
The robotics community continually strives to create robots that are deployable in real-world environments. Often, robots are expected to interact with human groups. To achieve this goal, we introduce a new method, the Robot-Centric Group Estimation Model (RoboGEM), which enables robots to detect groups of people. Much of the work reported in the literature focuses on dyadic interactions, leaving a gap in our understanding of how to build robots that can effectively team with larger groups of people. Moreover, many current methods rely on exocentric vision, where cameras and sensors are placed externally in the environment, rather than onboard the robot. Consequently, these methods are impractical for robots in unstructured, human-centric environments, which are novel and unpredictable. Furthermore, the majority of work on group perception is supervised, which can inhibit performance in real-world settings. RoboGEM addresses these gaps by being able to predict social groups solely from an egocentric perspective using color and depth (RGB-D) data. To achieve group predictions, RoboGEM leverages joint motion and proximity estimations. We evaluated RoboGEM against a challenging, egocentric, real-world dataset where both pedestrians and the robot are in motion simultaneously, and show RoboGEM outperformed two state-of-the-art supervised methods in detection accuracy by up to 30%, with a lower miss rate. Our work will be helpful to the robotics community and serve as a milestone toward building unsupervised systems that will enable robots to work with human groups in real-world environments.
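RoboGEM's actual estimation pipeline is not specified in the abstract beyond its reliance on joint motion and proximity from egocentric RGB-D data. The sketch below is a deliberately simplified, hypothetical illustration of that idea, not the published method: pedestrians are linked when they are close and co-moving, and groups are the connected components of those links. The thresholds and function names are assumptions.

```python
import numpy as np

def group_pedestrians(positions, velocities, max_dist=1.5, max_vel_diff=0.5):
    """
    Cluster pedestrian tracks into social groups using proximity and joint motion:
    two people are linked if they are within max_dist metres and their velocity
    vectors differ by less than max_vel_diff m/s; groups are the connected
    components of those links. Thresholds are illustrative only.
    """
    positions = np.asarray(positions, dtype=float)    # (N, 2) ground-plane positions
    velocities = np.asarray(velocities, dtype=float)  # (N, 2) ground-plane velocities
    n = len(positions)
    linked = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            close = np.linalg.norm(positions[i] - positions[j]) < max_dist
            co_moving = np.linalg.norm(velocities[i] - velocities[j]) < max_vel_diff
            if close and co_moving:
                linked[i].append(j)
                linked[j].append(i)
    # connected components via graph traversal
    groups, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        frontier, members = [start], []
        while frontier:
            k = frontier.pop()
            if k in seen:
                continue
            seen.add(k)
            members.append(k)
            frontier.extend(linked[k])
        groups.append(sorted(members))
    return groups

# Example: three pedestrians, two walking together and one apart
print(group_pedestrians([[0, 0], [0.8, 0.2], [6, 5]],
                        [[1, 0], [1.1, 0.0], [-1, 0]]))
# -> [[0, 1], [2]]
```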
Egocentric perception has grown rapidly with the advent of immersive computing devices. Human gaze prediction is an important problem in analyzing egocentric videos and has primarily been tackled through either saliency-based modeling or highly supervised learning. We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task on unseen, out-of-domain data. We find that their performance is highly dependent on the training data and is restricted to the domains specified in the training annotations. In this work, we tackle the problem of jointly predicting human gaze points and temporal segmentation of egocentric videos without using any training data. We introduce an unsupervised computational model that draws inspiration from cognitive psychology models of event perception. We use Grenander's pattern theory formalism to represent spatial-temporal features and model surprise as a mechanism to predict gaze fixation points. Extensive evaluation on two publicly available datasets (GTEA and GTEA+) shows that the proposed model can significantly outperform all unsupervised baselines and some supervised gaze prediction baselines. Finally, we show that the model can also temporally segment egocentric videos with a performance comparable to more complex, fully supervised deep learning baselines.
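The paper's pattern-theory formulation is considerably richer than anything shown here. Purely as a toy illustration of "surprise as prediction error" driving gaze prediction, the hypothetical sketch below treats the previous frame as the prediction of the current one and fixates the location of maximum smoothed prediction error; all names and parameters are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def predict_gaze(prev_frame, curr_frame, sigma=5.0):
    """
    Toy surprise-based gaze prediction: treat the previous frame as the model's
    prediction of the current one, use the squared prediction error as a
    per-pixel surprise map, smooth it, and return the (row, col) of maximum
    surprise as the predicted fixation point.
    """
    prev = np.asarray(prev_frame, dtype=float)
    curr = np.asarray(curr_frame, dtype=float)
    surprise = gaussian_filter((curr - prev) ** 2, sigma=sigma)
    return np.unravel_index(np.argmax(surprise), surprise.shape)

# Example with a synthetic 64x64 grayscale pair: a bright patch suddenly appears
prev = np.zeros((64, 64))
curr = np.zeros((64, 64))
curr[40:44, 20:24] = 1.0
print(predict_gaze(prev, curr))  # roughly (41, 21), the unexpected patch
```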
Modern developments in autonomous chemometric machine learning technology strive to relinquish the need for human intervention. However, such algorithms developed and used in chemometric multivariate calibration and classification applications exclude crucial expert insight when difficult and safety-critical analysis situations arise, e.g., spectral-based medical decisions such as noninvasively determining whether a biopsy is cancerous. The prediction accuracy and interpolation capabilities of autonomous methods for new samples depend on the quality and scope of their training (calibration) data. Specifically, analysis patterns within target data not captured by the training data will produce undesirable outcomes. Alternatively, an immersive analytic approach allows insertion of human expert judgment at key machine learning algorithm junctures, forming a sensemaking process performed in cooperation with a computer. The capacity of immersive virtual reality (IVR) environments to render human-comprehensible three-dimensional space simulating real-world encounters suggests its suitability as a hybrid immersive human–computer interface for data analysis tasks. Using IVR maximizes human senses to capitalize on our instinctual perception of the physical environment, thereby leveraging our innate ability to recognize patterns and visualize thresholds crucial to reducing erroneous outcomes. In this first use of IVR as an immersive analytic tool for spectral data, we examine an integrated IVR real-time model selection algorithm for a recent model updating method that adapts a model from the original calibration domain to predict samples from shifted target domains. Using near-infrared data, analyte prediction errors from IVR-selected models are reduced compared to errors using an established autonomous model selection approach. Results demonstrate the viability of IVR as a human data analysis interface for spectral data analysis, including classification problems.
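The abstract does not name the underlying model updating method. As one hypothetical illustration of the kind of candidate-model family an analyst in IVR, or an autonomous criterion, might select among, the sketch below refits a sample-weighted ridge model on pooled source and target calibration spectra for a grid of target-sample weights. The weighting scheme, regularization, and function names are assumptions, not the authors' approach.

```python
import numpy as np

def weighted_ridge(X, y, sample_weights, lam=1e-3):
    """Closed-form ridge regression with per-sample weights (spectra assumed mean-centered)."""
    W = np.diag(sample_weights)
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ W @ y)

def candidate_models(X_src, y_src, X_tgt, y_tgt, tgt_weights=(0.1, 1.0, 10.0, 100.0)):
    """
    Build one updated model per candidate target-sample weight by refitting on
    the pooled source + target calibration data. Returns (weight, coefficients)
    pairs: an autonomous selector would pick the pair with the lowest validation
    RMSE, while an analyst in IVR could inspect the same candidates visually.
    """
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    models = []
    for w in tgt_weights:
        sw = np.concatenate([np.ones(len(y_src)), np.full(len(y_tgt), w)])
        models.append((w, weighted_ridge(X, y, sw)))
    return models

def rmse(coefs, X, y):
    """Root-mean-square prediction error of a linear model on (X, y)."""
    return float(np.sqrt(np.mean((X @ coefs - y) ** 2)))

# Autonomous selection on hypothetical target validation data (X_val, y_val):
# best_w, best_coefs = min(candidate_models(Xs, ys, Xt, yt),
#                          key=lambda m: rmse(m[1], X_val, y_val))
```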