Title: Studies in spatial aural perception: establishing foundations for immersive sonification
The Spatial Audio Data Immersive Experience (SADIE) project aims to identify new foundational relationships pertaining to human spatial aural perception, and to validate existing relationships. Our infrastructure consists of an intuitive interaction interface, an immersive exocentric sonification environment, and a layer-based amplitude-panning algorithm. Here we highlight the system's unique capabilities and provide findings from an initial externally funded study that focuses on the assessment of human aural spatial perception capacity. When compared to the existing body of literature focusing on egocentric spatial perception, our data show that an immersive exocentric environment enhances spatial perception, and that the physical implementation using high-density loudspeaker arrays enables significantly improved spatial perception accuracy relative to the egocentric and virtual binaural approaches. The preliminary observations suggest that human spatial aural perception capacity in real-world-like immersive exocentric environments that allow for head and body movement is significantly greater than in egocentric scenarios where head and body movement is restricted. Therefore, in the design of immersive auditory displays, the use of immersive exocentric environments is advised. Further, our data identify a significant gap between physical and virtual human spatial aural perception accuracy, which suggests that further development of virtual aural immersion may be necessary before such an approach can be considered a viable alternative.
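The abstract mentions a layer-based amplitude-panning algorithm but does not describe its formulation. As a hedged illustration only, the Python sketch below distributes a source between the two nearest loudspeaker layers (rings at fixed elevations) using a constant-power crossfade; the function name, layer elevations, and panning law are assumptions for this sketch, not the SADIE implementation.

    # Minimal sketch of layer-based amplitude panning (assumed formulation).
    # A source at a given elevation is split between the two nearest layers
    # with constant-power gains; azimuthal panning within a layer could be
    # handled analogously between adjacent loudspeakers.
    import numpy as np

    def layer_gains(source_elev_deg, layer_elevs_deg):
        """Return one constant-power gain per loudspeaker layer."""
        layers = np.asarray(sorted(layer_elevs_deg), dtype=float)
        gains = np.zeros(len(layers))
        # Sources outside the covered range are clamped to the nearest layer.
        if source_elev_deg <= layers[0]:
            gains[0] = 1.0
            return gains
        if source_elev_deg >= layers[-1]:
            gains[-1] = 1.0
            return gains
        # Find the pair of layers bracketing the source elevation.
        hi = np.searchsorted(layers, source_elev_deg)
        lo = hi - 1
        frac = (source_elev_deg - layers[lo]) / (layers[hi] - layers[lo])
        # Constant-power (sine/cosine) crossfade between the two layers.
        gains[lo] = np.cos(frac * np.pi / 2)
        gains[hi] = np.sin(frac * np.pi / 2)
        return gains

    print(layer_gains(25.0, [-30, 0, 30, 60, 90]))  # most energy goes to the 30-degree layer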
Award ID(s):
1748667
NSF-PAR ID:
10108418
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Auditory Display
Page Range / eLocation ID:
28-35
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The expression of human emotion is integral to social interaction, and in virtual reality it is increasingly common to develop virtual avatars that attempt to convey emotions by mimicking visual and aural cues, i.e., facial and vocal expressions. However, errors in (or the absence of) facial tracking can result in the rendering of incorrect facial expressions on these virtual avatars. For example, a virtual avatar may speak with a happy or unhappy vocal inflection while its facial expression remains neutral. In circumstances where there is conflict between the avatar's facial and vocal expressions, users may incorrectly interpret the avatar's emotion, which can have unintended consequences for social influence or for the outcome of the interaction. In this paper, we present a human-subjects study (N = 22) aimed at understanding the impact of conflicting facial and vocal emotional expressions. Specifically, we explored three levels of emotional valence (unhappy, neutral, and happy) expressed in both visual (facial) and aural (vocal) forms. We also investigated three levels of head scale (down-scaled, accurate, and up-scaled) to evaluate whether head scale affects user interpretation of the conveyed emotion. We found significant effects of the different multimodal expressions on perceived happiness and trust, while no significant effect was observed for head scale. Our results suggest that facial expressions have a stronger impact than vocal expressions. Additionally, as the difference between the two expressions increases, the multimodal expression becomes less predictable. For example, for a happy-looking and happy-sounding multimodal expression, we expect and observe high happiness ratings and high trust; however, if one of the two expressions changes, the mismatch makes the overall expression less predictable. We discuss the relationships, implications, and guidelines for social applications that aim to leverage multimodal social cues.
  2. Background: Sustained engagement is essential for the success of telerehabilitation programs. However, patients’ lack of motivation and adherence could undermine these goals. To overcome this challenge, physical exercises have often been gamified. Building on the advantages of serious games, we propose a citizen science–based approach in which patients perform scientific tasks using interactive interfaces and help advance scientific causes of their choice. This approach capitalizes on human intellect and benevolence while promoting learning. To further enhance engagement, we propose performing citizen science activities in immersive media, such as virtual reality (VR). Objective: This study aims to present a novel methodology to facilitate the remote identification and classification of human movements for the automatic assessment of motor performance in telerehabilitation. The data-driven approach is presented in the context of a citizen science software dedicated to bimanual training in VR. Specifically, users interact with the interface and make contributions to an environmental citizen science project while moving both arms in concert. Methods: In all, 9 healthy individuals interacted with the citizen science software using a commercial VR gaming device. The software included a calibration phase to evaluate the users’ range of motion along the 3 anatomical planes of motion and to adapt the sensitivity of the software’s response to their movements. During calibration, the time series of the users’ movements were recorded by the sensors embedded in the device. We performed principal component analysis to identify salient features of the movements and then applied a bagged-trees ensemble classifier to classify the movements. Results: The classification achieved high performance, reaching 99.9% accuracy. Among the movements, elbow flexion was the most accurately classified (99.2%), and horizontal shoulder abduction to the right side of the body was the least accurately classified (98.8%). Conclusions: Coordinated bimanual movements in VR can be classified with high accuracy. Our findings lay the foundation for the development of motion analysis algorithms in VR-mediated telerehabilitation.
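The pipeline in item 2 (principal component analysis followed by a bagged-trees ensemble classifier) can be approximated with the Python/scikit-learn sketch below. The feature dimensionality, number of movement classes, and cross-validation setup are placeholder assumptions; the study's actual kinematic features and labels are not reproduced here.

    # Sketch of the described pipeline: PCA feature reduction + bagged decision trees.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    # Placeholder data: 500 movement windows, each summarized by 30 kinematic features,
    # labeled with one of 4 movement classes (e.g., elbow flexion, shoulder abduction).
    X = rng.normal(size=(500, 30))
    y = rng.integers(0, 4, size=500)

    model = make_pipeline(
        PCA(n_components=10),                        # keep the most salient components
        BaggingClassifier(DecisionTreeClassifier(),  # bagged-trees ensemble
                          n_estimators=100, random_state=0),
    )
    scores = cross_val_score(model, X, y, cv=5)
    print("cross-validated accuracy: %.3f" % scores.mean())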
  3. Stationarity perception refers to the ability to accurately perceive the surrounding visual environment as world-fixed during self-motion. Perception of stationarity depends on mechanisms that evaluate the congruence between retinal/oculomotor signals and head movement signals. In a series of psychophysical experiments, we systematically varied the congruence between retinal/oculomotor and head movement signals to find the range of visual gains that is compatible with perception of a stationary environment. On each trial, human subjects wearing a head-mounted display execute a yaw head movement and report whether the visual gain was perceived to be too slow or too fast. A psychometric fit to the data across trials reveals the visual gain most compatible with stationarity (a measure of accuracy) and the sensitivity to visual gain manipulation (a measure of precision). Across experiments, we varied 1) the spatial frequency of the visual stimulus, 2) the retinal location of the visual stimulus (central vs. peripheral), and 3) fixation behavior (scene-fixed vs. head-fixed). Stationarity perception is most precise and accurate during scene-fixed fixation. Effects of spatial frequency and retinal stimulus location become evident during head-fixed fixation, when retinal image motion is increased. Virtual reality sickness, assessed using the Simulator Sickness Questionnaire, covaries with perceptual performance: decreased accuracy is associated with an increase in the nausea subscore, while decreased precision is associated with an increase in the oculomotor and disorientation subscores.
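The psychometric fit in item 3 is typically a cumulative Gaussian over visual gain: its mean (point of subjective equality) estimates the gain most compatible with stationarity (accuracy), and its spread indexes precision. The Python sketch below fits such a curve to made-up gain levels and response proportions; it is an assumed analysis form, not the authors' exact procedure.

    # Fit a cumulative-Gaussian psychometric function to "too fast" response rates.
    import numpy as np
    from scipy.optimize import curve_fit
    from scipy.stats import norm

    gains = np.array([0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4])              # visual gain levels (illustrative)
    p_too_fast = np.array([0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.98])  # made-up response proportions

    def psychometric(g, pse, sigma):
        # Probability of reporting "too fast" at gain g.
        return norm.cdf(g, loc=pse, scale=sigma)

    (pse, sigma), _ = curve_fit(psychometric, gains, p_too_fast, p0=[1.0, 0.2])
    print(f"gain most compatible with stationarity (PSE): {pse:.3f}")  # accuracy
    print(f"spread of the fit (smaller = more precise): {sigma:.3f}")  # precision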
  4. While tremendous advances in visual and auditory realism have been made for virtual and augmented reality (VR/AR), introducing a plausible sense of physicality into the virtual world remains challenging. Closing the gap between real-world physicality and immersive virtual experience requires a closed interaction loop: applying user-exerted physical forces to the virtual environment and generating haptic sensations back to the users. However, existing VR/AR solutions either completely ignore the force inputs from the users or rely on obtrusive sensing devices that compromise user experience. By identifying users' muscle activation patterns while engaging in VR/AR, we design a learning-based neural interface for natural and intuitive force inputs. Specifically, we show that lightweight electromyography sensors, resting non-invasively on users' forearm skin, inform and establish a robust understanding of their complex hand activities. Fuelled by a neural-network-based model, our interface can decode finger-wise forces in real time with 3.3% mean error and generalize to new users with little calibration. Through an interactive psychophysical study, we show that human perception of virtual objects' physical properties, such as stiffness, can be significantly enhanced by our interface. We further demonstrate that our interface enables ubiquitous control via finger tapping. Ultimately, we envision our findings pushing research toward more realistic physicality in future VR/AR.
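Item 4 describes decoding finger-wise forces from forearm EMG with a neural-network model. As a hedged sketch only, the Python example below trains a small multilayer-perceptron regressor on synthetic features; the channel count, feature extraction, network architecture, and error metric are assumptions and do not reproduce the paper's model.

    # Sketch: map EMG feature windows to per-finger force estimates with a small MLP.
    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    # Placeholder data: 2000 windows, 8 channels x 4 features each, and
    # normalized force targets for 5 fingers.
    X = rng.normal(size=(2000, 32))
    y = rng.uniform(size=(2000, 5))

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print("mean absolute error per finger:", np.abs(pred - y_test).mean(axis=0))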
  5. Gonzalez, D. (Ed.)

    Today’s research on human-robot teaming requires the ability to test artificial intelligence (AI) algorithms for perception and decision-making in complex real-world environments. Field experiments, also referred to as experiments “in the wild,” do not provide the level of detailed ground truth necessary for thorough performance comparisons and validation. Experiments on pre-recorded real-world data sets are also significantly limited in their usefulness because they do not allow researchers to test the effectiveness of active robot perception, control, or decision strategies in the loop. Additionally, research on large human-robot teams requires tests and experiments that are too costly even for industry and may result in considerable time losses when experiments go awry. The novel Real-Time Human Autonomous Systems Collaborations (RealTHASC) facility at Cornell University interfaces real and virtual robots and humans with photorealistic simulated environments by implementing new concepts for the seamless integration of wearable sensors, motion capture, physics-based simulations, robot hardware, and virtual reality (VR). The result is an extended reality (XR) testbed through which real robots and humans in the laboratory can experience virtual worlds, inclusive of virtual agents, through real-time visual feedback and interaction. VR body tracking by DeepMotion is employed in conjunction with the OptiTrack motion capture system to transfer every human subject and robot in the real physical laboratory space into a synthetic virtual environment, thereby constructing corresponding human/robot avatars that not only mimic the behaviors of the real agents but also experience the virtual world through virtual sensors and transmit the sensor data back to the real human/robot agent, all in real time. New cross-domain synthetic environments are created in RealTHASC using Unreal Engine™, bridging the simulation-to-reality gap and allowing for the inclusion of underwater/ground/aerial autonomous vehicles, each equipped with a multi-modal sensor suite. The experimental capabilities offered by RealTHASC are demonstrated through three case studies showcasing mixed real/virtual human/robot interactions in diverse domains, leveraging and complementing the benefits of experimentation in simulation and in the real world.

     