

Title: Effects of Embodiment on Generic and Content-Specific Intelligent Virtual Agents as Exhibition Guides
Intelligent Virtual Agents (IVAs) have received enormous attention in recent years due to significant improvements in voice communication technologies and the convergence of different research fields such as Machine Learning, Internet of Things, and Virtual Reality (VR). Interactive conversational IVAs can appear in different forms, such as voice-only or with embodied audio-visual representations showing, for example, human-like contextually related or generic three-dimensional bodies. In this paper, we analyzed the benefits of different forms of virtual agents in the context of a VR exhibition space. Our results provide positive evidence of large benefits from both embodied and thematically related audio-visual representations of IVAs. We discuss implications and suggestions for content developers to design believable virtual agents in the context of such installations.
Award ID(s):
1800961
NSF-PAR ID:
10106968
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Artificial Reality and Telexistence and Eurographics Symposium on Virtual Environments (ICAT-EGVE 2018), Limassol, Cyprus, November 7–9, 2018
Page Range / eLocation ID:
13-20
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents a shared platform to collaborate in a spatial context. Traditional face-to-face communication is limited by users’ proximity: another human’s non-verbal embodied cues become more difficult to perceive the farther one is from that person. Researchers and practitioners have therefore started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the Big Head technique, in which a human’s head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complementary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants’ ability to perceive facial expressions as well as their sense of comfort and feeling of “uncanniness” over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. Our second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and identify a virtual human over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues.
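A minimal sketch of the kind of distance-dependent head scaling described above, written in Python: the function name, the linear near/far mapping, and all parameter values are illustrative assumptions, not the specific scaling methods evaluated in the article.

```python
import numpy as np

def big_head_scale(head_pos, observer_pos, base_scale=1.0,
                   near_dist=2.0, far_dist=10.0, max_scale=3.0):
    """Scale an avatar's head with distance so facial cues stay legible.

    Hypothetical linear mapping: heads at or below `near_dist` keep their
    natural size; beyond that the scale grows linearly up to `max_scale`
    at `far_dist`. The article compares different scaling methods; this is
    only an illustrative stand-in, not the authors' exact function.
    """
    dist = float(np.linalg.norm(np.asarray(head_pos) - np.asarray(observer_pos)))
    if dist <= near_dist:
        return base_scale
    t = min((dist - near_dist) / (far_dist - near_dist), 1.0)
    return base_scale + t * (max_scale - base_scale)

# Example: a head 6 m away is rendered at roughly twice its natural size.
print(big_head_scale(head_pos=(0.0, 1.7, 6.0), observer_pos=(0.0, 1.7, 0.0)))
```

Clamping the scale at a maximum is one plausible way to balance cue visibility against the discomfort that can arise at very large head scales, in line with the comfort differences the experiments report across distances.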
  2.
    Technological advancements and increased access have prompted the adoption of head-mounted display based virtual reality (VR) for neuroscientific research, manual skill training, and neurological rehabilitation. Applications that focus on manual interaction within the virtual environment (VE), especially haptic-free VR, critically depend on virtual hand-object collision detection. Knowledge about how multisensory integration related to hand-object collisions affects perception-action dynamics and reach-to-grasp coordination is needed to enhance the immersiveness of interactive VR. Here, we explored whether and to what extent sensory substitution for haptic feedback of hand-object collision (visual, audio, or audiovisual) and collider size (size of spherical pointers representing the fingertips) influences reach-to-grasp kinematics. In Study 1, visual, auditory, or combined feedback were compared as sensory substitutes to indicate the successful grasp of a virtual object during reach-to-grasp actions. In Study 2, participants reached to grasp virtual objects using spherical colliders of different diameters to test if virtual collider size impacts reach-to-grasp kinematics. Our data indicate that collider size but not sensory feedback modality significantly affected the kinematics of grasping. Larger colliders led to a smaller size-normalized peak aperture. We discuss this finding in the context of a possible influence of spherical collider size on the perception of the virtual object’s size and hence effects on motor planning of reach-to-grasp. Critically, reach-to-grasp spatiotemporal coordination patterns were robust to manipulations of sensory feedback modality and spherical collider size, suggesting that the nervous system adjusted the reach (transport) component commensurately to the changes in the grasp (aperture) component. These results have important implications for research, commercial, industrial, and clinical applications of VR.
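As a rough illustration of the collision detection that such haptic-free grasping relies on, the sketch below tests whether spherical fingertip colliders touch a sphere-approximated virtual object. The function names, the two-finger grasp rule, and the sphere approximation of the object are assumptions made for clarity, not the studies' actual implementation.

```python
import numpy as np

def fingertip_contact(fingertip_center, collider_radius,
                      object_center, object_radius):
    """Sphere-sphere overlap test between a fingertip collider and a virtual object.

    Contact is declared when the spherical pointer representing a fingertip
    intersects the object's geometry. Enlarging `collider_radius` makes
    contact occur at a wider gap between finger and object.
    """
    gap = np.linalg.norm(np.asarray(fingertip_center) - np.asarray(object_center))
    return gap <= (collider_radius + object_radius)

def grasp_detected(thumb, index, collider_radius, object_center, object_radius):
    """Register a grasp once both opposing fingertip colliders touch the object."""
    return (fingertip_contact(thumb, collider_radius, object_center, object_radius)
            and fingertip_contact(index, collider_radius, object_center, object_radius))

# A larger collider_radius would make this contact register at a wider hand opening.
print(grasp_detected(thumb=(0.04, 0.0, 0.0), index=(-0.04, 0.0, 0.0),
                     collider_radius=0.01, object_center=(0.0, 0.0, 0.0),
                     object_radius=0.03))
```

In such a scheme, contact registering at a wider gap for larger colliders is consistent with the reported effect of collider size on size-normalized peak grip aperture.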
  3. As augmented and virtual reality (AR/VR) technology matures, a method is desired to represent real-world persons visually and aurally in a virtual scene with high fidelity to craft an immersive and realistic user experience. Current technologies leverage camera and depth sensors to render visual representations of subjects through avatars, and microphone arrays are employed to localize and separate high-quality subject audio through beamforming. However, challenges remain in both realms. In the visual domain, avatars can only map key features (e.g., pose, expression) to a predetermined model, rendering them incapable of capturing the subjects’ full details. Alternatively, high-resolution point clouds can be utilized to represent human subjects. However, such three-dimensional data is computationally expensive to process. In the realm of audio, sound source separation requires prior knowledge of the subjects’ locations. However, it may take unacceptably long for sound source localization algorithms to provide this knowledge, which can still be error-prone, especially with moving objects. These challenges make it difficult for AR systems to produce real-time, high-fidelity representations of human subjects for applications such as AR/VR conferencing that mandate negligible system latency. We present Acuity, a real-time system capable of creating high-fidelity representations of human subjects in a virtual scene both visually and aurally. Acuity isolates subjects from high-resolution input point clouds. It reduces the processing overhead by performing background subtraction at a coarse resolution, then applying the detected bounding boxes to fine-grained point clouds. Meanwhile, Acuity leverages an audiovisual sensor fusion approach to expedite sound source separation. The estimated object location in the visual domain guides the acoustic pipeline to isolate the subjects’ voices without running sound source localization. Our results demonstrate that Acuity can isolate multiple subjects’ high-quality point clouds with a maximum latency of 70 ms and average throughput of over 25 fps, while separating audio in less than 30 ms. We provide the source code of Acuity at: https://github.com/nesl/Acuity. 
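To make the coarse-to-fine isolation step concrete, here is a simplified, hypothetical reconstruction of the idea of subtracting the background at a coarse voxel resolution and then cropping the full-resolution point cloud with the resulting bounding box. Parameter names and the single-bounding-box simplification are assumptions for this sketch; the released code at https://github.com/nesl/Acuity is the authoritative reference.

```python
import numpy as np

def isolate_subjects(fine_cloud, background_cloud, voxel=0.10, margin=0.15):
    """Coarse-to-fine subject isolation, loosely following the idea described above.

    1. Discretize the input and background clouds onto a coarse voxel grid.
    2. Keep coarse voxels occupied in the input but not in the background.
    3. Turn the surviving voxels into a bounding box and crop the
       full-resolution cloud with it.
    """
    def occupied(cloud):
        return {tuple(v) for v in np.floor(cloud / voxel).astype(int)}

    foreground = occupied(fine_cloud) - occupied(background_cloud)
    if not foreground:
        return np.empty((0, 3))
    cells = np.array(sorted(foreground)) * voxel
    lo = cells.min(axis=0) - margin
    hi = cells.max(axis=0) + voxel + margin
    mask = np.all((fine_cloud >= lo) & (fine_cloud <= hi), axis=1)
    return fine_cloud[mask]

# Synthetic demo: a small cluster of "new" points against a sparse static background.
rng = np.random.default_rng(0)
background = rng.uniform(-2.0, 2.0, size=(5000, 3))
subject = rng.normal(loc=(0.0, 0.0, 1.0), scale=0.05, size=(500, 3))
scene = np.vstack([background, subject])
print(isolate_subjects(scene, background).shape)
```

Doing the expensive set difference only on the coarse grid and deferring all per-point work on the dense cloud to a single bounding-box crop is what keeps the per-frame cost low in this style of pipeline.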
  4. Voice-activated Intelligent Virtual Assistants (IVAs) such as Amazon Alexa offer a natural and realistic form of interaction that approaches the level of social interaction among real humans. The user experience with such technologies depends to a large degree on the perceived trust in and reliability of the IVA. In this poster, we explore the effects of a three-dimensional embodied representation of Amazon Alexa in Augmented Reality (AR) on the user’s perceived trust in her ability to control Internet of Things (IoT) devices in a smart home environment. We present a preliminary study and discuss the potential for positive effects on perceived trust due to the embodied representation compared to a voice-only condition.
  5. Conventional Intelligent Virtual Agents (IVAs) focus primarily on the visual and auditory channels for both the agent and the interacting human: the agent displays a visual appearance and speech as output, while processing the human’s verbal and non-verbal behavior as input. However, some interactions, particularly those between a patient and healthcare provider, inherently include tactile components. We introduce an Intelligent Physical-Virtual Agent (IPVA) head that occupies an appropriate physical volume; can be touched; and via human-in-the-loop control can change appearance, listen, speak, and react physiologically in response to human behavior. Compared to a traditional IVA, it provides a physical affordance, allowing for more realistic and compelling human-agent interactions. In a user study focusing on neurological assessment of a simulated patient showing stroke symptoms, we compared the IPVA head with a high-fidelity touch-aware mannequin that has a static appearance. Various measures indicated that human subjects paid greater attention to, felt more affinity for, and experienced greater presence with the IPVA patient, all factors that can improve healthcare training.