Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Human speech perception is generally optimal in quiet environments, however it becomes more difficult and error prone in the presence of noise, such as other humans speaking nearby or ambient noise. In such situations, human speech perception is improved by speech reading, i.e., watching the movements of a speaker’s mouth and face, either consciously as done by people with hearing loss or subconsciously by other humans. While previous work focused largely on speech perception of two-dimensional videos of faces, there is a gap in the research field focusing on facial features as seen in head-mounted displays, including the impacts of display resolution, and the effectiveness of visually enhancing a virtual human face on speech perception in the presence of noise. In this paper, we present a comparative user study (N = 21) in which we investigated an audio-only condition compared to two levels of head-mounted display resolution (1832×1920 or 916×960 pixels per eye) and two levels of the native or visually enhanced appearance of a virtual human, the latter consisting of an up-scaled facial representation and simulated lipstick (lip coloring) added to increase contrast. To understand effects on speech perception in noise, we measured participants’ speech reception thresholds (SRTs) for each audio-visual stimulus condition. These thresholds indicate the decibel levels of the speech signal that are necessary for a listener to receive the speech correctly 50% of the time. First, we show that the display resolution significantly affected participants’ ability to perceive the speech signal in noise, which has practical implications for the field, especially in social virtual environments. Second, we show that our visual enhancement method was able to compensate for limited display resolution and was generally preferred by participants. Specifically, our participants indicated that they benefited from the head scaling more than the added facial contrast from the simulated lipstick. We discuss relationships, implications, and guidelines for applications that aim to leverage such enhancements.more » « less
-
Human speech perception is generally optimal in quiet environments, however it becomes more difficult and error prone in the presence of noise, such as other humans speaking nearby or ambient noise. In such situations, human speech perception is improved by speech reading , i.e., watching the movements of a speaker's mouth and face, either consciously as done by people with hearing loss or subconsciously by other humans. While previous work focused largely on speech perception of two-dimensional videos of faces, there is a gap in the research field focusing on facial features as seen in head-mounted displays, including the impacts of display resolution, and the effectiveness of visually enhancing a virtual human face on speech perception in the presence of noise. In this paper, we present a comparative user study ( $N=21$ ) in which we investigated an audio-only condition compared to two levels of head-mounted display resolution ( $1832\times 1920$ or $916\times 960$ pixels per eye) and two levels of the native or visually enhanced appearance of a virtual human, the latter consisting of an up-scaled facial representation and simulated lipstick (lip coloring) added to increase contrast. To understand effects on speech perception in noise, we measured participants' speech reception thresholds (SRTs) for each audio-visual stimulus condition. These thresholds indicate the decibel levels of the speech signal that are necessary for a listener to receive the speech correctly 50% of the time. First, we show that the display resolution significantly affected participants' ability to perceive the speech signal in noise, which has practical implications for the field, especially in social virtual environments. Second, we show that our visual enhancement method was able to compensate for limited display resolution and was generally preferred by participants. Specifically, our participants indicated that they benefited from the head scaling more than the added facial contrast from the simulated lipstick. We discuss relationships, implications, and guidelines for applications that aim to leverage such enhancements.more » « less
-
With innovations in the field of gaze and eye tracking, a new concentration of research in the area of gaze-tracked systems and user interfaces has formed in the field of Extended Reality (XR). Eye trackers are being used to explore novel forms of spatial human–computer interaction, to understand human attention and behavior, and to test expectations and human responses. In this article, we review gaze interaction and eye tracking research related to XR that has been published since 1985, which includes a total of 215 publications. We outline efforts to apply eye gaze for direct interaction with virtual content and design of attentive interfaces that adapt the presented content based on eye gaze behavior and discuss how eye gaze has been utilized to improve collaboration in XR. We outline trends and novel directions and discuss representative high-impact papers in detail.more » « less
-
The expression of human emotion is integral to social interaction, and in virtual reality it is increasingly common to develop virtual avatars that attempt to convey emotions by mimicking these visual and aural cues, i.e. the facial and vocal expressions. However, errors in (or the absence of) facial tracking can result in the rendering of incorrect facial expressions on these virtual avatars. For example, a virtual avatar may speak with a happy or unhappy vocal inflection while their facial expression remains otherwise neutral. In circumstances where there is conflict between the avatar's facial and vocal expressions, it is possible that users will incorrectly interpret the avatar's emotion, which may have unintended consequences in terms of social influence or in terms of the outcome of the interaction. In this paper, we present a human-subjects study (N = 22) aimed at understanding the impact of conflicting facial and vocal emotional expressions. Specifically we explored three levels of emotional valence (unhappy, neutral, and happy) expressed in both visual (facial) and aural (vocal) forms. We also investigate three levels of head scales (down-scaled, accurate, and up-scaled) to evaluate whether head scale affects user interpretation of the conveyed emotion. We find significant effects of different multimodal expressions on happiness and trust perception, while no significant effect was observed for head scales. Evidence from our results suggest that facial expressions have a stronger impact than vocal expressions. Additionally, as the difference between the two expressions increase, the less predictable the multimodal expression becomes. For example, for the happy-looking and happy-sounding multimodal expression, we expect and see high happiness rating and high trust, however if one of the two expressions change, this mismatch makes the expression less predictable. We discuss the relationships, implications, and guidelines for social applications that aim to leverage multimodal social cues.more » « less
-
Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents a shared platform to collaborate in a spatial context. Although traditional face-to-face communication is limited by users’ proximity, meaning that another human’s non-verbal embodied cues become more difficult to perceive the farther one is away from that person, researchers and practitioners have started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the Big Head technique, in which a human’s head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complimentary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants’ ability to perceive facial expressions as well as their sense of comfort and feeling of “uncannniness” over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. Our second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and identify a virtual human over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues.more » « less
-
Effects of Environmental Noise Levels on Patient Handoff Communication in a Mixed Reality SimulationWhen medical caregivers transfer patients to another person’s care (a patient handoff), it is essential they effectively communicate the patient’s condition to ensure the best possible health outcomes. Emergency situations caused by mass casualty events (e.g., natural disasters) introduce additional difficulties to handoff procedures such as environmental noise. We created a projected mixed reality simulation of a handoff scenario involving a medical evacuation by air and tested how low, medium, and high levels of helicopter noise affected participants’ handoff experience, handoff performance, and behaviors. Through a human-subjects experimental design study (N = 21), we found that the addition of noise increased participants’ subjective stress and task load, decreased their self-assessed and actual performance, and caused participants to speak louder. Participants also stood closer to the virtual human sending the handoff information when listening to the handoff than they stood to the receiver when relaying the handoff information. We discuss implications for the design of handoff training simulations and avenues for future handoff communication research.more » « less
-
Display technologies in the fields of virtual and augmented reality affect the appearance of human representations, such as avatars used in telepresence or entertainment applications, based on the user’s current viewing conditions. With changing viewing conditions, it is possible that the perceived appearance of one’s avatar changes in an unexpected or undesired manner, which may change user behavior towards these avatars and cause frustration in using the AR display. In this paper, we describe a user study (N=20) where participants saw themselves in a mirror standing next to their own avatar through use of a HoloLens 2 optical see-through head-mounted display. Participants were tasked to match their avatar’s appearance to their own under two environment lighting conditions (200 lux and 2,000 lux). Our results showed that the intensity of environment lighting had a significant effect on participants selected skin colors for their avatars, where participants with dark skin colors tended to make their avatar’s skin color lighter, nearly to the level of participants with light skin color. Further, in particular female participants made their avatar’s hair color darker for the lighter environment lighting condition. We discuss our results with a view on technological limitations and effects on the diversity of avatar representations on optical see-through displays.more » « less
-
Due to the additive light model employed by current optical see-through head-mounted displays (OST-HMDs), the perceived contrast of displayed imagery is reduced with increased environment luminance, often to the point where it becomes difficult for the user to accurately distinguish the presence of visual imagery. While existing contrast models, such as Weber contrast and Michelson contrast, can be used to predict when the observer will experience difficulty distinguishing and interpreting stimuli on traditional dis-plays, these models must be adapted for use with additive displays. In this paper, we present a simplified model of luminance contrast for optical see-through displays derived from Michelson's contrast equation and demonstrate two applications of the model: informing design decisions involving the color of virtual imagery and optimizing environment light attenuation through the use of neutral density filters.more » « less
-
In this paper, we report on initial work exploring the potential value of technology-mediated cues and signals to improve cross-reality interruptions. We investigated the use of color-coded visual cues (LED lights) to help a person decide when to interrupt a virtual reality (VR) user, and a gesture-based mechanism (waving at the user) to signal their desire to do so. To assess the potential value of these mechanisms we conducted a preliminary 2×3 within-subjects experimental design user study (N=10) where the participants acted in the role of the interrupter. While we found that our visual cues improved participants' experiences, our gesture-based signaling mechanism did not, as users did not trust it nor consider it as intuitive as a speech-based mechanism might be. Our preliminary findings motivate further investigation of interruption cues and signaling mechanisms to inform future VR head-worn display system designs.more » « less