skip to main content

This content will become publicly available on April 30, 2024

Title: The Eye in Extended Reality: A Survey on Gaze Interaction and Eye Tracking in Head-worn Extended Reality
With innovations in the field of gaze and eye tracking, a new concentration of research in the area of gaze-tracked systems and user interfaces has formed in the field of Extended Reality (XR). Eye trackers are being used to explore novel forms of spatial human–computer interaction, to understand human attention and behavior, and to test expectations and human responses. In this article, we review gaze interaction and eye tracking research related to XR that has been published since 1985, which includes a total of 215 publications. We outline efforts to apply eye gaze for direct interaction with virtual content and design of attentive interfaces that adapt the presented content based on eye gaze behavior and discuss how eye gaze has been utilized to improve collaboration in XR. We outline trends and novel directions and discuss representative high-impact papers in detail.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
ACM Computing Surveys
Page Range / eLocation ID:
1 to 39
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents a shared platform to collaborate in a spatial context. Although traditional face-to-face communication is limited by users’ proximity, meaning that another human’s non-verbal embodied cues become more difficult to perceive the farther one is away from that person, researchers and practitioners have started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the Big Head technique, in which a human’s head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complimentary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants’ ability to perceive facial expressions as well as their sense of comfort and feeling of “uncannniness” over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. Our second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and identify a virtual human over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues. 
    more » « less
  2. There is a lack of datasets for visual-inertial odometry applications in Extended Reality (XR). To the best of our knowledge, there is no dataset available that is captured from an XR headset with a human as a carrier. To bridge this gap, we present a novel pose estimation dataset --- called HoloSet --- collected using Microsoft Hololens 2, which is a state-of-the-art head mounted device for XR. Potential applications for HoloSet include visual-inertial odometry, simultaneous localization and mapping (SLAM), and additional applications in XR that leverage visual-inertial data. HoloSet captures both macro and micro movements. For macro movements, the dataset consists of more than 66,000 samples of visual, inertial, and depth camera data in a variety of environments (indoor, outdoor) and scene setups (trails, suburbs, downtown) under multiple user action scenarios (walk, jog). For micro movements, the dataset consists of more than 12,000 samples of additional articulated hand depth camera images while a user plays games that exercise fine motor skills and hand-eye coordination. We present basic visualizations and high-level statistics of the data and outline the potential research use cases for HoloSet. 
    more » « less
  3. Blascheck, Tanja ; Bradshaw, Jessica ; Vrzakova, Hana (Ed.)
    Virtual Reality (VR) technology has advanced to include eye-tracking, allowing novel research, such as investigating how our visual system coordinates eye movements with changes in perceptual depth. The purpose of this study was to examine whether eye tracking could track perceptual depth changes during a visual discrimination task. We derived two depth-dependent variables from eye tracker data: eye vergence angle (EVA) and interpupillary distance (IPD). As hypothesized, our results revealed that shifting gaze from near-to-far depth significantly decreased EVA and increased IPD, while the opposite pattern was observed while shifting from far-to-near. Importantly, the amount of change in these variables tracked closely with relative changes in perceptual depth, and supported the hypothesis that eye tracker data may be used to infer real-time changes in perceptual depth in VR. Our method could be used as a new tool to adaptively render information based on depth and improve the VR user experience. 
    more » « less
  4. Eye tracking has become an essential human-machine interaction modality for providing immersive experience in numerous virtual and augmented reality (VR/AR) applications desiring high throughput (e.g., 240 FPS), small-form, and enhanced visual privacy. However, existing eye tracking systems are still limited by their: (1) large form-factor largely due to the adopted bulky lens-based cameras; (2) high communication cost required between the camera and backend processor; and (3) potentially concerned low visual privacy, thus prohibiting their more extensive applications. To this end, we propose, develop, and validate a lensless FlatCambased eye tracking algorithm and accelerator co-design framework dubbed EyeCoD to enable eye tracking systems with a much reduced form-factor and boosted system efficiency without sacrificing the tracking accuracy, paving the way for next-generation eye tracking solutions. On the system level, we advocate the use of lensless FlatCams instead of lens-based cameras to facilitate the small form-factor need in mobile eye tracking systems, which also leaves rooms for a dedicated sensing-processor co-design to reduce the required camera-processor communication latency. On the algorithm level, EyeCoD integrates a predict-then-focus pipeline that first predicts the region-of-interest (ROI) via segmentation and then only focuses on the ROI parts to estimate gaze directions, greatly reducing redundant computations and data movements. On the hardware level, we further develop a dedicated accelerator that (1) integrates a novel workload orchestration between the aforementioned segmentation and gaze estimation models, (2) leverages intra-channel reuse opportunities for depth-wise layers, (3) utilizes input feature-wise partition to save activation memory size, and (4) develops a sequential-write-parallel-read input buffer to alleviate the bandwidth requirement for the activation global buffer. On-silicon measurement and extensive experiments validate that our EyeCoD consistently reduces both the communication and computation costs, leading to an overall system speedup of 10.95×, 3.21×, and 12.85× over general computing platforms including CPUs and GPUs, and a prior-art eye tracking processor called CIS-GEP, respectively, while maintaining the tracking accuracy. Codes are available at 
    more » « less
  5. Emerging Virtual Reality (VR) displays with embedded eye trackers are currently becoming a commodity hardware (e.g., HTC Vive Pro Eye). Eye-tracking data can be utilized for several purposes, including gaze monitoring, privacy protection, and user authentication/identification. Identifying users is an integral part of many applications due to security and privacy concerns. In this paper, we explore methods and eye-tracking features that can be used to identify users. Prior VR researchers explored machine learning on motion-based data (such as body motion, head tracking, eye tracking, and hand tracking data) to identify users. Such systems usually require an explicit VR task and many features to train the machine learning model for user identification. We propose a system to identify users utilizing minimal eye-gaze-based features without designing any identification-specific tasks. We collected gaze data from an educational VR application and tested our system with two machine learning (ML) models, random forest (RF) and k-nearest-neighbors (kNN), and two deep learning (DL) models: convolutional neural networks (CNN) and long short-term memory (LSTM). Our results show that ML and DL models could identify users with over 98% accuracy with only six simple eye-gaze features. We discuss our results, their implications on security and privacy, and the limitations of our work. 
    more » « less