Human gaze estimation is a widely used technique for observing human behavior, and the rapid adoption of deep learning has extended it to many application domains. The retail industry is one such domain, with challenging, unconstrained conditions such as eye occlusion and the impracticality of per-user calibration. This study presents a novel model for single-user 2D gaze estimation in a retail environment. Our architecture, inspired by previous work in gaze following, models scene and head features and applies a shifted-grids technique to accurately predict a saliency map. Our results show that the model can effectively infer 2D gaze in a retail environment, achieving state-of-the-art performance on the Gaze On Objects (GOO) dataset with an angular error of 25.2°. Furthermore, we provide a detailed analysis of the GOO dataset and of the selected feature extractor to support our results.
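For reference, a common way to compute the reported angular error in gaze-following work is to measure the angle between the 2D vector from the head position to the predicted gaze point and the vector to the ground-truth point. The sketch below illustrates that convention; it is an assumption about the metric, not the paper's evaluation code.

```python
# Illustrative sketch (not the paper's code): 2D angular error as commonly
# defined in gaze-following work -- the angle between the vector from the
# head position to the predicted gaze point and the vector to the ground truth.
import numpy as np

def angular_error_deg(head_xy, pred_xy, gt_xy):
    """All inputs are (x, y) image coordinates, e.g. normalized to [0, 1]."""
    v_pred = np.asarray(pred_xy, dtype=float) - np.asarray(head_xy, dtype=float)
    v_gt = np.asarray(gt_xy, dtype=float) - np.asarray(head_xy, dtype=float)
    cos = np.dot(v_pred, v_gt) / (np.linalg.norm(v_pred) * np.linalg.norm(v_gt) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: head at the image centre, prediction slightly off the true gaze point.
print(angular_error_deg((0.5, 0.5), (0.8, 0.3), (0.85, 0.25)))
```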
A Calibration Framework for Photosensor-based Eye-Tracking System
The majority of eye-tracking systems require user-specific calibration to achieve suitable accuracy. Traditional calibration is performed by presenting targets at fixed locations that cover the device screen. If simple regression methods are used to learn a gaze map from the recorded data, the risk of overfitting is minimal. This is not the case when the gaze map is formed by a neural network, as is common in photosensor oculography (PSOG), which raises the question of how to carefully design the calibration procedure. This paper evaluates different approaches to parsing the calibration data and the effect of grid density on the collection-time/performance trade-off, in order to build a calibration framework for PSOG using a video-based simulation framework.
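A minimal sketch of the two kinds of gaze map the abstract contrasts, assuming synthetic photosensor readings and a 5x5 target grid (the data shapes, model sizes, and regularization are illustrative, not the paper's setup): a low-order polynomial regression and a small neural network are fitted on a subset of grid points and checked on held-out points, which is where the overfitting risk shows up.

```python
# Minimal sketch (assumed data shapes, not the paper's implementation): fit a
# gaze map from photosensor readings to on-screen targets, comparing a
# low-order polynomial regression with a small neural network. Held-out grid
# points expose the overfitting risk the abstract discusses.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_targets, n_sensors = 25, 8                    # e.g. a 5x5 calibration grid, 8 photosensors
X = rng.normal(size=(n_targets, n_sensors))     # synthetic sensor readings per target
Y = rng.uniform(0, 1, size=(n_targets, 2))      # true (x, y) screen positions

train = rng.choice(n_targets, size=16, replace=False)
test = np.setdiff1d(np.arange(n_targets), train)

poly_map = make_pipeline(PolynomialFeatures(degree=2), Ridge(alpha=1.0))
nn_map = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)

for name, model in [("polynomial", poly_map), ("neural net", nn_map)]:
    model.fit(X[train], Y[train])
    err = np.linalg.norm(model.predict(X[test]) - Y[test], axis=1).mean()
    print(f"{name}: mean held-out gaze error = {err:.3f} (screen units)")
```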
- Award ID(s): 1714623
- PAR ID: 10157231
- Date Published:
- Journal Name: ACM Symposium on Eye Tracking Research and Applications
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Accurate eye tracking is crucial for gaze-dependent research, but calibrating eye trackers in subjects who cannot follow instructions, such as human infants and nonhuman primates, presents a challenge. Traditional calibration methods rely on verbal instructions, which are ineffective for these populations. To address this, researchers often use attention-grabbing stimuli at known locations; however, existing software for video-based calibration is often proprietary and inflexible. We introduce an extension to the open-source toolbox Titta, a software package integrating desktop Tobii eye trackers with PsychToolbox experiments, to facilitate custom video-based calibration. The extension offers a flexible platform for attracting attention, calibrating with flexible point selection, and validating the calibration. The toolbox has been refined through extensive use with chimpanzees, baboons, and macaques, demonstrating its effectiveness across species. Our adaptive calibration and validation procedures provide a standardized method for achieving more accurate gaze tracking across diverse species.
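Titta itself is a MATLAB/PsychToolbox toolbox, so the following Python sketch is only an illustration of the adaptive calibrate-and-validate logic described above; the function names, thresholds, and simulated errors are assumptions, not the toolbox API.

```python
# Illustrative sketch only -- not the Titta API (Titta is a MATLAB/PsychToolbox
# toolbox). It shows the adaptive calibrate/validate logic the abstract
# describes: show an attention-grabbing video at each point, validate, and
# re-run only the points whose error is still too large.
from dataclasses import dataclass
import random

@dataclass
class CalibrationPoint:
    x: float                      # normalized screen coordinates
    y: float
    error_deg: float = float("inf")

def show_attention_video_and_measure(pt: CalibrationPoint) -> float:
    """Placeholder for presenting the stimulus and measuring validation error."""
    return random.uniform(0.2, 2.5)      # pretend angular error in degrees

def adaptive_calibration(points, max_error_deg=1.5, max_rounds=3):
    for _ in range(max_rounds):
        pending = [p for p in points if p.error_deg > max_error_deg]
        if not pending:
            break
        for p in pending:                 # recalibrate only the failed points
            p.error_deg = show_attention_video_and_measure(p)
    return points

grid = [CalibrationPoint(x, y) for x in (0.1, 0.5, 0.9) for y in (0.1, 0.5, 0.9)]
for p in adaptive_calibration(grid):
    print(f"({p.x:.1f}, {p.y:.1f}) -> {p.error_deg:.2f} deg")
```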
Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating a subject's gaze direction. In this work, we present a generative framework that creates annotated gaze data by leveraging both labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. We then transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations in the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ domain for gaze-estimation DNN training. We also show additional applications of our work, including facial image editing and gaze redirection.
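As a rough illustration of why generated data arrives pre-annotated, the sketch below conditions a toy generator on a gaze label so that each synthesized image is paired with the label used to create it; the architecture and dimensions are placeholders and bear no relation to the paper's Gaze-aware Compositional GAN.

```python
# Minimal sketch (placeholder architecture, not the paper's Gaze-aware
# Compositional GAN): a generator conditioned on a gaze label, so every
# generated face image comes with the gaze annotation used to create it.
import torch
import torch.nn as nn

class GazeConditionedGenerator(nn.Module):
    def __init__(self, z_dim=128, gaze_dim=2, img_size=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + gaze_dim, 256), nn.ReLU(),
            nn.Linear(256, 512), nn.ReLU(),
            nn.Linear(512, 3 * img_size * img_size), nn.Tanh(),
        )
        self.img_size = img_size

    def forward(self, z, gaze):
        # gaze is the (pitch, yaw) label; concatenating it with the noise
        # vector ties the synthesized image to a known gaze annotation.
        x = self.net(torch.cat([z, gaze], dim=1))
        return x.view(-1, 3, self.img_size, self.img_size)

gen = GazeConditionedGenerator()
z = torch.randn(4, 128)
gaze = torch.tensor([[0.1, -0.2]] * 4)   # desired gaze labels (e.g. radians)
images = gen(z, gaze)                    # images paired with their gaze labels
print(images.shape)                      # torch.Size([4, 3, 64, 64])
```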
Human-robot collaboration systems benefit from recognizing people's intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users' intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position-matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a finite set of known points, and assign each fixation to one of those known points. We validate our approach on synthetic data to demonstrate that classifying with velocity features is more robust than position matching. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
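The contrast between position matching and velocity features can be made concrete with a small synthetic example (the coordinates, drift, and feature definition below are illustrative assumptions, not the paper's dataset or exact features): a constant calibration offset fools nearest-point matching, while comparing the fixation-to-fixation displacement against each candidate point's displacement still recovers the attended object.

```python
# Illustrative sketch (synthetic 2D coordinates, not the paper's dataset or
# feature set): position matching assigns a fixation to the nearest known
# point, whereas velocity features compare the fixation-to-fixation motion
# with each candidate point's motion, which cancels a constant drift offset.
import numpy as np

def position_match(fixation, points):
    return int(np.argmin(np.linalg.norm(points - fixation, axis=1)))

def velocity_match(prev_fix, fixation, prev_points, points):
    gaze_motion = fixation - prev_fix          # how the gaze moved
    point_motion = points - prev_points        # how each known point moved
    return int(np.argmin(np.linalg.norm(point_motion - gaze_motion, axis=1)))

# Object 0 is stationary; object 1 moves to the right. The gaze follows
# object 1 but carries a constant calibration offset of (-0.3, 0).
prev_points = np.array([[1.0, 1.0], [1.2, 1.0]])
points = np.array([[1.0, 1.0], [1.5, 1.0]])
drift = np.array([-0.3, 0.0])
prev_fix = np.array([1.2, 1.0]) + drift
fixation = np.array([1.5, 1.0]) + drift

print("position match:", position_match(fixation, points))   # 0: fooled by drift
print("velocity match:", velocity_match(prev_fix, fixation, prev_points, points))  # 1: correct
```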

