Human gaze estimation is a widely used technique for observing human behavior. The rapid adoption of deep learning techniques has extended gaze estimation to many application domains. The retail industry is one such domain, with challenging unconstrained environmental conditions such as eye occlusion and personal calibration. This study presents a novel model for single-user 2D gaze estimation in a retail environment. Our architecture, inspired by previous work in gaze following, models the scene and head features and further uses a shifted grids technique to accurately predict a saliency map. Our results show that the model can effectively infer 2D gaze in a retail environment. We achieve state-of-the-art performance on the Gaze On Objects (GOO) dataset, with an angular error of 25.2° for gaze estimation. Furthermore, we provide a detailed analysis of the GOO dataset and comprehensively analyze the selected model feature extractor to support our results.
A Calibration Framework for Photosensor-based Eye-Tracking System
The majority of eye-tracking systems require user-specific calibration to achieve suitable accuracy. Traditional calibration is performed by presenting targets at fixed locations that cover a certain portion of the device screen. If simple regression methods are used to learn a gaze map from the recorded data, the risk of overfitting is minimal. This is not the case if the gaze map is formed using neural networks, as is often done in photosensor oculography (PSOG), which calls for careful design of the calibration procedure. This paper evaluates different approaches to parsing calibration data and the effect of grid density on the trade-off between collection time and performance, in order to build a calibration framework for PSOG using a video-based simulation framework.
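As a rough illustration of the "simple regression" case described above, the sketch below fits one least-squares line per screen axis from a fixed grid of calibration targets. It is not the paper's method: the 3x3 grid layout, the one-reading-per-axis sensor model, the synthetic data, and the names `fit_line`, `fit_gaze_map`, and `apply_map` are all assumptions made for this example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_gaze_map(raw, targets):
    """Fit one independent line per screen axis from calibration samples."""
    fx = fit_line([r[0] for r in raw], [t[0] for t in targets])
    fy = fit_line([r[1] for r in raw], [t[1] for t in targets])
    return fx, fy

def apply_map(gaze_map, sample):
    """Map a raw (x, y) reading to estimated screen coordinates."""
    (ax, bx), (ay, by) = gaze_map
    return ax * sample[0] + bx, ay * sample[1] + by

# Hypothetical 3x3 grid of targets in normalized screen coordinates.
targets = [(x, y) for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)]
# Synthetic raw readings: an exact affine distortion of the targets (assumed).
raw = [(2.0 * x + 0.3, -1.5 * y + 1.0) for x, y in targets]

gaze_map = fit_gaze_map(raw, targets)
# Estimate gaze for a reading taken at an unseen screen location (0.7, 0.2).
estimate = apply_map(gaze_map, (2.0 * 0.7 + 0.3, -1.5 * 0.2 + 1.0))
```

With only two parameters per axis, a model like this has little capacity to overfit nine calibration points, which is the contrast with neural-network gaze maps that the abstract draws.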
- Award ID(s): 1714623
- PAR ID: 10157231
- Date Published:
- Journal Name: ACM Symposium on Eye Tracking Research and Applications
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. Then we transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations in the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ dataset domain for gaze estimation DNN training. We also show additional applications of our work, which include facial image editing and gaze redirection.
Human-robot collaboration systems benefit from recognizing people’s intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users’ intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a finite set of known points and assign fixation position to one of those known points. We validate our approach on synthetic data to demonstrate that classifying using velocity features is more robust than a position matching approach. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
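The contrast between position matching and velocity features can be sketched on synthetic data as follows. This is one possible reading of the idea, not the authors' classifier: it assumes 2D image coordinates, two frames, a constant tracker drift, and the hypothetical helpers `nearest_point` and `velocity_match`.

```python
import math

def nearest_point(fixation, points):
    """Position matching: index of the known point closest to the fixation."""
    return min(range(len(points)), key=lambda i: math.dist(fixation, points[i]))

def velocity_match(fix_prev, fix_curr, pts_prev, pts_curr):
    """Index of the known point whose frame-to-frame displacement
    best matches the displacement between two subsequent fixations."""
    gaze_v = (fix_curr[0] - fix_prev[0], fix_curr[1] - fix_prev[1])

    def mismatch(i):
        point_v = (pts_curr[i][0] - pts_prev[i][0],
                   pts_curr[i][1] - pts_prev[i][1])
        return math.dist(gaze_v, point_v)

    return min(range(len(pts_prev)), key=mismatch)

# Two known points over two frames (synthetic values).
pts_prev = [(0.0, 0.0), (10.0, 0.0)]
pts_curr = [(0.0, 5.0), (10.0, 0.0)]  # point 0 moves, point 1 stays still

# The user follows point 0, but the tracker reports a constant drift in x.
drift = (7.0, 0.0)
fix_prev = (pts_prev[0][0] + drift[0], pts_prev[0][1] + drift[1])
fix_curr = (pts_curr[0][0] + drift[0], pts_curr[0][1] + drift[1])

by_position = nearest_point(fix_curr, pts_curr)
by_velocity = velocity_match(fix_prev, fix_curr, pts_prev, pts_curr)
```

In this toy setup the drift pushes the fixation closer to point 1, so position matching picks the wrong point, while the displacement comparison cancels the constant offset and recovers point 0.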
A multiclass classifier is said to be top-label calibrated if the reported probability for the predicted class -- the top-label -- is calibrated, conditioned on the top-label. This conditioning on the top-label is absent in the closely related and popular notion of confidence calibration, which we argue makes confidence calibration difficult to interpret for decision-making. We propose top-label calibration as a rectification of confidence calibration. Further, we outline a multiclass-to-binary (M2B) reduction framework that unifies confidence, top-label, and class-wise calibration, among others. As its name suggests, M2B works by reducing multiclass calibration to numerous binary calibration problems, each of which can be solved using simple binary calibration routines. We instantiate the M2B framework with the well-studied histogram binning (HB) binary calibrator, and prove that the overall procedure is multiclass calibrated without making any assumptions on the underlying data distribution. In an empirical evaluation with four deep net architectures on CIFAR-10 and CIFAR-100, we find that the M2B + HB procedure achieves lower top-label and class-wise calibration error than other approaches such as temperature scaling.
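To make the conditioning on the top-label concrete, the sketch below fits a separate histogram-binning table for each predicted class, so the same raw confidence can be recalibrated differently depending on which class was predicted. It is a simplified illustration under stated assumptions: it uses equal-width bins and a tiny synthetic calibration set, and the names `fit_top_label_hb` and `recalibrate` are hypothetical rather than taken from the paper.

```python
from collections import defaultdict

def fit_top_label_hb(pred_labels, confidences, true_labels, n_bins=2):
    """For each predicted class, store the empirical accuracy per confidence bin."""
    # stats[label][bin] = [hits, count]
    stats = defaultdict(lambda: [[0, 0] for _ in range(n_bins)])
    for pred, conf, true in zip(pred_labels, confidences, true_labels):
        b = min(int(conf * n_bins), n_bins - 1)  # equal-width bins (assumption)
        stats[pred][b][0] += int(pred == true)
        stats[pred][b][1] += 1
    return {
        label: [hits / count if count else None for hits, count in bins]
        for label, bins in stats.items()
    }

def recalibrate(table, pred, conf, n_bins=2):
    """Replace the raw confidence with the bin accuracy for this top-label.
    Falls back to the raw confidence for unseen labels or empty bins."""
    bins = table.get(pred)
    if bins is None:
        return conf
    value = bins[min(int(conf * n_bins), n_bins - 1)]
    return conf if value is None else value

# Tiny synthetic calibration set (illustrative values only).
preds = ["cat", "cat", "cat", "cat", "dog", "dog"]
confs = [0.9, 0.8, 0.95, 0.6, 0.9, 0.7]
trues = ["cat", "dog", "cat", "cat", "dog", "dog"]

table = fit_top_label_hb(preds, confs, trues)
cat_conf = recalibrate(table, "cat", 0.9)
dog_conf = recalibrate(table, "dog", 0.9)
```

Confidence calibration would pool all six samples into one table; keeping one table per predicted class is what makes the recalibrated value interpretable for a specific prediction.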

