Title: Retail Gaze: A Dataset for Gaze Estimation in Retail Environments
Gaze object estimation predicts a bounding box for the object at which a person is steadily looking. It is an applicable and contemporary technique in the retail industry. However, existing datasets for gaze object prediction in retail are limited to controlled environments and do not include retail product category area segmentation annotations. This paper proposes Retail Gaze, a dataset for gaze estimation in real-world retail environments. Retail Gaze is composed of 3,922 images of individuals looking at products in a retail environment, captured from 12 camera angles. Furthermore, we use state-of-the-art gaze estimation models to benchmark the Retail Gaze dataset and comprehensively analyze the results obtained.
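Benchmarking gaze estimation on a dataset such as Retail Gaze typically reports distance- and angle-based errors between predicted and ground-truth gaze. The sketch below shows how such metrics are commonly computed for gaze-following data; the function names and the normalized-coordinate convention are illustrative assumptions, not part of the Retail Gaze release.

```python
import numpy as np

def l2_distance(pred_point, gt_point):
    """Euclidean distance between predicted and ground-truth 2D gaze points,
    both given in normalized [0, 1] image coordinates (an assumed convention)."""
    return float(np.linalg.norm(np.asarray(pred_point) - np.asarray(gt_point)))

def angular_error_deg(head_point, pred_point, gt_point):
    """Angle in degrees between predicted and ground-truth gaze directions,
    each taken as the vector from the head position to a gaze point."""
    pred_dir = np.asarray(pred_point, dtype=float) - np.asarray(head_point, dtype=float)
    gt_dir = np.asarray(gt_point, dtype=float) - np.asarray(head_point, dtype=float)
    cos_sim = pred_dir @ gt_dir / (np.linalg.norm(pred_dir) * np.linalg.norm(gt_dir) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0))))
```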
Award ID(s):
2045523
PAR ID:
10403595
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2022 International Conference on Decision Aid Sciences and Applications (DASA)
Page Range / eLocation ID:
1040 to 1044
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human gaze estimation is a widely used technique for observing human behavior. The rapid adoption of deep learning has extended gaze estimation to many application domains. The retail industry is one such domain, with challenging unconstrained conditions such as eye occlusion and personal calibration. This study presents a novel model for single-user 2D gaze estimation in a retail environment. Our architecture, inspired by previous work in gaze following, models the scene and head features and further utilizes a shifted-grids technique to accurately predict a saliency map. Our results show that the model can effectively infer 2D gaze in a retail environment, achieving state-of-the-art performance on the Gaze On Objects (GOO) dataset with an angular error of 25.2°. Furthermore, we provide a detailed analysis of the GOO dataset and comprehensively analyze the selected model's feature extractor to support our results.
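The shifted-grids technique referenced above (adopted from earlier gaze-following work) predicts the gaze point as a classification over several coarse grids, each offset by a fraction of a cell, and fuses them into a single saliency map. A minimal sketch of that fusion step follows; the grid size, shift offsets, and tensor shapes are assumptions for illustration, not the architecture reported in the paper.

```python
import torch
import torch.nn.functional as F

def shifted_grids_to_heatmap(grid_logits, out_size=64, grid_size=5,
                             shifts=((0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.25))):
    """Fuse several coarse, spatially shifted classification grids into one dense
    gaze heatmap. grid_logits: list of tensors, each of shape (B, grid_size * grid_size)."""
    batch = grid_logits[0].shape[0]
    heatmap = torch.zeros(batch, 1, out_size, out_size)
    cell = out_size / grid_size
    for logits, (dx, dy) in zip(grid_logits, shifts):
        probs = F.softmax(logits, dim=1).view(batch, 1, grid_size, grid_size)
        # Upsample the coarse grid to the output resolution...
        dense = F.interpolate(probs, size=(out_size, out_size), mode="nearest")
        # ...then shift it by a fraction of one grid cell before accumulating.
        dense = torch.roll(dense, shifts=(int(round(dy * cell)), int(round(dx * cell))), dims=(2, 3))
        heatmap += dense
    return heatmap / len(grid_logits)
```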
  2. As technology advances, Human-Robot Interaction (HRI) is boosting overall system efficiency and productivity. However, allowing robots to operate in close proximity to humans inevitably places higher demands on precise human motion tracking and prediction. Datasets that contain both humans and robots operating in a shared space are receiving growing attention, as they may facilitate a variety of robotics and human-systems research. Datasets that capture HRI during daily activities with rich information beyond video images are rarely seen. In this paper, we introduce a novel dataset that focuses on social navigation between humans and robots in a future-oriented Wholesale and Retail Trade (WRT) environment (https://uf-retail-cobot-dataset.github.io/). Eight participants performed tasks commonly undertaken by consumers and retail workers. More than 260 minutes of data were collected, including robot and human trajectories, human full-body motion capture, eye gaze directions, and other contextual information. Comprehensive descriptions of each category of data stream, as well as potential use cases, are included. Furthermore, analysis with multiple data sources and future directions are discussed.
  3. Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework that creates annotated gaze data by leveraging the benefits of labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. We then transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations on the ETH-XGaze dataset and cross-domain augmentations on the CelebAMask-HQ dataset for gaze estimation DNN training. We also show additional applications of our work, including facial image editing and gaze redirection.
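The central idea of generating annotated gaze data is that a generator conditioned on a gaze label produces images that are labeled by construction. The toy sketch below illustrates only that conditioning; the layer sizes and the two-value gaze encoding are placeholders, and the paper's Gaze-aware Compositional GAN is a substantially more elaborate architecture.

```python
import torch
import torch.nn as nn

class GazeConditionedGenerator(nn.Module):
    """Toy generator mapping (noise, gaze label) to an image; a stand-in for a
    gaze-conditioned GAN generator, not the model described in the paper."""
    def __init__(self, z_dim=128, gaze_dim=2, img_size=64):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(z_dim + gaze_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size), nn.Tanh())

    def forward(self, z, gaze):
        x = self.net(torch.cat([z, gaze], dim=1))
        return x.view(-1, 3, self.img_size, self.img_size)

# Each synthetic image carries the gaze label it was conditioned on, so the
# generated data comes pre-annotated for training a downstream gaze DNN.
gen = GazeConditionedGenerator()
z = torch.randn(4, 128)
gaze_labels = torch.rand(4, 2) * 2 - 1   # e.g., normalized yaw/pitch (assumed encoding)
fake_images = gen(z, gaze_labels)        # (4, 3, 64, 64)
```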
  4. Effective assisted living environments must be able to perform inferences on how their occupants interact with their environment. Gaze direction provides strong indications of how people interact with their surroundings. In this paper, we propose a gaze tracking method that uses a neural network regressor to estimate gazes from keypoints and integrates them over time using a moving average mechanism. Our gaze regression model uses confidence gated units to handle cases of keypoint occlusion and to estimate its own prediction uncertainty. Our temporal approach for gaze tracking incorporates these prediction uncertainties as weights in the moving average scheme. Experimental results on a dataset collected in an assisted living facility demonstrate that our gaze regression network performs on par with a complex, dataset-specific baseline, while its uncertainty predictions are highly correlated with the actual angular error of the corresponding estimations. Finally, experiments on video sequences show that our temporal approach generates more accurate and stable gaze predictions.
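The temporal scheme described here, a moving average that weights each frame's gaze estimate by the regressor's own confidence, can be sketched as follows; the window length and the inverse-uncertainty weighting are illustrative assumptions, since the abstract does not specify them.

```python
import numpy as np

def uncertainty_weighted_gaze(gaze_estimates, uncertainties, window=5):
    """Temporally smooth per-frame 2D gaze estimates with a moving average whose
    weights down-weight frames the regressor is less confident about.
    gaze_estimates: (T, 2) array; uncertainties: (T,) predicted uncertainty per frame."""
    gaze_estimates = np.asarray(gaze_estimates, dtype=float)
    weights = 1.0 / (np.asarray(uncertainties, dtype=float) + 1e-6)
    smoothed = np.empty_like(gaze_estimates)
    for t in range(len(gaze_estimates)):
        lo = max(0, t - window + 1)
        w = weights[lo:t + 1]
        smoothed[t] = (gaze_estimates[lo:t + 1] * w[:, None]).sum(axis=0) / w.sum()
    return smoothed
```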
  5. Intelligent driving assistance can alert drivers to objects in their environment; however, such systems require a model of drivers' situational awareness (SA) (what aspects of the scene they are already aware of) to avoid unnecessary alerts. Moreover, collecting the data to train such an SA model is challenging: being an internal human cognitive state, driver SA is difficult to measure, and non-verbal signals such as eye gaze are some of the only outward manifestations of it. Traditional methods to obtain SA labels rely on probes that result in sparse, intermittent SA labels unsuitable for modeling a dense, temporally correlated process via machine learning. We propose a novel interactive labeling protocol that captures dense, continuous SA labels and use it to collect an object-level SA dataset in a VR driving simulator. Our dataset comprises 20 unique drivers' SA labels, driving data, and gaze (over 320 minutes of driving) which will be made public. Additionally, we train an SA model from this data, formulating the object-level driver SA prediction problem as a semantic segmentation problem. Our formulation allows all objects in a scene at a timestep to be processed simultaneously, leveraging global scene context and local gaze-object relationships together. Our experiments show that this formulation leads to improved performance over common sense baselines and prior art on the SA prediction task. 
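Casting object-level SA prediction as semantic segmentation means the network produces one dense awareness map per frame, which can then be pooled inside each object's mask. The sketch below illustrates that formulation; the four-channel input (RGB plus a gaze heatmap), the network depth, and the mask-pooling step are illustrative assumptions rather than the paper's actual model.

```python
import torch
import torch.nn as nn

class SASegmentationNet(nn.Module):
    """Toy dense predictor: takes the scene image plus a gaze heatmap channel and
    outputs a per-pixel awareness score, so all objects are handled in one pass."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1))

    def forward(self, image, gaze_heatmap):
        x = torch.cat([image, gaze_heatmap], dim=1)   # (B, 4, H, W)
        return torch.sigmoid(self.net(x))             # per-pixel SA probability

def object_sa_scores(sa_map, object_masks):
    """Pool the dense SA map inside each object's binary mask.
    sa_map: (B, 1, H, W); object_masks: (B, K, H, W) -> returns (B, K)."""
    weighted = (sa_map * object_masks).sum(dim=(2, 3))
    area = object_masks.sum(dim=(2, 3)).clamp(min=1)
    return weighted / area
```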