Gaze object estimation predicts the bounding box of the object at which a person is steadily looking. It is a practical and contemporary technique in the retail industry. However, existing datasets for gaze object prediction in retail are limited to controlled environments and do not include segmentation annotations of retail product category areas. This paper proposes Retail Gaze, a dataset for gaze estimation in real-world retail environments. Retail Gaze comprises 3,922 images of individuals looking at products in a retail environment, captured from 12 camera angles. Furthermore, we benchmark the Retail Gaze dataset with state-of-the-art gaze estimation models and comprehensively analyze the results obtained.
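The task itself can be illustrated with a short, hedged sketch (the helper below and its box-assignment rule are our illustration, not an interface defined by the paper): given a model's predicted 2D gaze point and the annotated product bounding boxes, the predicted gaze object is the box containing the point, falling back to the nearest box center.

```python
import numpy as np

def pick_gaze_object(gaze_xy, boxes):
    """Assign a predicted 2D gaze point to a product bounding box.

    gaze_xy: (x, y) predicted gaze point in image coordinates.
    boxes:   array of shape (N, 4) holding (x1, y1, x2, y2) product boxes.
    Returns the index of the first box containing the point, or the box
    whose center is nearest if no box contains it.
    """
    boxes = np.asarray(boxes, dtype=float)
    x, y = gaze_xy
    inside = (
        (boxes[:, 0] <= x) & (x <= boxes[:, 2])
        & (boxes[:, 1] <= y) & (y <= boxes[:, 3])
    )
    if inside.any():
        return int(np.flatnonzero(inside)[0])
    centers = (boxes[:, :2] + boxes[:, 2:]) / 2.0
    return int(np.argmin(np.hypot(centers[:, 0] - x, centers[:, 1] - y)))
```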
Single-User 2D Gaze Estimation in Retail Environment Using Deep Learning
Human gaze estimation is a widely used technique for observing human behavior. The rapid adoption of deep learning techniques has extended human gaze estimation to many application domains. The retail industry is one such domain, with challenging unconstrained environmental conditions such as eye occlusion and the need for personal calibration. This study presents a novel gaze estimation model for single-user 2D gaze estimation in a retail environment. Our novel architecture, inspired by previous work in gaze following, models scene and head features and further utilizes a shifted-grids technique to accurately predict a saliency map. Our results show that the model can effectively infer 2D gaze in a retail environment. We achieve state-of-the-art performance on the Gaze On Objects (GOO) dataset, with a 25.2° angular error for gaze estimation. Furthermore, we provide a detailed analysis of the GOO dataset and comprehensively analyze the selected model's feature extractor to support our results.
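As a hedged illustration of the two ideas named above, the sketch below pairs a shifted-grids output head (after Recasens et al.'s gaze-following work, with illustrative grid counts and sizes rather than the paper's exact configuration) with the standard 2D angular-error metric behind the reported 25.2°; a fused scene-and-head feature vector is assumed as input.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShiftedGridsHead(nn.Module):
    """Minimal sketch of a shifted-grids output head: several coarse grid
    classifiers whose cell boundaries are mutually offset; averaging their
    upsampled outputs approximates a finer saliency map than any single
    grid provides. Grid count and sizes here are illustrative only.
    """

    def __init__(self, in_dim, grid=5, n_grids=5, out_size=64):
        super().__init__()
        self.grid, self.out_size = grid, out_size
        self.heads = nn.ModuleList(
            nn.Linear(in_dim, grid * grid) for _ in range(n_grids)
        )
        # Each grid is offset by a different fraction of one cell.
        self.shifts = [(i / n_grids, i / n_grids) for i in range(n_grids)]

    def forward(self, feats):                      # feats: (B, in_dim)
        maps = []
        cell = self.out_size / self.grid
        for head, (dx, dy) in zip(self.heads, self.shifts):
            logits = head(feats).view(-1, 1, self.grid, self.grid)
            m = F.interpolate(logits, size=(self.out_size, self.out_size),
                              mode="bilinear", align_corners=False)
            # Apply the sub-cell offset by rolling the upsampled map.
            m = torch.roll(m, shifts=(int(dy * cell), int(dx * cell)),
                           dims=(2, 3))
            maps.append(m)
        return torch.stack(maps).mean(0)           # (B, 1, H, W) saliency map

def angular_error_deg(pred_xy, gt_xy, eye_xy):
    """2D angular error (degrees) between predicted and ground-truth gaze
    directions, both measured as vectors from the eye position."""
    v1 = np.asarray(pred_xy, float) - np.asarray(eye_xy, float)
    v2 = np.asarray(gt_xy, float) - np.asarray(eye_xy, float)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

Averaging several coarse classifiers whose grids are shifted relative to one another sidesteps the quantization of a single coarse grid while keeping each classification head small.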
- Award ID(s): 2045523
- PAR ID: 10403597
- Date Published:
- Journal Name: 2022 2nd International Conference on Advanced Research in Computing (ICARC)
- Page Range / eLocation ID: 206 to 211
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: As technology advances, Human-Robot Interaction (HRI) is boosting overall system efficiency and productivity. However, allowing robots to operate in close proximity to humans inevitably places higher demands on precise human motion tracking and prediction. Datasets that capture both humans and robots operating in a shared space are receiving growing attention, as they may facilitate a variety of robotics and human-systems research. Datasets that track HRI with rich information beyond video images during daily activities are rarely seen. In this paper, we introduce a novel dataset that focuses on social navigation between humans and robots in a future-oriented Wholesale and Retail Trade (WRT) environment (https://uf-retail-cobot-dataset.github.io/). Eight participants performed tasks commonly undertaken by consumers and retail workers. More than 260 minutes of data were collected, including robot and human trajectories, human full-body motion capture, eye gaze directions, and other contextual information. Comprehensive descriptions of each category of data stream, as well as potential use cases, are included. Furthermore, analysis with multiple data sources and future directions are discussed.
- In this work, we integrate digital twin technology with RFID localization to achieve real-time monitoring of physical items in large-scale complex environments, such as warehouses and retail stores. To map item-level realities into a digital environment, we propose a sensor fusion technique that merges a 3D map created by RGB-D and tracking cameras with real-time RFID tag location estimates derived from our novel Bayesian filter approach. Unlike mainstream localization methods, which rely on phase or RSSI measurements, our proposed method leverages a fixed RF transmission power model. This approach extends localization capabilities to all existing RFID devices, offering a significant advancement over conventional techniques. As a result, the proposed method transforms any RFID device into a digital twin scanner with the support of RGB-D cameras. To evaluate the performance of the proposed method, we prototype the system with commercial off-the-shelf (COTS) equipment in two representative retail scenarios. The overall performance of the system is demonstrated in a mock retail apparel store covering an area of 207 m², while the quantitative experimental results are examined in a small-scale testbed to showcase the accuracy of item-level tag localization.
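The Bayesian filtering idea can be sketched minimally as follows (the gridded posterior, the distance-dependent detection-probability decay, and the parameters p0 and d_half are our assumptions for illustration; the paper's actual measurement model under fixed transmit power is not reproduced here):

```python
import numpy as np

def update_tag_posterior(prior, grid_xy, reader_xy, detected,
                         p0=0.95, d_half=3.0):
    """One Bayesian update of a gridded posterior over a tag's 2D position.

    prior:     (N,) probabilities over N candidate grid cells.
    grid_xy:   (N, 2) cell-center coordinates in meters.
    reader_xy: (2,) reader antenna position for this interrogation.
    detected:  True if the tag responded to this read attempt.

    With a fixed transmit power, the probability that a tag responds is
    assumed to decay with distance: p0 at the reader, p0/2 at d_half meters.
    """
    d = np.linalg.norm(grid_xy - np.asarray(reader_xy, float), axis=1)
    p_detect = p0 / (1.0 + (d / d_half) ** 2)   # assumed decay model
    like = p_detect if detected else (1.0 - p_detect)
    post = prior * like
    return post / post.sum()
```

Repeated updates from read attempts at several known antenna positions concentrate the posterior near the tag; the camera-built 3D map then anchors that estimate in the digital twin's coordinate frame.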
- Recent advances in eye tracking have given birth to a new genre of gaze-based context sensing applications, ranging from cognitive load estimation to emotion recognition. To achieve state-of-the-art recognition accuracy, a large-scale, labeled eye movement dataset is needed to train deep learning-based classifiers. However, due to the heterogeneity in human visual behavior, as well as the labor-intensive and privacy-compromising data collection process, datasets for gaze-based activity recognition are scarce and hard to collect. To alleviate the sparse gaze data problem, we present EyeSyn, a novel suite of psychology-inspired generative models that leverages only publicly available images and videos to synthesize a realistic and arbitrarily large eye movement dataset. Taking gaze-based museum activity recognition as a case study, our evaluation demonstrates that EyeSyn can not only replicate the distinct patterns in the actual gaze signals captured by an eye tracking device, but also simulate the signal diversity that results from different measurement setups and subject heterogeneity. Moreover, in the few-shot learning scenario, EyeSyn can be readily incorporated with either transfer learning or meta-learning to achieve 90% accuracy, without the need for a large-scale dataset for training.
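A toy sketch conveys the flavor of image-driven gaze synthesis (the gradient-based saliency proxy and the fixation/saccade model below are our simplification, not EyeSyn's psychology-inspired generative models):

```python
import numpy as np

def synthesize_scanpath(image_gray, n_fixations=5, fix_dur=30,
                        fix_noise=2.0, rng=None):
    """Toy gaze synthesis over a grayscale image.

    Fixation targets are drawn with probability proportional to local
    gradient magnitude (a crude saliency proxy); each fixation emits
    `fix_dur` jittered gaze samples, and consecutive fixations are joined
    by a short linear saccade. Returns a (T, 2) array of gaze positions.
    """
    rng = rng or np.random.default_rng()
    gy, gx = np.gradient(image_gray.astype(float))
    sal = np.hypot(gx, gy).ravel()
    sal /= sal.sum()
    h, w = image_gray.shape
    idx = rng.choice(sal.size, size=n_fixations, p=sal)
    targets = np.stack([idx % w, idx // w], axis=1).astype(float)

    samples = []
    for i, t in enumerate(targets):
        if i > 0:  # 3-sample linear saccade between fixations
            samples += list(np.linspace(targets[i - 1], t, 3))
        samples += list(t + rng.normal(0, fix_noise, size=(fix_dur, 2)))
    return np.asarray(samples)
```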
- Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating a subject's gaze direction. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data sources. We propose a Gaze-aware Compositional GAN that learns to generate annotated facial images from a limited labeled dataset. We then transfer this model to an unlabeled data domain to take advantage of the diversity it provides. Experiments demonstrate our approach's effectiveness in generating within-domain image augmentations in the ETH-XGaze dataset and cross-domain augmentations in the CelebAMask-HQ dataset domain for gaze estimation DNN training. We also show additional applications of our work, including facial image editing and gaze redirection.
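A generic sketch of gaze-conditioned generation (a plain conditional generator of our own construction, not the paper's Gaze-aware Compositional GAN; all layer sizes are illustrative): concatenating a gaze label with the latent code means every synthesized face carries its gaze annotation by construction.

```python
import torch
import torch.nn as nn

class GazeConditionedGenerator(nn.Module):
    """Toy conditional generator: latent noise z concatenated with a 2D
    gaze label (pitch, yaw) drives a small deconvolutional stack producing
    a 64x64 RGB face image. Sizes are illustrative only.
    """

    def __init__(self, z_dim=128, gaze_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + gaze_dim, 256, 4),           # 4x4
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),   # 8x8
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),    # 16x16
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),     # 32x32
            nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),      # 64x64
            nn.Tanh(),
        )

    def forward(self, z, gaze):
        # Condition by concatenation, then reshape to a (B, C, 1, 1) seed.
        x = torch.cat([z, gaze], dim=1)[:, :, None, None]
        return self.net(x)
```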