Human-robot collaboration systems benefit from recognizing people’s intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users’ intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a finite set of known points and assign fixation position to one of those known points. We validate our approach on synthetic data to demonstrate that classifying using velocity features is more robust than a position matching approach. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
more »
« less
Semantic Gaze Labeling for Human-Robot Shared Manipulation
Human-robot collaboration systems benefit from recognizing people’s intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users’ intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a nite set of known points and assign fixation position to one of those known points. We validate our approach on synthetic data to demonstrate that classifying using velocity features is more robust than a position matching approach. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
more »
« less
- Award ID(s):
- 1755823
- PAR ID:
- 10090736
- Date Published:
- Journal Name:
- ACM Symposium on Eye Tracking Research & Applications
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Shared control systems can make complex robot teleoperation tasks easier for users. These systems predict the user’s goal, determine the motion required for the robot to reach that goal, and combine that motion with the user’s input. Goal prediction is generally based on the user’s control input (e.g., the joystick signal). In this paper, we show that this prediction method is especially effective when users follow standard noisily optimal behavior models. In tasks with input constraints like modal control, however, this effectiveness no longer holds, so additional sources for goal prediction can improve assistance. We implement a novel shared control system that combines natural eye gaze with joystick input to predict people’s goals online, and we evaluate our system in a real-world, COVID-safe user study. We find that modal control reduces the efficiency of assistance according to our model, and when gaze provides a prediction earlier in the task, the system’s performance improves. However, gaze on its own is unreliable and assistance using only gaze performs poorly. We conclude that control input and natural gaze serve different and complementary roles in goal prediction, and using them together leads to improved assistance.more » « less
-
We introduce WebGazer, an online eye tracker that uses common webcams already present in laptops and mobile devices to infer the eye-gaze locations of web visitors on a page in real time. The eye tracking model self-calibrates by watching web visitors interact with the web page and trains a mapping between features of the eye and positions on the screen. This approach aims to provide a natural experience to everyday users that is not restricted to laboratories and highly controlled user studies. WebGazer has two key components: a pupil detector that can be combined with any eye detection library, and a gaze estimator using regression analysis informed by user interactions. We perform a large remote online study and a small in-person study to evaluate WebGazer. The findings show that WebGazer can learn from user interactions and that its accuracy is sufficient for approximating the user's gaze. As part of this paper, we release the first eye tracking library that can be easily integrated in any website for real-time gaze interactions, usability studies, or web research.more » « less
-
Emerging Virtual Reality (VR) displays with embedded eye trackers are currently becoming a commodity hardware (e.g., HTC Vive Pro Eye). Eye-tracking data can be utilized for several purposes, including gaze monitoring, privacy protection, and user authentication/identification. Identifying users is an integral part of many applications due to security and privacy concerns. In this paper, we explore methods and eye-tracking features that can be used to identify users. Prior VR researchers explored machine learning on motion-based data (such as body motion, head tracking, eye tracking, and hand tracking data) to identify users. Such systems usually require an explicit VR task and many features to train the machine learning model for user identification. We propose a system to identify users utilizing minimal eye-gaze-based features without designing any identification-specific tasks. We collected gaze data from an educational VR application and tested our system with two machine learning (ML) models, random forest (RF) and k-nearest-neighbors (kNN), and two deep learning (DL) models: convolutional neural networks (CNN) and long short-term memory (LSTM). Our results show that ML and DL models could identify users with over 98% accuracy with only six simple eye-gaze features. We discuss our results, their implications on security and privacy, and the limitations of our work.more » « less
-
Human-robot interaction is a critical area of research, providing support for collaborative tasks where a human instructs a robot to interact with and manipulate objects in an environment. However, an under-explored element of these collaborative manipulation tasks are small-scale building exercises, in which the human and robot are working together in close proximity with the same set of objects. Under these conditions, it is essential to ensure the human’s safety and mitigate comfort risks during the interaction. As there is danger in exposing humans to untested robots, a safe and controlled environment is required. Simulation and virtual reality (VR) for HRI have shown themselves to be suitable tools for creating space for human-robot experimentation that can be beneficial in these scenarios. However, the use of simulation and VR comes with the possibility of failures resulting from the sim-to-real gap, where the behavior of the simulated robot may not accurately reflect the experience of a human collaborator in a real-world setting. This gap can limit the generalizability of research findings and raise questions about the validity of using simulation and VR for HRI research. Our goal in this work is to demonstrate the effectiveness of sim-to-real approaches for contact-based human-robot interaction.more » « less