Title: Semantic Gaze Labeling for Human-Robot Shared Manipulation
Human-robot collaboration systems benefit from recognizing people’s intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users’ intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position-matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a finite set of known points, and assign each fixation to one of those known points. We validate our approach on synthetic data to demonstrate that classifying with velocity features is more robust than position matching. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
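To make the contrast concrete, below is a minimal sketch of the two labeling strategies the abstract compares: a position-matching baseline and a velocity-style matcher. All names, array layouts, and the exact feature definition are our illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (illustrative, not the paper's implementation) contrasting
# position matching with velocity-feature matching for fixation labeling.
import numpy as np

def position_match(fixation, points):
    """Baseline: label the fixation with the nearest known point."""
    return int(np.argmin(np.linalg.norm(points - fixation, axis=1)))

def velocity_match(prev_fix, curr_fix, prev_points, curr_points):
    """Label with the known point whose apparent motion best matches the gaze
    motion between consecutive fixations; relative motion is less sensitive
    to calibration drift than absolute position."""
    gaze_vel = curr_fix - prev_fix           # gaze displacement between fixations
    point_vels = curr_points - prev_points   # per-point displacement in the scene
    mismatch = np.linalg.norm(point_vels - gaze_vel, axis=1)
    return int(np.argmin(mismatch))

# Toy usage: the first point moves up, and so does the gaze.
pts_t0 = np.array([[0.0, 0.0], [5.0, 0.0]])
pts_t1 = np.array([[0.0, 1.0], [5.0, 0.0]])
print(velocity_match(np.array([2.0, 0.0]), np.array([2.0, 1.0]), pts_t0, pts_t1))  # -> 0
```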
Award ID(s):
1755823
NSF-PAR ID:
10090736
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM Symposium on Eye Tracking Research & Applications
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ribeiro, Haroldo V. (Ed.)
    Reading is a complex cognitive process that involves primary oculomotor function and high-level activities like attention focus and language processing. When we read, our eyes move by primary physiological functions while responding to language-processing demands. In fact, the eyes perform discontinuous twofold movements, namely, successive long jumps (saccades) interposed by small steps (fixations) in which the gaze “scans” confined locations. It is only through the fixations that information is effectively captured for brain processing. Since individuals can express similar as well as entirely different opinions about a given text, it is therefore expected that the form, content and style of a text could induce different eye-movement patterns among people. A question that naturally arises is whether these individuals’ behaviours are correlated, so that eye-tracking while reading can be used as a proxy for a text’s subjective properties. Here we perform a set of eye-tracking experiments with a group of individuals reading different types of texts, including children’s stories, texts of randomly generated words and excerpts from literary works. In parallel, an extensive Internet survey was conducted to categorize these texts in terms of their complexity and coherence, considering a large number of individuals of different ages, genders and levels of education. The computational analysis of the fixation maps obtained from the subjects’ gaze trajectories for a given text reveals that the average “magnetization” of the fixation configurations correlates strongly with the complexity observed in the survey. Moreover, we perform a thermodynamic analysis using the Maximum-Entropy Model and find that coherent texts were closer to their corresponding “critical points” than non-coherent ones, as computed from the pairwise Maximum-Entropy method, suggesting that different texts may induce distinct cohesive reading activities.
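    A rough sketch of the “magnetization” statistic may help, assuming (our reading, not necessarily the authors' exact construction) that each reader's fixation map is binarized over text regions and treated as an Ising-like spin configuration:

```python
# Sketch of the "magnetization" statistic, assuming each reader's fixation map
# is binarized over text regions and treated as an Ising-like spin configuration.
# The grid, binarization, and averaging are our assumptions for illustration.
import numpy as np

def magnetization(fixation_map):
    """Mean spin of a binary fixation configuration: +1 fixated, -1 not."""
    spins = 2 * np.asarray(fixation_map, dtype=float) - 1
    return spins.mean()

rng = np.random.default_rng(0)
maps = rng.random((30, 64)) > 0.5            # 30 toy readers x 64 text regions
avg_m = np.mean([magnetization(m) for m in maps])
print(avg_m)                                 # per-text average "magnetization"
```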
  2. Despite the phenomenal advances in the computational power and functionality of electronic systems, human-machine interaction has largely been limited to simple control panels, keyboard, mouse and display. Consequently, these systems either rely critically on close human guidance or operate almost independently of the user. An exemplar technology integrated tightly into our lives is the smartphone. However, the term “smart” is a misnomer, since the phone fundamentally has no intelligence with which to understand its user. The users still have to type, touch or speak (to some extent) to express their intentions in a form accessible to the phone. Hence, intelligent decision making is still almost entirely a human task. A life-changing experience can be achieved by transforming machines from passive tools to agents capable of understanding human physiology and what their user wants [1]. This can advance human capabilities in unimagined ways by building a symbiotic relationship to solve real-world problems cooperatively. One of the high-impact application areas of this approach is assistive internet of things (IoT) technologies for physically challenged individuals. The Annual World Report on Disability reveals that 15% of the world population lives with disability, while 110 to 190 million of these people have difficulty in functioning [1]. Quality of life for this population can improve significantly if we can provide accessibility to smart devices, which provide sensory inputs and assist with everyday tasks. This work demonstrates that smart IoT devices open up the possibility to alleviate the burden on the user by equipping everyday objects, such as a wheelchair, with decision-making capabilities. Moving part of the intelligent decision making to smart IoT objects requires a robust mechanism for human-machine communication (HMC). To address this challenge, we present examples of multimodal HMC mechanisms, where the modalities are electroencephalogram (EEG), speech commands, and motion sensing. We also introduce an IoT co-simulation framework developed using a network simulator (OMNeT++) and a robot simulation platform, the Virtual Robot Experimentation Platform (V-REP). We show how this framework is used to evaluate the effectiveness of different HMC strategies using automated indoor navigation as a driver application.
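    As a toy illustration of multimodal HMC, the sketch below fuses per-modality commands with a majority vote. The modalities come from the abstract, but the command vocabulary, fusion rule, and tie-breaking choice are assumptions for illustration; the paper evaluates its actual strategies in OMNeT++/V-REP simulation.

```python
# Toy fusion of multimodal commands (EEG, speech, motion) by majority vote.
# The command vocabulary and tie-breaking rule are illustrative assumptions.
from collections import Counter

def fuse_commands(eeg_cmd, speech_cmd, motion_cmd):
    """Majority vote across modalities; ties fall back to the speech channel,
    assumed here (for the sketch) to be the most reliable modality."""
    votes = Counter([eeg_cmd, speech_cmd, motion_cmd])
    cmd, count = votes.most_common(1)[0]
    return cmd if count > 1 else speech_cmd

print(fuse_commands("forward", "forward", "stop"))  # -> forward
print(fuse_commands("left", "stop", "forward"))     # no majority -> stop
```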
  3. Observable reading behavior, the act of moving the eyes over lines of text, is highly stereotyped among the users of a language, and this has led to the development of reading detectors: methods that input windows of sequential fixations and output predictions of whether the fixation behavior during those windows is reading or skimming. The present study introduces a new method for reading detection using Region Ranking SVM (RRSVM). An SVM-based classifier learns the local oculomotor features that are important for real-time reading detection while it is optimizing for the global reading/skimming classification, making it unnecessary to hand-label local fixation windows for model training. This RRSVM reading detector was trained and evaluated using eye movement data collected in a laboratory context, where participants viewed modified web news articles and had to either read them carefully for comprehension or skim them quickly for the selection of keywords (separate groups). Ground truth labels were known at the global level (the instructed reading or skimming task), and obtained at the local level in a separate rating task. The RRSVM reading detector accurately predicted 82.5% of the global (article-level) reading/skimming behavior, with accuracy in predicting local window labels ranging from 72% to 95%, depending on how the RRSVM was tuned with respect to local and global weights. With this RRSVM reading detector, a method now exists for near real-time reading detection without the need for hand-labeling of local fixation windows. With real-time reading detection capability comes the potential for applications ranging from education and training to intelligent interfaces that learn what a user is likely to know based on previous detection of their reading behavior.
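    For concreteness, here is a sketch of window-level reading detection with an ordinary SVM. This is not the authors' Region Ranking SVM (RRSVM learns local window scores from global labels alone, whereas this baseline assumes window labels are available), and the oculomotor features and data are toy assumptions.

```python
# Window-level reading/skimming detection with a plain SVM (not the authors'
# RRSVM, which needs no local labels). Features and data here are toy
# assumptions: windows of (x, y, duration) fixations with known labels.
import numpy as np
from sklearn.svm import SVC

def window_features(fixations):
    """Simple oculomotor features for one window of (x, y, duration) rows."""
    fix = np.asarray(fixations)
    saccades = np.diff(fix[:, :2], axis=0)           # inter-fixation jumps
    return [
        fix[:, 2].mean(),                            # mean fixation duration
        np.linalg.norm(saccades, axis=1).mean(),     # mean saccade amplitude
        (saccades[:, 0] > 0).mean(),                 # fraction of rightward moves
    ]

rng = np.random.default_rng(0)
windows = [rng.random((10, 3)) for _ in range(20)]   # 20 toy fixation windows
y = np.array([0, 1] * 10)                            # toy reading/skimming labels
clf = SVC(kernel="rbf").fit([window_features(w) for w in windows], y)
```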
  4. Remote eye tracking with automated corneal reflection provides insights into the emergence and development of cognitive, social, and emotional functions in human infants and non-human primates. However, because most eye-tracking systems were designed for use in human adults, the accuracy of eye-tracking data collected in other populations is unclear, as are potential approaches to minimize measurement error. For instance, data quality may differ across species or ages, which are necessary considerations for comparative and developmental studies. Here we examined how the calibration method and adjustments to areas of interest (AOIs) of the Tobii TX300 changed the mapping of fixations to AOIs in a cross-species longitudinal study. We tested humans (N = 119) at 2, 4, 6, 8, and 14 months of age and macaques (Macaca mulatta; N = 21) at 2 weeks, 3 weeks, and 6 months of age. In all groups, we found improvement in the proportion of AOI hits detected as the number of successful calibration points increased, suggesting calibration approaches with more points may be advantageous. Spatially enlarging and temporally prolonging AOIs increased the number of fixation-AOI mappings, suggesting improvements in capturing infants’ gaze behaviors; however, these benefits varied across age groups and species, suggesting different parameters may be ideal, depending on the population studied. In sum, to maximize usable sessions and minimize measurement error, eye-tracking data collection and extraction approaches may need adjustments for the age groups and species studied. Doing so may make it easier to standardize and replicate eye-tracking research findings.
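    A minimal sketch of the AOI adjustments described above: spatially enlarging an AOI and temporally prolonging its active interval before testing fixation hits. The data layout and padding values are illustrative assumptions, not the study's parameters.

```python
# Sketch of AOI padding: spatially enlarge the AOI and temporally prolong its
# active interval before testing a fixation hit. Data layout and padding
# values are illustrative assumptions, not the study's parameters.
def aoi_hit(fix_x, fix_y, fix_t, aoi, pad_px=20, pad_ms=100):
    """True if the fixation lands in the padded AOI during its padded interval."""
    x0, y0, x1, y1 = aoi["bbox"]
    t0, t1 = aoi["interval_ms"]
    in_space = (x0 - pad_px <= fix_x <= x1 + pad_px
                and y0 - pad_px <= fix_y <= y1 + pad_px)
    in_time = t0 - pad_ms <= fix_t <= t1 + pad_ms
    return in_space and in_time

aoi = {"bbox": (100, 100, 300, 250), "interval_ms": (0, 2000)}
print(aoi_hit(95, 110, 2050, aoi))  # just outside the raw AOI/interval -> True
```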