Eye tracking has already made its way to current commercial wearable display devices, and is becoming increasingly important for virtual and augmented reality applications. However, the existing model-based eye tracking solutions are not capable of conducting very accurate gaze angle measurements, and may not be sufficient to solve challenging display problems such as pupil steering or eyebox expansion. In this paper, we argue that accurate detection and localization of pupil in 3D space is a necessary intermediate step in model-based eye tracking. Existing methods and datasets either ignore evaluating the accuracy of 3D pupil localization or evaluate it only on synthetic data. To this end, we capture the first 3D pupilgaze-measurement dataset using a high precision setup with head stabilization and release it as the first benchmark dataset to evaluate both 3D pupil localization and gaze tracking methods. Furthermore, we utilize an advanced eye model to replace the commonly used oversimplified eye model. Leveraging the eye model, we propose a novel 3D pupil localization method with a deep learning-based corneal refraction correction. We demonstrate that our method outperforms the state-of-the-art works by reducing the 3D pupil localization error by 47.5% and the gaze estimation error by 18.7%. Our dataset and codes can be found here: link.
more »
« less
Student Engagement Assessment in Classrooms Using a Novel 3D Eye-Gaze Estimation and Evaluation Algorithm
Student attentiveness within the classroom can be assessed by observing student attention toward the teacher or whiteboard, which may be inferred through eye-gaze direction. This paper introduces a novel technique for evaluating student attentiveness by analyzing the direction of their eye gaze derived from their 3D skeletal pose in a reconstructed 3D environment. As for the contributions, the paper suggests a novel 3D head pose estimation algorithm that, unlike other works, does not need frontal face information. As a result, the method is highly effective in uncontrolled environments such as classrooms, where frontal face data is often unavailable. Moreover, a new algorithm was developed to evaluate student attentiveness based on 3D eye gaze information interpreted from the 3D head pose. The proposed method has been validated using a set of instructional videos collected at the University of Virginia.
more »
« less
- Award ID(s):
- 2322993
- PAR ID:
- 10627499
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 979-8-3503-5405-8
- Page Range / eLocation ID:
- 926 to 930
- Subject(s) / Keyword(s):
- Student engagement transformer models AI
- Format(s):
- Medium: X
- Location:
- Pacific Grove, CA, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract It remains an interesting and challenging problem to synthesize a vivid and realistic singing face driven by music. In this paper, we present a method for this task with natural motions for the lips, facial expression, head pose, and eyes. Due to the coupling of mixed information for the human voice and backing music in common music audio signals, we design a decouple-and-fuse strategy to tackle the challenge. We first decompose the input music audio into a human voice stream and a backing music stream. Due to the implicit and complicated correlation between the two-stream input signals and the dynamics of the facial expressions, head motions, and eye states, we model their relationship with an attention scheme, where the effects of the two streams are fused seamlessly. Furthermore, to improve the expressivenes of the generated results, we decompose head movement generation in terms of speed and direction, and decompose eye state generation into short-term blinking and long-term eye closing, modeling them separately. We have also built a novel dataset, SingingFace, to support training and evaluation of models for this task, including future work on this topic. Extensive experiments and a user study show that our proposed method is capable of synthesizing vivid singing faces, qualitatively and quantitatively better than the prior state-of-the-art.more » « less
-
In this paper, we seek to draw connections between the frontal and profile face images in an abstract embedding space. We exploit this connection using a coupled-encoder network to project frontal/profile face images into a common latent embedding space. The proposed model forces the similarity of representations in the embedding space by maximizing the mutual information between two views of the face. The proposed coupled-encoder benefits from three contributions for matching faces with extreme pose disparities. First, we leverage our pose-aware contrastive learning to maximize the mutual information between frontal and profile representations of identities. Second, a memory buffer, which consists of latent representations accumulated over past iterations, is integrated into the model so it can refer to relatively much more instances than the minibatch size. Third, a novel pose-aware adversarial domain adaptation method forces the model to learn an asymmetric mapping from profile to frontal representation. In our framework, the coupled-encoder learns to enlarge the margin between the distribution of genuine and imposter faces, which results in high mutual information between different views of the same identity. The effectiveness of the proposed model is investigated through extensive experiments, evaluations, and ablation studies on four benchmark datasets, and comparison with the compelling state-of-the-art algorithms.more » « less
-
In recent years, face recognition systems have achieved exceptional success due to promising advances in deep learning architectures. However, they still fail to achieve the expected accuracy when matching profile images against a gallery of frontal images. Current approaches either perform pose normalization (i.e., frontalization) or disentangle pose information for face recognition. We instead propose a new approach to utilize pose as auxiliary information via an attention mechanism. In this paper, we hypothesize that pose-attended information using an attention mechanism can guide contextual and distinctive feature extraction from profile faces, which further benefits better representation learning in an embedded domain. To achieve this, first, we design a unified coupled profile-to-frontal face recognition network. It learns the mapping from faces to a compact embedding subspace via a class-specific contrastive loss. Second, we develop a novel pose attention block (PAB) to specially guide the pose-agnostic feature extraction from profile faces. To be more specific, PAB is designed to explicitly help the network to focus on important features along both “channel” and “spatial” dimensions while learning discriminative yet pose-invariant features in an embedding subspace. To validate the effectiveness of our proposed method, we conduct experiments on both controlled and in the- wild benchmarks including Multi-PIE, CFP, and IJB-C, and show superiority over the state-of-the-art.more » « less
-
Abstract Humans detect faces efficiently from a young age. Face detection is critical for infants to identify and learn from relevant social stimuli in their environments. Faces with eye contact are an especially salient stimulus, and attention to the eyes in infancy is linked to the emergence of later sociality. Despite the importance of both of these early social skills—attending to faces and attending to the eyes—surprisingly little is known about how they interact. We used eye tracking to explore whether eye contact influences infants' face detection. Longitudinally, we examined 2‐, 4‐, and 6‐month‐olds' (N = 65) visual scanning of complex image arrays with human and animal faces varying in eye contact and head orientation. Across all ages, infants displayed superior detection of faces with eye contact; however, this effect varied as a function of species and head orientation. Infants were more attentive to human than animal faces and were more sensitive to eye and head orientation for human faces compared to animal faces. Unexpectedly, human faces with both averted heads and eyes received the most attention. This pattern may reflect the early emergence of gaze following—the ability to look where another individual looks—which begins to develop around this age. Infants may be especially interested in averted gaze faces, providing early scaffolding for joint attention. This study represents the first investigation to document infants' attention patterns to faces systematically varying in their attentional states. Together, these findings suggest that infants develop early, specialized functional conspecific face detection.more » « less
An official website of the United States government

