Reconstructing 3D faces with facial geometry from single images has enabled major advances in animation, generative models, and virtual reality. However, the facial expression inference (FEI) community has not fully explored this ability to represent faces with their 3D features. This study therefore investigates the impact of integrating such 3D representations into the FEI task, specifically for facial expression classification and face-based valence-arousal (VA) estimation. To accomplish this, we first assess the performance of two 3D face representations (both based on the 3D morphable model FLAME) on the FEI tasks. We further explore two fusion architectures, intermediate fusion and late fusion, for integrating the 3D face representations with existing 2D inference frameworks. To evaluate the proposed architectures, we extract the corresponding 3D representations and perform extensive tests on the AffectNet and RAF-DB datasets. Our experimental results demonstrate that the proposed method outperforms the state of the art on the AffectNet VA estimation and RAF-DB classification tasks. Moreover, our method can complement existing methods to boost performance on many emotion inference tasks.
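A minimal sketch of the two fusion strategies mentioned above, assuming a 2D embedding from an existing FEI backbone and a FLAME-based parameter vector as the 3D representation; the feature dimensions, class count, and module names are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch only: intermediate vs. late fusion of a 2D embedding and a 3D
# (FLAME-parameter) representation for expression classification.
import torch
import torch.nn as nn

class IntermediateFusion(nn.Module):
    """Concatenate 2D and 3D features before a shared prediction head."""
    def __init__(self, dim_2d=512, dim_3d=156, num_classes=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dim_2d + dim_3d, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, feat_2d, feat_3d):
        return self.head(torch.cat([feat_2d, feat_3d], dim=-1))

class LateFusion(nn.Module):
    """Average the logits of two branch-specific predictors."""
    def __init__(self, dim_2d=512, dim_3d=156, num_classes=8):
        super().__init__()
        self.head_2d = nn.Linear(dim_2d, num_classes)
        self.head_3d = nn.Linear(dim_3d, num_classes)

    def forward(self, feat_2d, feat_3d):
        return 0.5 * (self.head_2d(feat_2d) + self.head_3d(feat_3d))

# Usage: feat_2d from a 2D FEI backbone, feat_3d from FLAME parameters.
feat_2d, feat_3d = torch.randn(4, 512), torch.randn(4, 156)
logits = IntermediateFusion()(feat_2d, feat_3d)   # shape: (4, 8)
```

The same heads could regress valence-arousal by setting the output size to 2 instead of the class count.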
FACIAL GRID TRANSFORMATION: A NOVEL FACE REGISTRATION APPROACH FOR IMPROVING FACIAL ACTION UNIT RECOGNITION
Face registration is a major and critical step in face analysis. Existing facial activity recognition systems often employ coarse face alignment based on a few fiducial points, such as the eyes, and extract features from an equal-sized grid. Such features are susceptible to variations in face pose, facial deformation, and person-specific geometry. In this work, we propose a novel face registration method, facial grid transformation, to improve feature extraction for recognizing facial Action Units (AUs). Based on the transformed grid, novel grid edge features are developed to capture local facial motions related to AUs. Extensive experiments on two well-known AU-coded databases demonstrate that the proposed method yields significant improvements over methods based on an equal-sized grid, on both posed and, more importantly, spontaneous facial displays. Furthermore, the proposed method outperforms state-of-the-art methods that use either coarse alignment or mesh-based face registration.
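The general idea of landmark-anchored grids and edge features can be illustrated with a short sketch; the local descriptor (a gray-level histogram), grid layout, and function names below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch only: place grid nodes relative to facial landmarks instead of an
# equal-sized grid, then describe each grid edge by the difference of local
# appearance histograms at its two endpoint nodes.
import numpy as np

def local_histogram(image, center, patch=16, bins=32):
    """Gray-level histogram of a patch around a grid node."""
    y, x = int(center[0]), int(center[1])
    h = patch // 2
    region = image[max(y - h, 0):y + h, max(x - h, 0):x + h]
    hist, _ = np.histogram(region, bins=bins, range=(0, 255), density=True)
    return hist

def grid_edge_features(image, nodes, edges):
    """nodes: (N, 2) landmark-anchored grid points; edges: list of (i, j) pairs."""
    descs = [local_histogram(image, p) for p in nodes]
    return np.stack([descs[i] - descs[j] for i, j in edges])

# Usage with a toy face image and a small grid anchored near eyes and mouth.
img = np.random.randint(0, 256, (128, 128)).astype(np.float32)
nodes = np.array([[40, 40], [40, 88], [90, 40], [90, 88]])
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]
feats = grid_edge_features(img, nodes, edges)   # shape: (4, 32)
```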
- Award ID(s): 1149787
- PAR ID: 10015083
- Date Published:
- Journal Name: Proceedings - International Conference on Image Processing
- ISSN: 1522-4880
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In this paper, we propose a new deep framework that predicts facial attributes and leverages them as a soft modality to improve face identification performance. Our model is an end-to-end framework consisting of a convolutional neural network (CNN) whose output is fanned out into two separate branches: the first branch predicts facial attributes, while the second branch identifies face images. In contrast to existing multi-task methods that only use a shared CNN feature space to train the two tasks jointly, we fuse the predicted attributes with the features from the face modality to improve face identification performance. Experimental results show that our model benefits both face identification and facial attribute prediction, especially for identity-related facial attributes such as gender. We tested our model on two standard datasets annotated with identities and face attributes. The results indicate that the proposed model outperforms most existing face identification and attribute prediction methods.
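A minimal sketch of the two-branch design described in this item, in which predicted attributes are fused with the face features before identification; layer sizes, the attribute count, and the identity count are illustrative assumptions.

```python
# Sketch only: shared CNN trunk, an attribute branch, and an identity branch
# that consumes the face features fused with the predicted attributes.
import torch
import torch.nn as nn

class AttributeAssistedID(nn.Module):
    def __init__(self, feat_dim=256, num_attrs=40, num_ids=1000):
        super().__init__()
        self.backbone = nn.Sequential(                      # shared CNN trunk
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.attr_head = nn.Linear(feat_dim, num_attrs)            # attribute branch
        self.id_head = nn.Linear(feat_dim + num_attrs, num_ids)    # fused ID branch

    def forward(self, x):
        feat = self.backbone(x)
        attrs = torch.sigmoid(self.attr_head(feat))
        # fuse predicted attributes with the face features before identification
        ids = self.id_head(torch.cat([feat, attrs], dim=-1))
        return attrs, ids

attrs, ids = AttributeAssistedID()(torch.randn(2, 3, 112, 112))
```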
- Face sketch-photo synthesis is a critical application in law enforcement and the digital entertainment industry. Despite significant improvements in sketch-to-photo synthesis techniques, existing methods still have serious practical limitations, such as requiring paired data during training or offering no control over enforcing facial attributes in the synthesized image. In this work, we present a new framework, a conditional version of CycleGAN conditioned on facial attributes. The proposed network enforces facial attributes, such as skin and hair color, on the synthesized photo and does not need a set of aligned face-sketch pairs during training. We evaluate the proposed network by training on two real and synthetic sketch datasets. The hand-drawn sketch images of the FERET dataset and the color face images from the WVU Multi-modal dataset are used as unpaired input to the proposed conditional CycleGAN, with skin color as the controlled face attribute. For further attribute-guided evaluation, a synthetic sketch dataset is created from the CelebA dataset and used to evaluate the performance of the network by enforcing several desired facial attributes on the synthesized faces.
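One common way to condition a generator on facial attributes is to broadcast the attribute vector as extra input channels; the sketch below illustrates that mechanism only and is an assumption, not necessarily this paper's exact design.

```python
# Sketch only: attribute conditioning by concatenating broadcast attribute
# maps to the input of a sketch-to-photo generator.
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, num_attrs=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + num_attrs, 64, 7, padding=3), nn.ReLU(),
            nn.Conv2d(64, 3, 7, padding=3), nn.Tanh(),   # RGB photo in [-1, 1]
        )

    def forward(self, sketch, attrs):
        # attrs: (B, num_attrs), e.g. a one-hot skin-color code
        b, _, h, w = sketch.shape
        attr_maps = attrs.view(b, -1, 1, 1).expand(b, attrs.size(1), h, w)
        return self.net(torch.cat([sketch, attr_maps], dim=1))

photo = ConditionalGenerator()(torch.randn(2, 1, 128, 128), torch.eye(3)[:2])
```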
- Although pain is widely recognized to be a multidimensional experience, it is typically measured by a unidimensional, patient self-reported visual analog scale (VAS). However, self-reported pain is subjective, difficult to interpret, and sometimes impossible to obtain. Machine learning models have been developed to automatically recognize pain at both the frame level and the sequence (or video) level. Many methods use or learn facial action units (AUs), defined by the Facial Action Coding System (FACS) for describing facial expressions in terms of muscle movement. In this paper, we analyze the relationship between sequence-level multidimensional pain measurements, frame-level AUs, and an AU-derived pain-related measure, the Prkachin and Solomon Pain Intensity (PSPI). We study methods that learn sequence-level metrics from frame-level metrics. Specifically, we explore an extended multitask learning model to predict VAS from human-labeled AUs with the help of other sequence-level pain measurements during training. This model consists of two parts: a multitask learning neural network that predicts multidimensional pain scores, and an ensemble learning model that linearly combines the multidimensional pain scores to best approximate VAS. Starting from human-labeled AUs, the model achieves a mean absolute error (MAE) on VAS of 1.73, outperforming the provided human sequence-level estimates, which have an MAE of 1.76. Combining our machine learning model with the human estimates gives the best performance, an MAE on VAS of 1.48.
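A minimal sketch of the two-stage idea in this item: a multitask network producing several sequence-level pain scores, followed by a linear ensemble that combines them into a VAS estimate. Input features, dimensions, and training details are illustrative assumptions.

```python
# Sketch only: multitask prediction of multidimensional pain scores, then a
# learned linear combination of those scores to approximate VAS.
import torch
import torch.nn as nn

class MultitaskPain(nn.Module):
    """Predict several sequence-level pain scores from AU-based sequence features."""
    def __init__(self, in_dim=20, num_scores=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(64, 1) for _ in range(num_scores)])

    def forward(self, x):
        h = self.shared(x)
        return torch.cat([head(h) for head in self.heads], dim=-1)

class LinearEnsemble(nn.Module):
    """Linearly combine the multidimensional scores into a single VAS estimate."""
    def __init__(self, num_scores=4):
        super().__init__()
        self.combine = nn.Linear(num_scores, 1)

    def forward(self, scores):
        return self.combine(scores).squeeze(-1)

seq_feats = torch.randn(8, 20)                              # e.g. AU statistics per video
vas_pred = LinearEnsemble()(MultitaskPain()(seq_feats))     # shape: (8,)
```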
- Facial expressions of emotions by people with visual impairment and blindness via video conferencing
Many people, including those with visual impairment and blindness, use video conferencing tools to meet others. These tools enable them to share facial expressions, which are considered one of the most important aspects of human communication. This study aims to advance knowledge of how people with visual impairment and blindness share their facial expressions of emotion virtually. The study invited a convenience sample of 28 adults with visual impairment and blindness to Zoom video conferencing. The participants were instructed to pose facial expressions of basic human emotions (anger, fear, disgust, happiness, surprise, neutrality, calmness, and sadness), which were video recorded. The facial expressions were analyzed using the Facial Action Coding System (FACS), which encodes the movement of specific facial muscles called Action Units (AUs). The study found a particular set of AUs significantly engaged in expressing each emotion, except for sadness. Individual differences were also found in the AUs, influenced by the participants' visual acuity levels and emotional characteristics such as valence and arousal levels. The findings are anticipated to serve as a foundation of knowledge for developing emotion-sensing technologies for people with visual impairment and blindness.