Studies in psychology show that not all facial regions are equally important in recognizing facial expressions, and that different facial regions contribute differently to different expressions. Motivated by this, a novel framework, named Feature Disentangling Machine (FDM), is proposed to effectively select active features characterizing facial expressions. More importantly, the FDM aims to disentangle these selected features into non-overlapping groups: common features that are shared across different expressions, and expression-specific features that are discriminative only for a target expression. Specifically, the FDM integrates a sparse support vector machine and multi-task learning in a unified framework, where a novel loss function and a set of constraints are formulated to precisely control the sparsity and naturally disentangle active features. Extensive experiments on two well-known facial expression databases demonstrate that the FDM outperforms state-of-the-art methods for facial expression analysis. More importantly, the FDM achieves impressive performance in a cross-database validation, which demonstrates the generalization capability of the selected features.
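The abstract does not reproduce the FDM loss or constraints, so the following is only a rough, hypothetical sketch of its sparse-SVM ingredient: an l1-penalized linear SVM (scikit-learn's LinearSVC) selects a sparse support per expression, and a crude support-comparison heuristic then separates "common" from "expression-specific" features. The toy data, labels, and heuristic are all stand-ins for FDM's joint formulation.

```python
# Hypothetical sketch: l1-penalized one-vs-rest SVMs as a stand-in for the
# sparsity component of the FDM (the actual method disentangles features
# jointly via a dedicated loss and constraints; this is only an illustration).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 200))        # toy appearance features (hypothetical)
y = rng.integers(0, 6, size=600)       # six basic expression labels (hypothetical)

supports = []
for expr in range(6):                  # one sparse SVM per target expression
    clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
    clf.fit(X, (y == expr).astype(int))
    supports.append(set(np.flatnonzero(np.abs(clf.coef_.ravel()) > 1e-6)))

# Crude proxy for FDM's disentangling: features active for every expression
# are "common"; features active for exactly one are "expression-specific".
common = set.intersection(*supports)
specific = [s - set.union(*(supports[:i] + supports[i + 1:]))
            for i, s in enumerate(supports)]
print(len(common), [len(s) for s in specific])
```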
Facial Expression Recognition via a Boosted Deep Belief Network
A training process for facial expression recognition is usually performed sequentially in three individual stages: feature learning, feature selection, and classifier construction. Extensive empirical studies are needed to search for an optimal combination of feature representation, feature set, and classifier to achieve good recognition performance. This paper presents a novel Boosted Deep Belief Network (BDBN) that performs the three training stages iteratively in a unified loopy framework. Through the proposed BDBN framework, a set of features that effectively characterizes expression-related facial appearance/shape changes can be learned and selected to form a boosted strong classifier in a statistical way. As learning continues, the strong classifier is improved iteratively and, more importantly, the discriminative capability of the selected features is strengthened according to their relative importance to the strong classifier via a joint fine-tuning process in the BDBN framework. Extensive experiments on two public databases showed that the BDBN framework yielded dramatic improvements in facial expression analysis.
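The abstract does not spell out the boosting procedure, so as a loose, self-contained illustration of how a boosted strong classifier can weight features by their discriminative contribution, here is a plain AdaBoost-over-feature-stumps sketch. It deliberately omits the deep belief network feature learning and the joint fine-tuning that are central to BDBN.

```python
# Illustrative AdaBoost over per-feature decision stumps (not the BDBN itself).
import numpy as np

def train_stump(X, y, w):
    """Pick the feature/threshold/polarity with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.percentile(X[:, j], [25, 50, 75]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best

def boost(X, y, rounds=20):
    """y in {-1, +1}; returns a weighted ensemble of feature stumps."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, j, thr, pol = train_stump(X, y, w)
        err = max(err, 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak learner
        pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
        w *= np.exp(-alpha * y * pred)          # emphasize misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble
```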
- Award ID(s): 1149787
- PAR ID: 10015080
- Date Published:
- Journal Name: IEEE Conference on Computer Vision and Pattern Recognition
- ISSN: 2163-6648
- Page Range / eLocation ID: 1805 - 1812
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Facial activity is the most direct signal for perceiving emotional states in people. Emotion analysis from facial displays has attracted increasing attention because of its wide applications, from human-centered computing to neuropsychiatry. Recently, image representations based on sparse coding have shown promising results in facial expression recognition. In this paper, we introduce a novel image representation for facial expression analysis. Specifically, we propose to use histograms of nonnegative sparse coded image features to represent a facial image. To capture fine appearance variations caused by facial expressions, a logarithmic transformation is further applied to each nonnegative sparse coded feature. In addition, the proposed Histograms of Log-Transformed Nonnegative Sparse Coding (HLNNSC) features are calculated and organized in a pyramid-like structure such that the spatial relationships among the features are captured and utilized to enhance the performance of facial expression recognition. Extensive experiments on the Cohn-Kanade database show that the proposed approach yields a significant improvement in facial expression recognition and outperforms the other sparse coding based baseline approaches. Furthermore, experimental results on the GEMEP-FERA2011 dataset demonstrate that the proposed approach is promising for recognition under less controlled and thus more challenging environments.
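A minimal sketch of the HLNNSC pipeline as described above: nonnegative sparse coding of local patches (solved here with projected ISTA, a standard solver the abstract does not commit to), a log(1 + x) transform to compress large activations, and pyramid pooling of the codes into spatial histograms. Dictionary, patch layout, and pyramid levels are assumptions.

```python
# Hedged sketch of HLNNSC: nonnegative sparse codes -> log transform ->
# pyramid-pooled histograms. Solver and parameters are illustrative.
import numpy as np

def nn_sparse_code(x, D, lam=0.1, steps=200):
    """min_z 0.5*||x - D z||^2 + lam*||z||_1  s.t. z >= 0, via projected ISTA."""
    L = np.linalg.norm(D, 2) ** 2               # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        grad = D.T @ (D @ z - x)
        z = np.maximum(0.0, z - (grad + lam) / L)  # shrink, then project onto z >= 0
    return z

def hlnnsc(patches, positions, D, levels=(1, 2, 4), img_size=64):
    """Log-transformed codes pooled over a spatial pyramid of histograms."""
    codes = np.log1p([nn_sparse_code(p, D) for p in patches])
    feats = []
    for g in levels:                            # pyramid levels: 1x1, 2x2, 4x4 cells
        hist = np.zeros((g, g, D.shape[1]))
        for c, (row, col) in zip(codes, positions):
            hist[min(row * g // img_size, g - 1),
                 min(col * g // img_size, g - 1)] += c
        feats.append(hist.ravel())
    return np.concatenate(feats)
```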
-
Student engagement is a key component of learning and teaching, and it has motivated a plethora of automated methods to measure it. Whereas most of the literature explores student engagement analysis in computer-based learning, often in the lab, we focus on classroom instruction in authentic learning environments. We collected audiovisual recordings of secondary school classes over a one-and-a-half-month period, acquired continuous engagement labels per student (N=15) in repeated sessions, and explored computer vision methods to classify engagement from facial videos. We learned deep embeddings for attentional and affective features by training Attention-Net for head pose estimation and Affect-Net for facial expression recognition on previously collected large-scale datasets. We used these representations to train engagement classifiers on our data, in single- and multiple-channel settings, accounting for temporal dependencies. The best performing engagement classifiers achieved student-independent AUCs of .620 and .720 for grades 8 and 12, respectively, with attention-based features outperforming affective features. Score-level fusion either improved the engagement classifiers or was on par with the best performing modality. We also investigated the effect of personalization and found that only 60 seconds of person-specific data, selected by margin uncertainty of the base classifier, yielded an average AUC improvement of .084.
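Two mechanisms from this abstract lend themselves to a short sketch: score-level fusion of the two channels and margin-uncertainty selection of the 60 seconds used for personalization. Function names, the fusion weight, and the toy probabilities below are hypothetical.

```python
# Illustrative sketch (names and parameters hypothetical): fuse attention and
# affect channel scores, then pick the seconds with the smallest decision margin.
import numpy as np

def fuse_scores(p_attention, p_affect, w=0.5):
    """Weighted score-level fusion of two per-second engagement probabilities."""
    return w * p_attention + (1 - w) * p_affect

def select_uncertain_seconds(probs, budget=60):
    """Margin uncertainty: the base classifier is least sure where its
    probability is closest to 0.5; label those seconds for personalization."""
    margin = np.abs(probs - 0.5)
    return np.argsort(margin)[:budget]

p_att = np.random.rand(600)   # toy per-second probabilities for 10 minutes
p_aff = np.random.rand(600)
fused = fuse_scores(p_att, p_aff)
to_label = select_uncertain_seconds(fused, budget=60)
```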
-
In this paper, we propose real-time face mask detection and recognition for CCTV surveillance camera videos. The proposed pipeline consists of seven steps: video acquisition and keyframe selection, data augmentation, facial part segmentation, pixel-based feature extraction, Bag of Visual Words (BoVW) generation, face mask detection, and face recognition. First, a set of keyframes is selected using the histogram of oriented gradients (HoG) algorithm. Second, data augmentation comprises three steps: color normalization, illumination correction (CLAHE), and pose normalization (angular affine transformation). Third, facial parts are segmented using a clustering approach, Expectation Maximization with a Gaussian Mixture Model (EM-GMM), which partitions the face into eyes, nose, mouth, chin, and forehead regions. Pixel-based feature extraction is then performed with the Yolo Nano approach, which performs better and is more lightweight than Yolo Tiny V2 and Yolo Tiny V3, and the extracted features are assembled into a codebook by the Hassanat similarity with K-Nearest Neighbor (H-M with KNN) algorithm. For mask detection, the L2 distance function is used. The final step is face recognition, implemented by a Kernel-based Extreme Learning Machine with Slime Mould Optimization (SMO). Experiments were conducted in Python IDLE 3.8, comparing the proposed Yolo Nano model with previous works, GMM with deep learning (GMM+DL), a Convolutional Neural Network (CNN) with VGGF, Yolo Tiny V2, and Yolo Tiny V3, in terms of various performance metrics.
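Two steps of this pipeline map directly onto standard APIs and are sketched below: CLAHE illumination correction (OpenCV) and EM-GMM pixel clustering (scikit-learn). The feature choice for clustering and the assignment of clusters to named facial regions are assumptions; the paper's exact parameters are not given in the abstract.

```python
# Sketch of two pipeline steps using standard APIs; parameters illustrative.
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def correct_illumination(bgr):
    """CLAHE on the L channel of LAB, a common face-preprocessing choice."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab[:, :, 0] = clahe.apply(lab[:, :, 0])
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

def segment_facial_parts(face_bgr, n_regions=5):
    """EM-GMM clustering of pixels on (x, y, B, G, R) features; the five
    clusters loosely play the roles of eyes/nose/mouth/chin/forehead."""
    h, w = face_bgr.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([xs.ravel(), ys.ravel(),
                             face_bgr.reshape(-1, 3)]).astype(float)
    labels = GaussianMixture(n_components=n_regions, covariance_type="full",
                             random_state=0).fit_predict(feats)
    return labels.reshape(h, w)
```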
-
Recent advancements in person recognition have raised concerns about identity privacy leaks. Gait recognition through millimeter-wave radar provides a privacy-centric method. However, it is challenged by lower accuracy due to the sparse data these sensors capture. We are the first to investigate a cross-modal method, IdentityKD, to enhance gait-based person recognition with the assistance of facial data. IdentityKD involves a training process using both gait and facial data, while the inference stage is conducted exclusively with gait data. To effectively transfer facial knowledge to the gait model, we create a composite feature representation using contrastive learning. This method integrates facial and gait features into a unified embedding that captures the unique identity-specific information from both modalities. We employ two distinct contrastive learning losses: one minimizes the distance between embeddings of data pairs from the same person, enhancing intra-class compactness, while the other maximizes the distance between embeddings of data pairs from different individuals, improving inter-class separability. Additionally, we use an identity-wise distillation strategy, which tailors the training process for each individual, ensuring that the model learns to distinguish between different identities more effectively. Our experiments on a dataset of 36 subjects, each providing over 5000 face-gait pairs, demonstrate that IdentityKD improves identity recognition accuracy by 6.5% compared to baseline methods.
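A hedged PyTorch sketch of the two contrastive objectives described above: pull same-person face/gait embeddings together, push different-person embeddings apart. The margin value, squared-distance form, and function name are assumptions; the paper's exact losses and its identity-wise distillation are not reproduced here.

```python
# Sketch of the paired contrastive losses (margin and form are assumptions).
import torch
import torch.nn.functional as F

def contrastive_pair_loss(face_emb, gait_emb, same_person, margin=1.0):
    """face_emb, gait_emb: (N, D) embeddings; same_person: (N,) bool mask
    marking pairs that come from the same individual."""
    d = F.pairwise_distance(face_emb, gait_emb)            # per-pair L2 distance
    pull = (d[same_person] ** 2).mean()                    # intra-class compactness
    push = (F.relu(margin - d[~same_person]) ** 2).mean()  # inter-class separability
    return pull + push
```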