Facial attribute recognition is conventionally computed from a single image. In practice, each subject may have multiple face images. Taking eye size as an example, the attribute itself should not change, but its estimate may differ across images, which can negatively affect face recognition. Thus, computing these attributes per subject rather than per image is an important problem. To address it, we use deep learning for facial attribute prediction and examine the inconsistency among the attributes computed from individual images. We then develop two approaches to address this inconsistency. Experimental results show that the proposed methods can handle facial attribute estimation on either multiple still images or video frames, and can correct incorrectly annotated labels. The experiments are conducted on two large public databases with facial attribute annotations.
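To make the per-subject idea concrete, the sketch below aggregates per-image attribute predictions by majority vote. The abstract does not say what its two approaches are, so majority voting is an assumption used purely for illustration; the function name `aggregate_by_subject` and the labels are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_by_subject(predictions):
    """Map each subject to the label predicted most often across its images.

    `predictions` is an iterable of (subject_id, label) pairs, one per image.
    Majority voting is an illustrative assumption, not the paper's method.
    """
    votes = defaultdict(Counter)
    for subject_id, label in predictions:
        votes[subject_id][label] += 1
    # most_common(1) returns a one-element list of (label, count).
    return {subject: counts.most_common(1)[0][0] for subject, counts in votes.items()}


# Hypothetical per-image predictions of one attribute for two subjects.
per_image = [
    ("s1", "large"),
    ("s1", "large"),
    ("s1", "small"),  # inconsistent estimate for the same subject
    ("s2", "small"),
]
per_subject = aggregate_by_subject(per_image)
```

Here the outlying "small" prediction for subject `s1` is overruled by the two "large" predictions, so every image of that subject would receive one consistent label.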
Facial Attributes Guided Deep Sketch-to-Photo Synthesis
Face sketch-photo synthesis is a critical application in law enforcement and the digital entertainment industry. Despite significant improvements in sketch-to-photo synthesis techniques, existing methods still have serious limitations in practice, such as the need for paired data in the training phase or the lack of control over the facial attributes of the synthesized image. In this work, we present a new framework, a conditional version of CycleGAN that is conditioned on facial attributes. The proposed network enforces facial attributes, such as skin and hair color, on the synthesized photo and does not need a set of aligned face-sketch pairs during training. We evaluate the proposed network by training on two sketch datasets, one real and one synthetic. The hand-sketch images of the FERET dataset and the color face images from the WVU Multi-modal dataset are used as unpaired input to the proposed conditional CycleGAN, with skin color as the controlled face attribute. For further attribute-guided evaluation, a synthetic sketch dataset is created from the CelebA dataset and used to evaluate the performance of the network by enforcing several desired facial attributes on the synthesized faces.
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name: IEEE Winter Applications of Computer Vision Workshops (WACVW)
- Page Range or eLocation-ID: 1 to 8
- Sponsoring Org: National Science Foundation
More Like this
In this paper, we propose a new deep framework which predicts facial attributes and leverages them as a soft modality to improve face identification performance. Our model is an end-to-end framework consisting of a convolutional neural network (CNN) whose output is fanned out into two separate branches: the first branch predicts facial attributes while the second branch identifies face images. In contrast to existing multi-task methods, which only use a shared CNN feature space to train these two tasks jointly, we fuse the predicted attributes with the features from the face modality in order to improve face identification performance. Experimental results show that our model benefits both face identification and facial attribute prediction, especially for identity-related facial attributes such as gender. We tested our model on two standard datasets annotated with identities and face attributes. Experimental results indicate that the proposed model outperforms most existing face identification and attribute prediction methods.
Deepfakes are synthetic or fake images and videos generated using deep neural networks. As the techniques used to generate deepfakes improve, threats including social media disinformation, defamation, impersonation, and fraud are becoming more prevalent. Existing deepfake detection models, including those that use convolutional neural networks, do not generalize well when subjected to multiple deepfake generation techniques and cross-corpora settings. Therefore, there is a need for effective and efficient deepfake detection methods. To explicitly model part-whole hierarchical relationships by using groups of neurons to encode visual entities and learn the relationships between real and fake artifacts, we propose a novel deep learning model, the efficient-capsule network (E-Cap Net), for classifying facial images generated through different deepfake generation techniques. More specifically, we introduce a low-cost max-feature-map (MFM) activation function in each primary capsule of our proposed E-Cap Net. The MFM activation makes our E-Cap Net light and robust, as it suppresses the low-activation neurons in each primary capsule. The performance of our approach is evaluated on two standard, large-scale, and diverse datasets, namely the Diverse Fake Face Dataset (DFFD) and FaceForensics++ (FF++), and also on the World Leaders Dataset (WLRD). Moreover, …
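As an illustration of the max-feature-map idea mentioned above, the sketch below implements the commonly used form of MFM: the channel dimension is split into two halves and the element-wise maximum is kept, which halves the channel count and discards the weaker of each pair of activations. How E-Cap Net wires this into its primary capsules is not described in the abstract, so the function name, the array layout, and the sample values are assumptions.

```python
import numpy as np


def max_feature_map(x: np.ndarray, axis: int = 0) -> np.ndarray:
    """Split `x` into two equal halves along `axis` and keep the element-wise maximum.

    The output has half as many channels as the input along `axis`.
    """
    if x.shape[axis] % 2 != 0:
        raise ValueError("the channel dimension must be even")
    first, second = np.split(x, 2, axis=axis)
    return np.maximum(first, second)


# Hypothetical feature map: 4 channels (rows) x 2 spatial positions (columns).
features = np.array(
    [
        [1.0, -2.0],
        [0.5, 3.0],
        [4.0, -1.0],
        [-0.5, 2.0],
    ]
)
# Channel 0 competes with channel 2, and channel 1 with channel 3.
out = max_feature_map(features, axis=0)
```

Unlike ReLU, which thresholds each activation against zero, this form of MFM selects between two competing channels, so the output stays dense while the number of channels is halved.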
Text-to-image generative models have achieved unprecedented success in generating high-quality images based on natural language descriptions. However, it has been shown that these models tend to favor specific social groups when prompted with neutral text descriptions (e.g., ‘a photo of a lawyer’). Following Zhao et al. (2021), we study the effect on the diversity of the generated images of adding an ethical intervention that supports equitable judgment (e.g., ‘if all individuals can be a lawyer irrespective of their gender’) to the input prompts. To this end, we introduce the Ethical NaTural Language Interventions in Text-to-Image GENeration (ENTIGEN) benchmark dataset to evaluate the change in image generations conditional on ethical interventions across three social axes: gender, skin color, and culture. Through CLIP-based and human evaluation on minDALL.E, DALL.E-mini and Stable Diffusion, we find that the model generations cover diverse social groups while preserving image quality. In some cases, the generations are anti-stereotypical in the presence of an ethical intervention (e.g., models tend to create images of individuals who are perceived as men when fed prompts about makeup). Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases such as ‘irrespective of gender’ in the …
MAGIC: Multitask Automated Generation of Inter-modal CT Perfusion Maps via Generative Adversarial Network
Introduction: Computed tomography perfusion (CTP) imaging requires injection of an intravenous contrast agent and increased exposure to ionizing radiation. This process can be lengthy, costly, and potentially dangerous to patients, especially in emergency settings. We propose MAGIC, a multitask, generative adversarial network-based deep learning model to synthesize an entire CTP series from only a non-contrasted CT (NCCT) input. Materials and Methods: NCCT and CTP series were retrospectively retrieved from 493 patients at UF Health with IRB approval. The data were deidentified and all images were resized to 256x256 pixels. The collected perfusion data were analyzed using the RapidAI CT Perfusion analysis software (iSchemaView, Inc. CA) to generate each CTP map. For each subject, 10 CTP slices were selected. Each slice was paired with one NCCT slice at the same location and two NCCT slices at a predefined vertical offset, resulting in 4.3K CTP images and 12.9K NCCT images used for training. The incorporation of a spatial offset into the NCCT input allows MAGIC to more accurately synthesize cerebral perfusive structures, increasing the quality of the generated images. The studies included a variety of indications, including healthy tissue, mild infarction, and severe infarction. The proposed MAGIC model incorporates a novel multitask …