Title: Examining StyleGAN as a Utility-Preserving Face De-identification Method

Several face de-identification methods have been proposed to preserve users' privacy by obscuring their faces. These methods, however, can degrade the quality of photos and usually do not preserve the utility of faces, i.e., their age, gender, pose, and facial expression. Recently, advanced generative adversarial network models, such as StyleGAN [33], have been proposed, which generate realistic, high-quality imaginary faces. In this paper, we investigate the use of StyleGAN in generating de-identified faces through style mixing, where the styles or features of the target face and an auxiliary face are mixed to generate a de-identified face that carries the utilities of the target face. We examined this de-identification method for preserving utility and privacy by implementing several face detection, verification, and identification attacks and conducting a user study. The results from our extensive experiments, human evaluation, and comparison with two state-of-the-art face de-identification methods, i.e., CIAGAN and DeepPrivacy, show that StyleGAN performs on par with or better than these methods, preserving users' privacy and images' utility. In particular, the results of the machine learning-based experiments show that StyleGAN 0-4 preserves utility better than CIAGAN and DeepPrivacy while preserving privacy at the same level, and StyleGAN 0-3 preserves utility at the same level while providing more privacy. In this paper, for the first time, we also performed a carefully designed user study to examine both the privacy- and utility-preserving properties of StyleGAN 0-3, 0-4, and 0-5, as well as CIAGAN and DeepPrivacy, from human observers' perspectives. Our statistical tests showed that participants tend to verify and identify StyleGAN 0-5 images more easily than DeepPrivacy images. All the methods but StyleGAN 0-5 had significantly lower identification rates than CIAGAN. Regarding utility, as expected, StyleGAN 0-5 performed significantly better in preserving some attributes. Among all methods, on average, participants believed gender was preserved the most and naturalness the least.
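A minimal sketch of the layer-wise style mixing described above, assuming a StyleGAN generator driven by W+ latent codes. The layer convention (which layers come from the auxiliary face) and the dimensions are illustrative assumptions, not the paper's code:

```python
import numpy as np

# Sketch of style mixing in StyleGAN's W+ space. Assumption (not from the
# paper's code): "StyleGAN 0-k" mixes layers 0..k from an auxiliary face
# into the target's latent code, leaving the remaining layers untouched.

NUM_LAYERS = 18   # W+ depth for a 1024x1024 StyleGAN generator
STYLE_DIM = 512

def mix_styles(w_target: np.ndarray, w_aux: np.ndarray, k: int) -> np.ndarray:
    """Return a mixed W+ code: layers 0..k from the auxiliary face,
    the remaining layers from the target face."""
    w_mixed = w_target.copy()
    w_mixed[: k + 1] = w_aux[: k + 1]
    return w_mixed

# Usage: in practice these codes would come from GAN inversion of real
# photos or from the mapping network; random codes stand in here.
w_target = np.random.randn(NUM_LAYERS, STYLE_DIM)
w_aux = np.random.randn(NUM_LAYERS, STYLE_DIM)
w_deid = mix_styles(w_target, w_aux, k=4)   # the "StyleGAN 0-4" variant
# w_deid would then be fed to the synthesis network to render the face.
```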

Award ID(s):
2107296
NSF-PAR ID:
10474127
Author(s) / Creator(s):
Publisher / Repository:
PoPETS
Date Published:
Journal Name:
Proceedings on Privacy Enhancing Technologies
Volume:
2023
Issue:
4
ISSN:
2299-0984
Page Range / eLocation ID:
341 to 358
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Identifying people in photographs is a critical task in a wide variety of domains, from national security [7] to journalism [14] to human rights investigations [1]. The task is also fundamentally complex and challenging. With the world population at 7.6 billion and growing, the candidate pool is large. Studies of human face recognition ability show that the average person incorrectly identifies two people as similar 20–30% of the time, and trained police detectives do not perform significantly better [11]. Computer vision-based face recognition tools have gained considerable ground and are now widely available commercially, but comparisons to human performance show mixed results at best [2,10,16]. Automated face recognition techniques, while powerful, also have constraints that may be impractical for many real-world contexts. For example, face recognition systems tend to suffer when the target image or reference images have poor quality or resolution, as blemishes or discolorations may be incorrectly recognized as false positives for facial landmarks. Additionally, most face recognition systems ignore some salient facial features, like scars or other skin characteristics, as well as distinctive non-facial features, like ear shape or hair or facial hair styles. This project investigates how we can overcome these limitations to support person identification tasks. By adjusting confidence thresholds, users of face recognition can generally expect high recall (few false negatives) at the cost of low precision (many false positives). Therefore, we focus our work on the “last mile” of person identification, i.e., helping a user find the correct match among a large set of similar-looking candidates suggested by face recognition. Our approach leverages the powerful capabilities of the human vision system and collaborative sensemaking via crowdsourcing to augment the complementary strengths of automatic face recognition. The result is a novel technology pipeline combining collective intelligence and computer vision. We scope this project to focus on identifying soldiers in photos from the American Civil War era (1861–1865). An estimated 4,000,000 soldiers fought in the war, and most were photographed at least once, due to decreasing costs, the increasing robustness of the format, and the critical events separating friends and family [17]. Over 150 years later, the identities of most of these portraits have been lost, but as museums and archives increasingly digitize and publish their collections online, the pool of reference photos and information has never been more accessible. Historians, genealogists, and collectors work tirelessly to connect names with faces, using largely manual identification methods [3,9]. Identifying people in historical photos is important for preserving material culture [9], correcting the historical record [13], and recognizing contributions of marginalized groups [4], among other reasons.
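To illustrate the recall/precision trade-off behind the "last mile" framing, here is a hedged sketch (names and similarity scores are hypothetical, not the project's data or code): lowering the match threshold admits more candidates, trading precision for recall and producing the large candidate pools the human-in-the-loop pipeline must then winnow.

```python
import numpy as np

def candidates(similarities: np.ndarray, names: list, threshold: float) -> list:
    """Return all reference identities whose similarity to the query
    face meets or exceeds the threshold."""
    return [n for n, s in zip(names, similarities) if s >= threshold]

# Hypothetical reference gallery and similarity scores for one query photo.
names = ["portrait_0012", "portrait_0485", "portrait_1077", "portrait_2240"]
sims = np.array([0.91, 0.62, 0.58, 0.87])
print(candidates(sims, names, threshold=0.85))  # strict: few, precise matches
print(candidates(sims, names, threshold=0.50))  # lenient: many, noisy matches
```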
  2. Abstract

    Portrait synthesis creates realistic digital avatars that enable users to interact with others in a compelling way. Recent advances in StyleGAN and its extensions have shown promising results in synthesizing photorealistic and accurate reconstructions of human faces. However, previous methods often focus on frontal face synthesis, and most cannot handle large head rotations due to the training data distribution of StyleGAN. In this work, our goal is to take as input a monocular video of a face and create an editable dynamic portrait able to handle extreme head poses. The user can create novel viewpoints, edit the appearance, and animate the face. Our method utilizes pivotal tuning inversion (PTI) to learn a personalized video prior from a monocular video sequence. We can then input pose and expression coefficients to MLPs and manipulate the latent vectors to synthesize different viewpoints and expressions of the subject. We also propose novel loss functions to further disentangle pose and expression in the latent space. Our algorithm substantially outperforms previous approaches on monocular video datasets and runs in real time at 54 FPS on an RTX 3080.
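A hedged sketch of the conditioning step described above (layer sizes, coefficient dimensions, and names are our assumptions, not the paper's code): small MLPs map pose and expression coefficients to offsets in the StyleGAN latent space, which are added to the personalized pivot latent obtained via PTI.

```python
import torch
import torch.nn as nn

LATENT_DIM = 512
POSE_DIM, EXPR_DIM = 6, 64   # hypothetical coefficient sizes

# Assumed architecture: one small MLP per factor, each producing a
# latent-space offset that is added to the PTI pivot code.
pose_mlp = nn.Sequential(nn.Linear(POSE_DIM, 256), nn.ReLU(),
                         nn.Linear(256, LATENT_DIM))
expr_mlp = nn.Sequential(nn.Linear(EXPR_DIM, 256), nn.ReLU(),
                         nn.Linear(256, LATENT_DIM))

def edit_latent(w_pivot: torch.Tensor, pose: torch.Tensor,
                expr: torch.Tensor) -> torch.Tensor:
    """Offset the PTI pivot latent by learned pose and expression edits."""
    return w_pivot + pose_mlp(pose) + expr_mlp(expr)

w_pivot = torch.randn(1, LATENT_DIM)   # stand-in for the PTI pivot code
w_edit = edit_latent(w_pivot, torch.randn(1, POSE_DIM),
                     torch.randn(1, EXPR_DIM))
# w_edit would be decoded by the fine-tuned generator to render the new view.
```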

  3. Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents a shared platform to collaborate in a spatial context. Traditional face-to-face communication is limited by proximity: another person's non-verbal embodied cues become harder to perceive the farther away they are. Researchers and practitioners have therefore started to look into ways to accentuate or amplify such embodied cues and signals with XR technologies to counteract the effects of distance. In this article, we describe and evaluate the Big Head technique, in which a human's head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complementary human-subject experiments in this article. In our first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants' ability to perceive facial expressions as well as their sense of comfort and feeling of "uncanniness" over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. Our second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and identify a virtual human over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues.
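An illustrative sketch of the distance-dependent scaling behind the Big Head technique (the scaling law and the constants are our assumptions; the paper reports measured per-distance thresholds rather than a fixed formula): the avatar's head is enlarged with observer distance, clamped to comfort bounds.

```python
def head_scale(distance_m: float, gain: float = 0.15,
               min_scale: float = 1.0, max_scale: float = 4.0) -> float:
    """Grow head scale linearly with distance, clamped to assumed
    comfort bounds (all constants hypothetical)."""
    return max(min_scale, min(max_scale, 1.0 + gain * distance_m))

# Usage: scales at the distances the two experiments covered.
for d in (1, 10, 30, 60, 90):
    print(f"{d:3d} m -> scale {head_scale(d):.2f}x")
```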
  4. Abstract

    There is an urgent need for collaborative process-defect modeling in metal-based additive manufacturing (AM). This mainly stems from the high volume of training data needed to develop reliable machine learning models for in-situ anomaly detection. The requirements for large data are especially challenging for small-to-medium manufacturers (SMMs), for whom collecting copious amounts of data is usually cost-prohibitive. The objective of this research is to develop a secured data sharing mechanism for directed energy deposition (DED) based AM without disclosing product design information, facilitating secured data aggregation for collaborative modeling. However, one major obstacle is the privacy concern that arises from data sharing, since AM process data contain confidential design information, such as the printing path. The proposed adaptive design de-identification for additive manufacturing (ADDAM) methodology integrates AM process knowledge into an adaptive de-identification procedure to mask the printing trajectory information in metal-based AM thermal history, which otherwise discloses substantial printing path information. This adaptive approach applies a flexible data privacy level to each thermal image based on its similarity with the other images, facilitating better data utility preservation while protecting data privacy. A real-world case study based on the fabrication of two cylindrical parts using a DED process was used to validate the proposed method. The results are expressed as a Pareto optimal solution, demonstrating significant improvements in privacy gain with minimal utility loss: the proposed method can facilitate privacy improvements of up to 30% with as little as 0% loss in dataset utility after de-identification.
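A hedged sketch of the adaptive idea in ADDAM (the similarity measure, normalization, and names are illustrative assumptions, not the paper's algorithm): each thermal frame is assigned a privacy level scaled by how dissimilar it is to the rest of the sequence, so the most distinctive, path-revealing frames receive the strongest masking.

```python
import numpy as np

def privacy_levels(frames: np.ndarray) -> np.ndarray:
    """Assign a [0, 1] masking strength per frame: frames least similar
    to the sequence mean (most distinctive) get the strongest masking.
    (Similarity-to-mean is a stand-in for the paper's actual criterion.)"""
    mean_frame = frames.mean(axis=0)
    flat = frames.reshape(len(frames), -1)
    ref = mean_frame.ravel()
    # cosine similarity of each flattened frame to the mean frame
    sims = flat @ ref / (np.linalg.norm(flat, axis=1) * np.linalg.norm(ref))
    return 1.0 - (sims - sims.min()) / (np.ptp(sims) + 1e-8)

frames = np.random.rand(10, 64, 64)      # stand-in thermal history
print(privacy_levels(frames).round(2))   # per-frame masking strengths
```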
  5. The increased ubiquitousness of small smart devices, such as cellphones, tablets, smart watches, and laptops, has led to unique user data that can be processed locally. The sensors (e.g., microphones and webcams) and improved hardware of these devices allow running deep learning models that 20 years ago would have been exclusive to high-end, expensive machines. In spite of this progress, state-of-the-art algorithms for facial expression recognition (FER) rely on architectures that cannot be implemented on these devices due to computational and memory constraints. Alternatives involving cloud-based solutions impose privacy barriers that prevent their adoption or user acceptance in a wide range of applications. This paper proposes a lightweight model that can run in real time for image facial expression recognition (IFER) and video facial expression recognition (VFER). The approach relies on a personalization mechanism implemented locally for each subject by fine-tuning a central VFER model with unlabeled videos from a target subject. We train the IFER model to generate pseudo labels and select the videos with the most confident predictions for adaptation. The adaptation follows a federated learning strategy in which the weights of the local models are averaged and used by the central VFER model. We demonstrate that this approach improves not only performance on the edge device, providing personalized models to users, but also the central VFER model. Within-corpus and cross-corpus evaluations on two emotional databases demonstrate that edge models adapted with our personalization strategy achieve gains of up to 13.1% in F1-score. Furthermore, the federated learning implementation improves the mean micro F1-score of the central VFER model by up to 3.4%. The proposed lightweight solution is ideal for interactive user interfaces that preserve users' data.
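A minimal sketch of the federated averaging step described above (the parameter layout and names are assumed, not the paper's code): each device fine-tunes on its own pseudo-labeled videos, and the server averages the local weights to update the central model, so no raw user video leaves the device.

```python
import numpy as np

def federated_average(local_weights: list) -> dict:
    """Average corresponding parameter arrays across devices
    (the plain FedAvg aggregation step, unweighted for simplicity)."""
    keys = local_weights[0].keys()
    return {k: np.mean([w[k] for w in local_weights], axis=0) for k in keys}

# Three hypothetical devices, each holding one weight matrix and bias
# after local fine-tuning on its own pseudo-labeled videos.
devices = [{"fc.w": np.random.randn(8, 4), "fc.b": np.random.randn(4)}
           for _ in range(3)]
central = federated_average(devices)
print(central["fc.w"].shape, central["fc.b"].shape)
```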