skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Robustness Disparities in Face Detection
Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later stage elements of facial analysis systems like facial recognition and age, emotion, or perceived gender prediction; however, a core component to these systems has been vastly understudied from a fairness perspective: face detection, sometimes called face localization. Since face detection is a pre-requisite step in facial analysis systems, the bias we observe in face detection will flow downstream to the other components like facial recognition and emotion prediction. Additionally, no prior work has focused on the robustness of these systems under various perturbations and corruptions, which leaves open the question of how various people are impacted by these phenomena. We present the first of its kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models. We use both standard and recently released academic facial datasets to quantitatively analyze trends in face detection robustness. Across all the datasets and systems, we generally find that photos of individuals who are masculine presenting, older, of darker skin type, or have dim lighting are more susceptible to errors than their counterparts in other identities.  more » « less
Award ID(s):
1846237 2124270 2150382
PAR ID:
10409133
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Advances in Neural Information Processing Systems 35 (NeurIPS 2022) Datasets and Benchmarks Track
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this study, we investigate how different types of masks affect automatic emotion classification in different channels of audio, visual, and multimodal. We train emotion classification models for each modality with the original data without mask and the re-generated data with mask respectively, and investigate how muffled speech and occluded facial expressions change the prediction of emotions. Moreover, we conduct the contribution analysis to study how muffled speech and occluded face interplay with each other and further investigate the individual contribution of audio, visual, and audio-visual modalities to the prediction of emotion with and without mask. Finally, we investigate the cross-corpus emotion recognition across clear speech and re-generated speech with different types of masks, and discuss the robustness of speech emotion recognition. 
    more » « less
  2. In recent news, organizations have been considering the use of facial and emotion recognition for applications involving youth such as tackling surveillance and security in schools. However, the majority of efforts on facial emotion recognition research have focused on adults. Children, particularly in their early years, have been shown to express emotions quite differently than adults. Thus, before such algorithms are deployed in environments that impact the wellbeing and circumstance of youth, a careful examination should be made on their accuracy with respect to appropriateness for this target demographic. In this work, we utilize several datasets that contain facial expressions of children linked to their emotional state to evaluate eight different commercial emotion classification systems. We compare the ground truth labels provided by the respective datasets to the labels given with the highest confidence by the classification systems and assess the results in terms of matching score (TPR), positive predictive value, and failure to compute rate. Overall results show that the emotion recognition systems displayed subpar performance on the datasets of children's expressions compared to prior work with adult datasets and initial human ratings. We then identify limitations associated with automated recognition of emotions in children and provide suggestions on directions with enhancing recognition accuracy through data diversification, dataset accountability, and algorithmic regulation. 
    more » « less
  3. Facial activity is the most direct signal for perceiving emotional states in people. Emotion analysis from facial displays has been attracted an increasing attention because of its wide applications from human-centered computing to neuropsychiatry. Recently, image representation based on sparse coding has shown promising results in facial expression recognition. In this paper, we introduce a novel image representation for facial expression analysis. Specifically, we propose to use the histograms of nonnegative sparse coded image features to represent a facial image. In order to capture fine appearance variations caused by facial expression, logarithmic transformation is further employed on each nonnegative sparse coded feature. In addition, the proposed Histograms of Log-Transformed Nonnegative Sparse Coding (HLNNSC) features are calculated and organized in a pyramid-like structure such that the spatial relationships among the features are captured and utilized to enhance the performance of facial expression recognition. Extensive experiments on the Cohn-Kanade database show that the proposed approach yields a significant improvement in facial expression recognition and outperforms the other sparse coding based baseline approaches. Furthermore, experimental results on the GEMEP-FERA2011 dataset demonstrate that the proposed approach is promising for recognition under less controlled and thus more challenging environment. 
    more » « less
  4. Identifying people in photographs is a critical task in a wide variety of domains, from national security [7] to journalism [14] to human rights investigations [1]. The task is also fundamentally complex and challenging. With the world population at 7.6 billion and growing, the candidate pool is large. Studies of human face recognition ability show that the average person incorrectly identifies two people as similar 20–30% of the time, and trained police detectives do not perform significantly better [11]. Computer vision-based face recognition tools have gained considerable ground and are now widely available commercially, but comparisons to human performance show mixed results at best [2,10,16]. Automated face recognition techniques, while powerful, also have constraints that may be impractical for many real-world contexts. For example, face recognition systems tend to suffer when the target image or reference images have poor quality or resolution, as blemishes or discolorations may be incorrectly recognized as false positives for facial landmarks. Additionally, most face recognition systems ignore some salient facial features, like scars or other skin characteristics, as well as distinctive non-facial features, like ear shape or hair or facial hair styles. This project investigates how we can overcome these limitations to support person identification tasks. By adjusting confidence thresholds, users of face recognition can generally expect high recall (few false negatives) at the cost of low precision (many false positives). Therefore, we focus our work on the “last mile” of person identification, i.e., helping a user find the correct match among a large set of similarlooking candidates suggested by face recognition. Our approach leverages the powerful capabilities of the human vision system and collaborative sensemaking via crowdsourcing to augment the complementary strengths of automatic face recognition. The result is a novel technology pipeline combining collective intelligence and computer vision. We scope this project to focus on identifying soldiers in photos from the American Civil War era (1861– 1865). An estimated 4,000,000 soldiers fought in the war, and most were photographed at least once, due to decreasing costs, the increasing robustness of the format, and the critical events separating friends and family [17]. Over 150 years later, the identities of most of these portraits have been lost, but as museums and archives increasingly digitize and publish their collections online, the pool of reference photos and information has never been more accessible. Historians, genealogists, and collectors work tirelessly to connect names with faces, using largely manual identification methods [3,9]. Identifying people in historical photos is important for preserving material culture [9], correcting the historical record [13], and recognizing contributions of marginalized groups [4], among other reasons. 
    more » « less
  5. Facial Recognition Systems (FRS) have become one of the most viable biometric identity authentication approaches in supervised and unsupervised applications. However, FRSs are known to be vulnerable to adversarial attacks such as identity theft and presentation attacks. The master face dictionary attacks (MFDA) leveraging multiple enrolled face templates have posed a notable threat to FRS. Federated learning-based FRS deployed on edge or mobile devices are particularly vulnerable to MFDA due to the absence of robust MF detectors. To mitigate the MFDA risks, we propose a trustworthy authentication system against visual MFDA (Trauma). Trauma leverages the analysis of specular highlights on diverse facial components and physiological characteristics inherent to human faces, exploiting the inability of existing MFDAs to replicate reflective elements accurately. We have developed a feature extractor network that employs a lightweight and low-latency vision transformer architecture to discern inconsistencies among specular highlights and physiological features in facial imagery. Extensive experimentation has been conducted to assess Trauma’s efficacy, utilizing public GAN-face detection datasets and mobile devices. Empirical findings demonstrate that Trauma achieves high detection accuracy, ranging from 97.83% to 99.56%, coupled with rapid detection speeds (less than 11 ms on mobile devices), even when confronted with state-of-the-art MFDA techniques. 
    more » « less