Title: FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation
Existing public face image datasets are strongly biased toward Caucasian faces, and other races (e.g., Latino) are significantly underrepresented. Models trained on such datasets suffer from inconsistent classification accuracy, which limits the applicability of face analytic systems to non-White race groups. To mitigate the race bias in these datasets, we constructed a novel face image dataset containing 108,501 images that is balanced on race. We define seven race groups: White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latino. Images were collected from the YFCC-100M Flickr dataset and labeled with race, gender, and age group. Evaluations were performed on existing face attribute datasets as well as on novel image datasets to measure generalization performance. We find that a model trained on our dataset is substantially more accurate on novel datasets, and its accuracy is consistent across race and gender groups. We also compare several commercial computer vision APIs and report their balanced accuracy across gender, race, and age groups. Our code, data, and models are available at https://github.com/joojs/fairface.
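A minimal sketch of the per-group evaluation the abstract reports, in Python with pandas. The file names and the "file", "race", and "gender" columns mirror the label CSVs in the linked FairFace repository but are assumptions here; the macro-average over race groups is one straightforward reading of the "balanced accuracy" the paper reports.

```python
import pandas as pd

def per_group_accuracy(labels_csv: str, preds_csv: str,
                       group_col: str = "race", target_col: str = "gender") -> pd.Series:
    """Accuracy of the predicted target attribute within each group."""
    labels = pd.read_csv(labels_csv)   # ground-truth attributes, keyed by "file"
    preds = pd.read_csv(preds_csv)     # model predictions with a "gender" column (assumed)
    merged = labels.merge(preds, on="file", suffixes=("", "_pred"))
    correct = merged[target_col] == merged[f"{target_col}_pred"]
    return correct.groupby(merged[group_col]).mean()

if __name__ == "__main__":
    acc = per_group_accuracy("fairface_label_val.csv", "model_preds.csv")
    print(acc)                                  # accuracy per race group
    print("balanced accuracy:", acc.mean())     # unweighted mean across groups
```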
Award ID(s):
1831848
PAR ID:
10299073
Author(s) / Creator(s):
Date Published:
Journal Name:
2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
Page Range / eLocation ID:
1547 to 1557
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the field of healthcare, electronic health records (EHRs) serve as crucial training data for developing machine learning models for diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced with respect to sensitive attributes such as race/ethnicity, gender, and age. Machine learning models trained on class-imbalanced EHR datasets perform significantly worse at deployment for individuals of minority classes than for those of majority classes, which may lead to inequitable healthcare outcomes for minority groups. To address this challenge, we propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), a novel approach that augments imbalanced datasets with samples generated by a deep generative model. MCRAGE involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples for underrepresented classes. We use this synthetic data to augment the existing imbalanced dataset, yielding a more balanced distribution across all classes that can be used to train less biased downstream models. We measure the performance of MCRAGE against alternative approaches using the accuracy, F1 score, and AUROC of these downstream models, and we provide theoretical justification for our method in terms of recent convergence results for DDPMs.
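A minimal sketch of the rebalancing step MCRAGE describes: every minority class is topped up with synthetic samples from a conditional generative model until all classes match the majority count. The `generator.sample(cls, n)` interface is a hypothetical stand-in for the paper's trained CDDPM, not its actual API.

```python
import numpy as np

def rebalance(X: np.ndarray, y: np.ndarray, generator) -> tuple[np.ndarray, np.ndarray]:
    """Augment (X, y) with synthetic minority-class samples until classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                            # match the majority class size
    X_parts, y_parts = [X], [y]
    for cls, n in zip(classes, counts):
        deficit = target - n
        if deficit > 0:
            X_syn = generator.sample(cls, deficit)   # conditional generation (assumed interface)
            X_parts.append(X_syn)
            y_parts.append(np.full(deficit, cls))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

The downstream classifier is then trained on the augmented (X, y) and scored with accuracy, F1, and AUROC, as in the abstract.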
  2. The prevalent commercial deployment of automated facial analysis systems, such as face recognition used as a robust authentication method, has increasingly fueled scientific attention. Current machine learning algorithms allow relatively reliable detection, recognition, and categorization of face images by age, race, and gender. However, algorithms trained on biased data are bound to produce skewed results, leading to a significant decrease in the performance of state-of-the-art models when applied to images of particular gender or ethnicity groups. In this paper, we study gender bias in facial recognition with gender-balanced and imbalanced training sets using five traditional machine learning algorithms. We aim to report which machine learning classifiers are inclined toward gender bias and which mitigate it. The miss rate metric is effective at uncovering potential bias in predictions, so our study uses it alongside standard metrics such as accuracy, precision, and recall to evaluate possible gender bias effectively.
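A minimal sketch of the miss-rate comparison the abstract proposes: the false-negative rate of a binary face classifier, computed separately per gender group. The array names and the 0/1 label encoding are assumptions for illustration.

```python
import numpy as np

def miss_rate_by_group(y_true: np.ndarray, y_pred: np.ndarray,
                       group: np.ndarray) -> dict:
    """False-negative (miss) rate within each group."""
    rates = {}
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)   # positives belonging to group g
        rates[g] = float(np.mean(y_pred[mask] == 0)) if mask.any() else float("nan")
    return rates
```

A large gap between, say, the rates for women and men flags a bias that aggregate accuracy, precision, or recall alone can hide, which is why the study pairs the metrics.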
  3. Introduction: Individuals' math value beliefs are theorized to influence who persists in STEM. However, existing findings on gender differences in adolescents' math value beliefs are inconsistent. The goal of this study was to use three existing datasets to help clarify when gender differences emerge for high school adolescents and for whom (i.e., adolescents across historical time, grade level, and race/ethnicity). Specifically, we examined the extent to which gender differences in adolescents' math value beliefs (i.e., interest, utility, and attainment) replicated (1) across three datasets spanning the 1990s to the 2010s, (2) from 9th to 12th grade, and (3) within each of the four largest U.S. racial/ethnic groups (i.e., Asian, Black, Latine, and White adolescents). Methods: We tested these aims with three existing longitudinal U.S. datasets: the California Achievement Motivation Project (CAMP; n = 8855), the Childhood and Beyond Study (CAB; n = 582), and the High School Longitudinal Study (HSLS; n = 21,000). Students were in high school (9th–12th grade), and about half were girls (49%–53%). All three datasets included the same or similar math value belief items, making conceptual replication possible. Results and Conclusions: Overall, we did not find strong evidence for meaningful gender differences in adolescents' math value beliefs, although we did find meaningful differences in the oldest dataset (CAB). When examined within each racial/ethnic group, we found no evidence of gender differences among Black or Latine adolescents, but some differences among Asian and White adolescents. The findings align with the gender similarities hypothesis, suggesting adolescent girls and boys hold similar math value beliefs.
  4. Purpose: Prior studies show convolutional neural networks predicting self-reported race from x-rays of the chest, hand, and spine, from chest computed tomography, and from mammograms. We seek an understanding of the mechanism that reveals race within x-ray images, investigating the possibility that race is predicted not from the physical structure in x-ray images but from the grayscale pixel intensities themselves. Approach: A retrospective full-year 2021 set of 298,827 AP/PA chest x-ray images from three academic health centers across the United States and from MIMIC-CXR, labeled by self-reported race, was used in this study. Image structure is removed by counting the occurrences of each grayscale value and scaling to percent per image (PPI). The resulting data are tested using multivariate analysis of variance (MANOVA) with Bonferroni multiple-comparison adjustment and with class-balanced MANOVA. Machine learning (ML) feed-forward networks (FFNs) and decision trees were built to predict race (binary Black or White, and binary Black or other) using only the grayscale value counts. Analysis stratified by body mass index, age, sex, gender, patient type, scanner make/model, exposure, and kilovoltage peak setting was run, following the same methodology, to study the impact of these factors on race prediction. Results: MANOVA rejects the null hypothesis that the classes are the same with 95% confidence (F = 7.38, P < 0.0001), as does the balanced MANOVA (F = 2.02, P < 0.0001). The best FFN performance is limited [area under the receiver operating characteristic curve (AUROC) of 69.18%]. Gradient-boosted trees predict self-reported race from grayscale PPI (AUROC 77.24%). Conclusions: Within chest x-rays, pixel intensity value counts alone are statistically significant indicators of, and sufficient for ML classification of, patient self-reported race.
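A minimal sketch of the structure-free grayscale feature the study describes: count each grayscale value in an image and scale to percent per image (PPI), discarding all spatial information. It assumes an 8-bit grayscale image held as a NumPy array.

```python
import numpy as np

def grayscale_ppi(img: np.ndarray) -> np.ndarray:
    """256-bin histogram of pixel values, scaled to percent of the image (PPI)."""
    counts = np.bincount(img.astype(np.uint8).ravel(), minlength=256)
    return 100.0 * counts / img.size
```

The resulting 256-dimensional vectors are the only input to the MANOVA tests and to the feed-forward and gradient-boosted-tree classifiers in the study.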
  5. Abstract: From early in development, race biases how children think about gender, often in a manner that treats Black women as less typical and representative of women in general than White or Asian women. The present study (N = 89, ages 7–11; predominantly Hispanic, White, and multi-racial children) examined the generalizability of this phenomenon across middle childhood and the mechanisms underlying variability in its development. Replicating prior work, children were slower and less accurate in categorizing the gender of Black women compared to Asian or White women, as well as compared to Black men, suggesting that children perceived Black women as less representative of their gender. These effects were robust across age within a racially and ethnically diverse sample of children. Children's tendencies to view their own racial identities as expansive and flexible, however, attenuated these effects: children with more flexible racial identities also had gender concepts that were more inclusive of Black women. In contrast, the tendency for race to bias children's gender representations was unrelated to children's multiple classification skill and racial essentialism. These findings shed light on the mechanisms underlying variation in how race biases gender across development, with critical implications for how children's own identities shape the development of intergroup cognition and behavior.