Title: Preliminary Studies on a Large Face Database
We perform preliminary studies on MORPH-II, a large longitudinal face database that serves as a benchmark dataset in computer vision and pattern recognition. First, we summarize the inconsistencies in the dataset and describe the steps and strategy taken to clean it. The potential implications of these inconsistencies for prior research are discussed. Next, we propose a new automatic subsetting scheme for the evaluation protocol, intended to overcome the unbalanced racial and gender distributions of MORPH-II while ensuring independence between the training and testing sets. Finally, we contribute a novel global framework for age estimation that uses the posterior probabilities from the race-classification step to compute a race-composite age estimate. Preliminary experimental results on MORPH-II are presented.
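The race-composite age estimate described in the abstract can be pictured as a posterior-weighted combination of per-race predictions. The sketch below illustrates that idea only, not the paper's actual implementation; the function name, the number of race groups, and the example values are all assumptions.

```python
def composite_age_estimate(race_posteriors, race_specific_ages):
    """Combine per-race age estimates, weighting each by the
    posterior probability produced by the race-classification step."""
    assert len(race_posteriors) == len(race_specific_ages)
    total = sum(race_posteriors)
    # Normalize in case the posteriors do not sum exactly to 1.
    return sum(p / total * age
               for p, age in zip(race_posteriors, race_specific_ages))

# Hypothetical example: the race classifier outputs posteriors 0.7 and
# 0.3, and the per-race age regressors predict 34.0 and 38.0 years.
estimate = composite_age_estimate([0.7, 0.3], [34.0, 38.0])
# estimate is approximately 35.2 (0.7 * 34 + 0.3 * 38)
```

A hard race decision would instead pick the single most probable group's estimate; the soft weighting above degrades more gracefully when the race classifier is uncertain.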
Award ID(s):
1659288
PAR ID:
10227444
Author(s) / Creator(s):
Date Published:
Journal Name:
2018 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
2572 to 2579
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Existing public face image datasets are strongly biased toward Caucasian faces, and other races (e.g., Latino) are significantly underrepresented. The models trained from such datasets suffer from inconsistent classification accuracy, which limits the applicability of face analytic systems to non-White race groups. To mitigate the race bias in these datasets, we constructed a novel face image dataset containing 108,501 images that is balanced on race. We define 7 race groups: White, Black, Indian, East Asian, Southeast Asian, Middle Eastern, and Latino. Images were collected from the YFCC-100M Flickr dataset and labeled with race, gender, and age groups. Evaluations were performed on existing face attribute datasets as well as novel image datasets to measure generalization performance. We find that the model trained on our dataset is substantially more accurate on novel datasets, and its accuracy is consistent across race and gender groups. We also compare several commercial computer vision APIs and report their balanced accuracy across gender, race, and age groups. Our code, data, and models are available at https://github.com/joojs/fairface.
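Checking that accuracy is consistent across race and gender groups amounts to disaggregating accuracy by group label. A minimal sketch of such a disaggregated metric follows; the group names, labels, and predictions are illustrative, not FairFace data.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Classification accuracy computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}

# Toy labels/predictions for three groups of two samples each:
acc = accuracy_by_group(
    y_true=[1, 0, 1, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 0, 0],
    groups=["White", "White", "Black", "Black", "Latino", "Latino"],
)
# acc maps each group name to its accuracy on that group's samples.
```

A large spread between the per-group values in `acc` is exactly the inconsistency the abstract reports for models trained on biased datasets.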
  2. The proliferation of Artificial Intelligence (AI) has revolutionized the healthcare domain with technological advancements over conventional diagnosis and treatment methods. These advancements enable faster disease detection and management and provide personalized healthcare solutions. However, most clinical AI methods developed and deployed in hospitals carry algorithmic and data-driven biases due to insufficient representation of specific race, gender, and age groups, which leads to misdiagnosis, disparities, and unfair outcomes. It is therefore crucial to thoroughly examine these biases and develop computational methods that can mitigate them effectively. This paper critically analyzes the problem by exploring different types of data and algorithmic biases during both the pre-processing and post-processing phases to uncover additional, previously unexplored biases in a widely used real-world healthcare dataset of primary care patients. Additionally, effective strategies are proposed to address gender, race, and age biases, ensuring that risk-prediction outcomes are equitable and impartial. Through experiments with various machine learning algorithms leveraging the Fairlearn tool, we have identified biases in the dataset, compared the impact of these biases on prediction performance, and proposed effective strategies to mitigate them. Our results demonstrate clear evidence of racial, gender-based, and age-related biases in the healthcare dataset used to guide resource allocation for patients, and these biases have a profound impact on prediction performance, leading to unfair outcomes. It is thus crucial to implement mechanisms that detect and address unintended biases to ensure a safe, reliable, and trustworthy AI system in healthcare.
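The Fairlearn tool mentioned above ships group-fairness metrics such as `demographic_parity_difference`. The pure-Python sketch below illustrates the idea behind that metric, the gap in positive-prediction (selection) rates across groups; the predictions and group names are made up for illustration.

```python
def demographic_parity_difference(y_pred, groups):
    """Gap between the highest and lowest positive-prediction
    (selection) rate across groups; 0 means perfect parity."""
    rates = {}
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

# A toy model flags 75% of group A but only 25% of group B:
gap = demographic_parity_difference(
    y_pred=[1, 1, 1, 0, 1, 0, 0, 0],
    groups=["A", "A", "A", "A", "B", "B", "B", "B"],
)
# gap == 0.5
```

In a real experiment one would call Fairlearn's own implementation with the sensitive feature column (e.g., race or gender) rather than hand-rolling the metric; this sketch only makes the arithmetic explicit.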
  3. A morph is an image of an ambiguous subject generated by combining multiple individuals. The morphed image can be submitted to a facial recognition system (FRS) and erroneously verified against the contributing bad actors. When submitted as a passport image, a morphed face poses a national security threat because the passport can then be shared between the individuals. As morphed images become easier to generate, it is vital that the research community expand available datasets in order to continuously improve current technology. Children are a challenging paradigm for facial recognition systems, and morphing children's faces takes advantage of this disparity. In this paper, we morph juvenile faces in order to create a unique, high-quality dataset that challenges FRS. To the best of our knowledge, this is the first study on the generation and evaluation of juvenile morphed faces. The generated morphed juvenile dataset is evaluated in terms of vulnerability analysis and presentation attack error rates.
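The simplest way to combine two aligned face images into a morph is a per-pixel alpha blend; production morphing pipelines also detect and warp facial landmarks before blending, which this sketch omits. The function and the toy 2x2 "images" below are illustrative assumptions, not the paper's pipeline.

```python
def alpha_blend(img_a, img_b, alpha=0.5):
    """Blend two aligned images pixel-by-pixel:
    morph = alpha * A + (1 - alpha) * B.
    Images are nested lists of grayscale intensities (0-255)."""
    assert len(img_a) == len(img_b)
    return [
        [round(alpha * a + (1 - alpha) * b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]

# Two tiny 2x2 "face images", blended equally:
morph = alpha_blend([[0, 100], [200, 50]], [[100, 100], [0, 150]])
# morph == [[50, 100], [100, 100]]
```

With `alpha = 0.5` the morph sits "between" both contributors, which is precisely why an FRS may verify it against either identity.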
  4. Software system security receives considerable attention from industry for its crucial role in protecting private resources. Typically, users access a system's services via an application programming interface (API). This API must be protected to prevent unauthorized access. One way developers address this challenge is role-based access control, where each entry point is associated with a set of user roles. However, entry points may use the same methods from lower layers of the application with inconsistent permissions. Currently, developers rely on integration or penetration testing, which demands substantial effort to uncover authorization inconsistencies. This paper proposes an automated method to test role-based access control in enterprise applications. Our method detects inconsistencies within the application using the authorization role definitions associated with the API entry points. By analyzing the method calls and entity accesses in subsequent layers, inconsistencies across the entire application can be extracted. We demonstrate our solution in a case study and discuss our preliminary results.
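The inconsistency check can be pictured as propagating each entry point's role set down the call graph and flagging lower-layer methods that are reachable under differing role sets. The data model below is a deliberate simplification of what the paper extracts; the method and role names are invented for illustration, and the walk assumes an acyclic call graph.

```python
def find_rbac_inconsistencies(entry_roles, call_graph):
    """entry_roles: entry point -> set of roles allowed to call it.
    call_graph: method -> list of methods it calls (assumed acyclic).
    Returns methods reachable from entry points under differing
    role sets, i.e. candidate authorization inconsistencies."""
    reached = {}  # method -> set of distinct role sets it was reached with
    def walk(method, roles):
        reached.setdefault(method, set()).add(frozenset(roles))
        for callee in call_graph.get(method, []):
            walk(callee, roles)
    for entry, roles in entry_roles.items():
        walk(entry, roles)
    return {m for m, role_sets in reached.items() if len(role_sets) > 1}

# deleteUser is ADMIN-only, updateUser allows USER too, yet both
# entry points reach the same lower-layer repository method:
bad = find_rbac_inconsistencies(
    {"deleteUser": {"ADMIN"}, "updateUser": {"ADMIN", "USER"}},
    {"deleteUser": ["userRepo.save"], "updateUser": ["userRepo.save"]},
)
# bad == {"userRepo.save"}
```

A flagged method is not automatically a vulnerability, but it marks a place where a USER-reachable path touches code that an ADMIN-only entry point also relies on, which is exactly what an auditor would want to review.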
  5. The increasingly popular Robot Operating System (ROS) framework allows building robotic systems by integrating newly developed and/or reused modules, where the modules can use different versions of the framework (e.g., ROS1 or ROS2) and programming languages (e.g., C++ or Python). The majority of such robotic systems' work happens in callbacks. The framework provides various elements for initializing callbacks and for setting up their execution. Composing callbacks and their execution setup elements is the responsibility of developers, and a developer's incomplete knowledge of the semantics of these elements across framework versions can lead to inconsistencies in callback execution setup. Some of these inconsistencies do not throw errors at runtime, making them difficult for developers to detect. We propose a static approach that detects such inconsistencies by extracting a static view of the composition of a robotic system's callbacks and their execution setup, and then checking it against composition conventions based on the elements' semantics. We evaluate our ROSCallBaX prototype on a dataset created from posts on developer forums and publicly available ROS projects. The evaluation results show that our approach can detect real inconsistencies.
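One concrete example of a silent callback-setup inconsistency in ROS2 is registering callbacks on a node that is never handed to an executor: no error is raised at runtime, but the callbacks simply never fire. The toy check below operates on extracted facts represented as plain dictionaries; the node and callback names are invented, and this is only a sketch of the kind of static convention check the paper performs, not the ROSCallBaX implementation.

```python
def find_silent_callbacks(registered, nodes_spun):
    """registered: node name -> list of callback names registered on it.
    nodes_spun: set of node names handed to an executor
    (e.g., via rclpy.spin or an executor's add_node).
    Callbacks on a node that is never spun run zero times without
    any runtime error -- the kind of inconsistency that is hard
    for developers to notice."""
    return {
        (node, cb)
        for node, cbs in registered.items() if node not in nodes_spun
        for cb in cbs
    }

# Two nodes register callbacks, but only one is ever spun:
silent = find_silent_callbacks(
    {"camera_node": ["on_image"], "lidar_node": ["on_scan"]},
    nodes_spun={"camera_node"},
)
# silent == {("lidar_node", "on_scan")}
```

A real static analysis would recover `registered` and `nodes_spun` from source code rather than take them as inputs; the point of the sketch is that once those facts are extracted, the convention check itself is a simple set comparison.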