Title: Statistical Methods for Assessing Differences in False Non-Match Rates Across Demographic Groups
Abstract: Biometric recognition is used across a variety of applications, from cyber security to border security. Recent research has focused on ensuring that biometric performance (false negatives and false positives) is fair across demographic groups. While there has been significant progress on the development of metrics, on the evaluation of performance across groups, and on the mitigation of any problems, there has been little work incorporating statistical variation. This matters because differences among groups can be found by chance when no true difference is present; in statistics this is called a Type I error. Differences among groups may be due to sampling variation or to actual differences in system performance, and discriminating between these two sources of error is essential for good decision making about fairness and equity. This paper presents two novel statistical approaches for assessing fairness across demographic groups. The first is a bootstrap-based hypothesis test; the second is a simpler test methodology aimed at a non-statistical audience. For the latter, we present the results of a simulation study of the relationship between the margin of error and factors such as the number of subjects, the number of attempts, the correlation between attempts, the underlying false non-match rates (FNMRs), and the number of groups.
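A minimal sketch of the first approach, assuming a subject-level (two-level) bootstrap in which subjects, not individual attempts, are resampled so that correlation between a subject's attempts is preserved. The test statistic (maximum absolute deviation of a group FNMR from the pooled FNMR) and all names below are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: subject-level bootstrap test for equality of FNMRs across groups.
import numpy as np

rng = np.random.default_rng(0)

def fnmr(outcomes):
    """outcomes: list of per-subject 0/1 arrays (1 = false non-match)."""
    return np.concatenate(outcomes).mean()

def bootstrap_fnmr_test(groups, n_boot=2000):
    """groups: dict mapping group name -> list of per-subject outcome arrays.
    Resampling whole subjects preserves within-subject correlation."""
    observed = {g: fnmr(subj) for g, subj in groups.items()}
    pooled = fnmr([a for subj in groups.values() for a in subj])
    stat = max(abs(v - pooled) for v in observed.values())

    all_subjects = [a for subj in groups.values() for a in subj]
    sizes = [len(subj) for subj in groups.values()]
    boot_stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample subjects under the null: all groups share one FNMR.
        idx = rng.integers(0, len(all_subjects), size=sum(sizes))
        draw = [all_subjects[i] for i in idx]
        rates, start = [], 0
        for n in sizes:
            rates.append(fnmr(draw[start:start + n]))
            start += n
        pooled_b = fnmr(draw)
        boot_stats[b] = max(abs(r - pooled_b) for r in rates)
    p_value = (boot_stats >= stat).mean()
    return observed, stat, p_value

# Example: two groups, 50 subjects each, 5 attempts per subject.
groups = {
    "A": [rng.binomial(1, 0.02, size=5) for _ in range(50)],
    "B": [rng.binomial(1, 0.03, size=5) for _ in range(50)],
}
print(bootstrap_fnmr_test(groups))
```

Resampling at the subject level matters here because, as the abstract notes, attempts by the same subject are correlated; an attempt-level bootstrap would understate the variance of the group FNMRs.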
Award ID(s):
1650503
NSF-PAR ID:
10395558
Journal Name:
2022 International Conference on Pattern Recognition (ICPR)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Increases in the deployment of machine learning algorithms for applications that deal with sensitive data have brought attention to the issue of fairness in machine learning. Many works have been devoted to applications that require different demographic groups to be treated fairly. However, algorithms that aim to satisfy inter-group fairness (also called group fairness) may inadvertently treat individuals within the same demographic group unfairly. To address this issue, this article introduces a formal definition of within-group fairness that maintains fairness among individuals from within the same group. A pre-processing framework is proposed to meet both inter- and within-group fairness criteria with little compromise in performance. The framework maps the feature vectors of members from different groups to an inter-group fair canonical domain before feeding them into a scoring function. The mapping is constructed to preserve the relative relationship between the scores obtained from the unprocessed feature vectors of individuals from the same demographic group, guaranteeing within-group fairness. This framework has been applied to the Adult, COMPAS risk assessment, and Law School datasets, and its performance is demonstrated and compared with two regularization-based methods in achieving inter-group and within-group fairness. 
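    The mapping described above can be made concrete with a rank-preserving quantile transform; the sketch below is one plausible one-dimensional instance under that assumption, not the article's exact construction, and all names are illustrative.

```python
# Sketch: scores from each group are mapped onto the pooled empirical
# distribution. Monotonicity within each group preserves within-group
# ordering; sharing one target distribution aligns the groups.
import numpy as np

def to_canonical(scores, group_ids):
    scores = np.asarray(scores, dtype=float)
    group_ids = np.asarray(group_ids)
    canonical = np.sort(scores)                # pooled target distribution
    out = np.empty_like(scores)
    for g in np.unique(group_ids):
        mask = group_ids == g
        ranks = scores[mask].argsort().argsort()       # 0..n_g-1, monotone in score
        quantiles = (ranks + 0.5) / mask.sum()         # mid-rank quantiles in (0, 1)
        out[mask] = np.quantile(canonical, quantiles)  # shared canonical domain
    return out
```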
  2. Pollard, Tom J. (Ed.)
    Modern predictive models require large amounts of data for training and evaluation, the absence of which may result in models that are specific to certain locations, the populations in them, and their clinical practices. Yet best practices for clinical risk prediction models have not yet considered such challenges to generalizability. Here we ask whether population- and group-level performance of mortality prediction models varies significantly when they are applied to hospitals or geographies different from the ones in which they were developed, and what characteristics of the datasets explain the performance variation. In this multi-center cross-sectional study, we analyzed electronic health records from 179 hospitals across the US with 70,126 hospitalizations from 2014 to 2015. The generalization gap, defined as the difference in model performance metrics across hospitals, is computed for the area under the receiver operating characteristic curve (AUC) and the calibration slope. To assess model performance by the race variable, we report differences in false negative rates across groups. Data were also analyzed using a causal discovery algorithm, "Fast Causal Inference," that infers paths of causal influence while identifying potential influences associated with unmeasured variables. When transferring models across hospitals, AUC at the test hospital ranged from 0.777 to 0.832 (1st-3rd quartile, IQR; median 0.801); calibration slope from 0.725 to 0.983 (IQR; median 0.853); and disparity in false negative rates from 0.046 to 0.168 (IQR; median 0.092). The distributions of all variable types (demography, vitals, and labs) differed significantly across hospitals and regions. The race variable also mediated differences in the relationship between clinical variables and mortality, by hospital/region. In conclusion, group-level performance should be assessed during generalizability checks to identify potential harms to groups. Moreover, to develop methods that improve model performance in new environments, a better understanding and documentation of the provenance of data and health processes are needed to identify and mitigate sources of variation.
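    As a rough illustration, the study's two transfer metrics can be computed as below; variable names and the fixed 0.5 decision threshold are assumptions for illustration, not the study's code.

```python
# Sketch: AUC generalization gap between hospitals and the disparity in
# false negative rates across demographic groups.
import numpy as np
from sklearn.metrics import roc_auc_score

def generalization_gap(y_dev, p_dev, y_test, p_test):
    """Difference in AUC when the model moves to a new hospital."""
    return roc_auc_score(y_dev, p_dev) - roc_auc_score(y_test, p_test)

def fnr_disparity(y_true, y_prob, group_ids, threshold=0.5):
    """Max minus min false negative rate over demographic groups."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_prob) >= threshold
    group_ids = np.asarray(group_ids)
    fnrs = []
    for g in np.unique(group_ids):
        mask = (group_ids == g) & (y_true == 1)
        if mask.any():
            fnrs.append((~y_pred[mask]).mean())  # misses among true positives
    return max(fnrs) - min(fnrs)
```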
  3.
    Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III, the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes.
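    A minimal sketch of how such seed sensitivity might be quantified is given below; train_and_evaluate is a hypothetical stand-in for fine-tuning on MIMIC-III and returning per-subgroup scores, so only the aggregation logic is concrete.

```python
# Sketch: run the same fine-tuning job under several seeds and report the
# spread of each subgroup's score.
import numpy as np

def seed_sensitivity(train_and_evaluate, seeds):
    """train_and_evaluate(seed) is a hypothetical callable returning a
    dict mapping subgroup -> performance score for one fine-tuning run."""
    runs = [train_and_evaluate(seed=s) for s in seeds]
    spread = {}
    for g in runs[0].keys():
        vals = np.array([r[g] for r in runs])
        spread[g] = {"mean": vals.mean(),
                     "std": vals.std(ddof=1),
                     "range": vals.max() - vals.min()}
    return spread
```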
  4. Recently, there has been growing interest in developing machine learning (ML) models that can promote fairness, i.e., eliminate biased predictions towards certain populations (e.g., individuals from a specific demographic group). Most existing works learn such models through well-designed fairness constraints in optimization. Nevertheless, in many practical ML tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance. This is because existing fairness constraints are designed to restrict the prediction disparity among different sensitive groups, but with few samples it becomes difficult to measure the disparity accurately, rendering fairness optimization ineffective. In this paper, we define the fairness-aware learning task with limited training samples as the fair few-shot learning problem. To deal with this problem, we devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks. To compensate for insufficient training samples, we propose an essential strategy of selecting and leveraging an auxiliary set for each meta-test task. These auxiliary sets contain several labeled training samples that can enhance the model's fairness performance in meta-test tasks, thereby allowing the transfer of learned, useful fairness-oriented knowledge to meta-test tasks. Furthermore, we conduct extensive experiments on three real-world datasets to validate the superiority of our framework against state-of-the-art baselines.
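    One simple way to realize the auxiliary-set idea, sketched under the assumption that the set should represent each sensitive group equally; the selection rule and names are illustrative, not the paper's strategy.

```python
# Sketch: draw labeled examples from a pool so each sensitive group is
# equally represented in the meta-test support data.
import numpy as np

rng = np.random.default_rng(0)

def build_auxiliary_set(pool_X, pool_groups, per_group):
    """Sample up to per_group examples from each sensitive group."""
    pool_groups = np.asarray(pool_groups)
    chosen = []
    for g in np.unique(pool_groups):
        idx = np.flatnonzero(pool_groups == g)
        take = rng.choice(idx, size=min(per_group, idx.size), replace=False)
        chosen.extend(take.tolist())
    return pool_X[np.array(chosen)]
```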
  5. Abstract

    Biometric locking systems offer a seamless integration of an individual's physiological characteristics with secure authentication. However, they suffer from limitations such as false positive and false negative authentications, environmental interference, and varying disadvantages across authentication methods. To address these limitations, this study develops a soft smart biopatch for a wearable cardiac biometric device that can continuously gather novel biometric data from an individual's heart sound for authentication with minimal error (less than 0.5%). The device is designed to be discreet and user-friendly, and it employs soft biocompatible materials to ensure comfort and ease of use. The patch system incorporates a miniaturized microphone to monitor sounds over long periods and across multiple dimensions, enhancing the reliability of the biometric data. Furthermore, machine-learning algorithms enable the creation of unique identification keys for individuals based on the continuous monitoring properties of the low-cost device. These advantages make it more effective and efficient than traditional biometric systems, with the potential to enhance the security of mobile devices and door locks.
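    A minimal sketch, assuming a log-spectrum fingerprint with nearest-template matching, of the kind of identification loop such a patch might run; the article does not specify its algorithm, and the feature choice and threshold below are illustrative.

```python
# Sketch: coarse spectral fingerprint of a heart-sound segment, matched
# against enrolled templates with a rejection threshold.
import numpy as np

def fingerprint(signal, n_bins=64):
    """Coarse log-magnitude spectrum as a fixed-length fingerprint.
    Assumes the segment is long enough to fill all n_bins."""
    spectrum = np.abs(np.fft.rfft(signal))
    bins = np.array_split(spectrum, n_bins)
    return np.log1p(np.array([b.mean() for b in bins]))

def identify(sample, enrolled, reject_threshold=1.0):
    """Return the enrolled identity with the nearest fingerprint, or None."""
    f = fingerprint(sample)
    dists = {name: np.linalg.norm(f - tmpl) for name, tmpl in enrolled.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < reject_threshold else None
```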

     