Title: Deconstructing demographic bias in speech-based machine learning models for digital health
Introduction: Machine learning (ML) algorithms have been heralded as promising solutions for realizing assistive systems in digital healthcare, due to their ability to detect fine-grained patterns that are not easily perceived by humans. Yet, ML algorithms have also been critiqued for treating individuals differently based on their demography, thus propagating existing disparities. This paper explores gender and race bias in speech-based ML algorithms that detect behavioral and mental health outcomes.
Methods: This paper examines potential sources of bias in the data used to train the ML, encompassing acoustic features extracted from speech signals and associated labels, as well as in the ML decisions. The paper further examines approaches to reduce existing bias by using the features that are least informative of one's demographic information as the ML input, and by transforming the feature space in an adversarial manner to diminish the evidence of the demographic information while retaining information about the focal behavioral and mental health state.
Results: Results are presented in two domains, the first pertaining to gender and race bias when estimating levels of anxiety, and the second pertaining to gender bias in detecting depression. Findings indicate the presence of statistically significant differences in both acoustic features and labels among demographic groups, as well as differential ML performance among groups. The statistically significant differences present in the label space are partially preserved in the ML decisions. Although variations in ML performance across demographic groups were noted, results are mixed regarding the models' ability to accurately estimate healthcare outcomes for the sensitive groups.
Discussion: These findings underscore the necessity for careful and thoughtful design in developing ML models that are capable of maintaining crucial aspects of the data and performing effectively across all populations in digital healthcare applications.
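As a hedged illustration of the two mitigation strategies described in the Methods (selecting the acoustic features least informative of demographics, and adversarially transforming the feature space), a minimal Python sketch might look as follows. The function names, network sizes, and loss weighting are assumptions for illustration, not the authors' released implementation.

```python
# Sketch of the two bias-reduction strategies described above (assumed
# names and shapes; not the paper's actual code).
import numpy as np
import torch
import torch.nn as nn
from sklearn.feature_selection import mutual_info_classif

# --- Strategy 1: keep the features least informative of demographics ---
def least_demographic_features(X, demographic_labels, keep=20):
    """Rank features by mutual information with the demographic attribute
    and return the indices of the `keep` least informative ones."""
    mi = mutual_info_classif(X, demographic_labels, random_state=0)
    return np.argsort(mi)[:keep]

# --- Strategy 2: adversarial transformation via a gradient-reversal layer ---
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Reverse (and scale) the gradient flowing back into the encoder so
        # the learned representation sheds demographic evidence.
        return -ctx.lambd * grad_out, None

class AdversarialDebiaser(nn.Module):
    def __init__(self, n_features, n_hidden=64, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        self.health_head = nn.Linear(n_hidden, 1)   # e.g., anxiety level
        self.demo_head = nn.Linear(n_hidden, 2)     # e.g., gender

    def forward(self, x):
        z = self.encoder(x)
        y_health = self.health_head(z)
        y_demo = self.demo_head(GradReverse.apply(z, self.lambd))
        return y_health, y_demo

def joint_loss(y_health, y_demo, health_target, demo_target):
    # Predict the health outcome well while the reversed adversary gradient
    # pushes the encoder to discard demographic information.
    return (nn.functional.mse_loss(y_health.squeeze(-1), health_target)
            + nn.functional.cross_entropy(y_demo, demo_target))
```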
Award ID(s):
2430958
PAR ID:
10620575
Author(s) / Creator(s):
; ;
Publisher / Repository:
Frontiers
Date Published:
Journal Name:
Frontiers in Digital Health
Volume:
6
ISSN:
2673-253X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Purpose: This study examined the race identification of Southern American English speakers from two geographically distant regions in North Carolina. The purpose of this work is to explore how talkers' self-identified race, talker dialect region, and acoustic speech variables contribute to listener categorization of talker races. Method: Two groups of listeners heard a series of /h/–vowel–/d/ (/hVd/) words produced by Black and White talkers from East and West North Carolina, respectively. Results: Both Southern (North Carolina) and Midland (Indiana) listeners categorized the race of all speakers with greater-than-chance accuracy; however, Western North Carolina Black talkers were categorized with the lowest accuracy, just above chance. Conclusions: The results suggest that similarities in the speech production patterns of West North Carolina Black and White talkers affect the racial categorization of Black, but not White, talkers. The results are discussed with respect to the acoustic spectral features of the voices present in the sample population.
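As a hedged illustration of the "greater-than-chance" comparison reported above, a one-sided binomial test against the 50% chance level of a two-alternative (Black/White) categorization could be run as follows; the trial counts are placeholders, not data from the study.

```python
# Minimal sketch: test whether listeners categorize talker race above chance
# in a two-alternative task. Counts are hypothetical placeholders.
from scipy.stats import binomtest

correct = 168   # hypothetical number of correct categorizations
total = 240     # hypothetical number of trials
chance = 0.5    # two response alternatives

result = binomtest(correct, total, p=chance, alternative="greater")
print(f"accuracy = {correct / total:.2f}, p = {result.pvalue:.4g}")
```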
  2. Given significant concerns about fairness and bias in the use of artificial intelligence (AI) and machine learning (ML) for psychological assessment, we provide a conceptual framework for investigating and mitigating machine-learning measurement bias (MLMB) from a psychometric perspective. MLMB is defined as differential functioning of the trained ML model between subgroups. MLMB manifests empirically when a trained ML model produces different predicted score levels for different subgroups (e.g., race, gender) despite them having the same ground-truth levels for the underlying construct of interest (e.g., personality) and/or when the model yields differential predictive accuracies across the subgroups. Because the development of ML models involves both data and algorithms, both biased data and algorithm-training bias are potential sources of MLMB. Data bias can occur in the form of nonequivalence between subgroups in the ground truth, platform-based construct, behavioral expression, and/or feature computing. Algorithm-training bias can occur when algorithms are developed with nonequivalence in the relation between extracted features and ground truth (i.e., algorithm features are differentially used, weighted, or transformed between subgroups). We explain how these potential sources of bias may manifest during ML model development and share initial ideas for mitigating them, including recognizing that new statistical and algorithmic procedures need to be developed. We also discuss how this framework clarifies MLMB but does not reduce the complexity of the issue. 
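A minimal sketch of how MLMB could be probed empirically is shown below: it compares subgroup-level mean predictions against subgroup-level ground truth and reports within-subgroup predictive accuracy. The column names ("pred", "truth", "group") are assumed for illustration.

```python
# Sketch: probe machine-learning measurement bias (MLMB) by comparing
# predicted score levels and predictive accuracy across subgroups.
# Column names are illustrative assumptions.
import pandas as pd

def mlmb_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-subgroup mean prediction, mean ground truth, and accuracy (r)."""
    rows = {}
    for group, g in df.groupby("group"):
        rows[group] = {
            "mean_pred": g["pred"].mean(),
            "mean_truth": g["truth"].mean(),
            # differential predictive accuracy: within-subgroup correlation
            "r_pred_truth": g["pred"].corr(g["truth"]),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

# MLMB signals: subgroups with similar mean_truth but different mean_pred,
# or clearly different r_pred_truth values.
```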
  3. Purpose: The equitable distribution of donor kidneys is crucial to maximizing transplant success rates and addressing disparities in healthcare data. This study examines potential gender bias in the Deceased Donor Organ Allocation Model (DDOA) by using machine learning and AI to analyze its impact on kidney discard decisions and to ensure fairness in accordance with medical ethics. Methods: The study employs the DDOA model (https://ddoa.mst.hekademeia.org/#/kidney) to predict the discard probability of deceased donor kidneys using donor characteristics from the OPTN Deceased Donor Dataset (2016-2023). Using the SRTR SAF dictionary, the dataset consists of 18,029 donor records, in which gender was assessed for its effect on discard probability. ANOVA and t-tests determine whether there is a statistically significant difference between the discard percentages for female and male donors when only the donor gender data are changed. If the p-value obtained from the t-test is less than the significance level (typically 0.05), we reject the null hypothesis and conclude that there is a significant difference; otherwise, we fail to reject the null hypothesis. Results: Figure 1 visualizes the differences in discard percentages between female and male donor kidneys; an unbiased allocation system would be expected to show no difference (i.e., a value of zero). To assess the presence of gender bias, statistical analyses, including t-tests and ANOVA, were performed. The t-test comparing female and male kidney discard rates yielded a t-statistic of 29.690228 with a p-value of 3.586956e-189, far below the 0.05 significance threshold. This result leads to rejection of the null hypothesis, indicating a significant difference in mean discard probability when only the donor gender attribute is altered, and therefore that gender plays a significant role in the DDOA model's discard decisions. Conclusions: The study highlights that altering only the donor gender attribute produces a significant difference in mean discard probability, contributing to kidney discard rates in the DDOA model. These findings reinforce the need for greater transparency in organ allocation models and a reconsideration of the demographic criteria used in the evaluation process. Future research should refine algorithms to minimize biases in organ allocation and investigate kidney discard disparities in transplantation.
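A hedged sketch of the gender-perturbation analysis is given below. The `predict_discard` callable stands in for the DDOA model, the "gender" column for the corresponding OPTN field, and a paired t-test is one plausible reading of comparing discard probabilities before and after altering only the gender attribute; none of these are reproduced from the study's code.

```python
# Sketch of the counterfactual gender analysis: score each donor record twice
# (gender as recorded vs. flipped) and t-test the paired discard probabilities.
# `predict_discard` is a stand-in for the DDOA model.
import pandas as pd
from scipy.stats import ttest_rel

def gender_flip_test(donors: pd.DataFrame, predict_discard) -> float:
    original = predict_discard(donors)

    flipped = donors.copy()
    flipped["gender"] = flipped["gender"].map({"F": "M", "M": "F"})
    counterfactual = predict_discard(flipped)

    # Paired t-test: only the gender attribute differs between conditions.
    stat, pvalue = ttest_rel(original, counterfactual)
    return pvalue
```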
  4. Solomon, Latasha; Schwartz, Peter J. (Ed.)
    In recent years, computer vision has made significant strides in enabling machines to perform a wide range of tasks, from image classification and segmentation to image generation and video analysis. It is a rapidly evolving field that aims to enable machines to interpret and understand visual information from the environment. One key task in computer vision is image classification, where algorithms identify and categorize objects in images based on their visual features. Image classification has a wide range of applications, from image search and recommendation systems to autonomous driving and medical diagnosis. However, recent research has highlighted the presence of bias in image classification algorithms, particularly with respect to human-sensitive attributes such as gender, race, and ethnicity. For example, the class "computer programmer" is predicted more accurately when the image context contains men than when it contains women, and the algorithm's accuracy is higher on greyscale images than on color images. This discrepancy in identifying objects arises from correlations the algorithm learns between objects and their surrounding context, known as contextual bias. This bias can result in inaccurate decisions, with potential consequences in areas such as hiring, healthcare, and security. In this paper, we conduct an empirical study of bias in the image classification domain with respect to the sensitive attribute of gender, using deep convolutional neural networks (CNNs) with transfer learning, and we mitigate bias within the image context using data augmentation to improve overall model performance. In addition, cross-data generalization experiments are conducted to evaluate model robustness across popular open-source image datasets.
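In the spirit of the setup described above, a minimal transfer-learning sketch with data augmentation is shown below; the ResNet-18 backbone, augmentation policy, and binary classification head are illustrative assumptions rather than the study's exact configuration.

```python
# Sketch: transfer learning with a pretrained CNN backbone plus data
# augmentation. Backbone, augmentations, and the two-class head are
# illustrative choices, not the study's reported setup.
import torch.nn as nn
from torchvision import models, transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():          # freeze pretrained feature extractor
    p.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # new trainable head
```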
  5. Purpose: The presented work was invited following the American Speech-Language-Hearing Association SIG 19 Virtual Talk "Speech Science in Diverse Populations" that occurred on September 2, 2021. The purpose of this article was to introduce the historical and theoretical frameworks of gender and race from a decidedly North American (United States) perspective to an audience that may be less familiar with those topics as they relate to the practice of communication sciences and disorders. Race and gender are huge topics. Entire fields of study and lifetimes of work are dedicated to understanding these constructs. Therefore, it is hoped that this brief review of race and gender will prompt the reader to evaluate how the two constructs are used to categorize people and whether being a member of a marginalized or a minoritized group affects the person's access to or use of intervention services. A critical theoretical discussion of race and gender is beyond the scope of this text. In this limited space, this work presents an overview of current and historical discussions of gender and race and a challenge to the reader to accept that their perspective is indebted to a specific belief system. In the United States, that belief system often evaluates human differences into binary categories on a weighted continuum. Speech-language professionals often use that continuum to identify and measure difference into either acceptable variation or disorder. Conclusions: The profession of speech-language pathology was established during a time when variation from middle-class White American communication norms was frequently defined as undesirable and sometimes as disordered. The communities and individuals we encounter deserve to be accepted as they are. We must resolve to expect and accept wide variation in human communication without pathologizing its existence, to expand our thinking about disorder in speech and hearing science, and to accept culturally competent communicators as competent communicators.