Title: Improving model fairness in image-based computer-aided diagnosis
Abstract: Deep learning has become a popular tool for computer-aided diagnosis using medical images, sometimes matching or exceeding the performance of clinicians. However, these models can also reflect and amplify human bias, potentially resulting in inaccurate or missed diagnoses. Despite this concern, the problem of improving model fairness in medical image classification by deep learning has yet to be fully studied. To address this issue, we propose an algorithm that leverages the marginal pairwise equal opportunity to reduce bias in medical image classification. Our evaluations across four tasks using four independent large-scale cohorts demonstrate that our proposed algorithm not only improves fairness in individual and intersectional subgroups but also maintains overall performance. Specifically, the pairwise fairness difference of our proposed model was reduced by over 35% relative to the baseline model, while the relative change in AUC value was typically within 1%. By reducing the bias generated by deep learning models, our proposed approach can potentially alleviate concerns about the fairness and reliability of image-based computer-aided diagnosis.
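The abstract does not spell out the training objective, but the idea of penalizing marginal pairwise equal opportunity gaps can be illustrated with a small PyTorch-style sketch. The function names, the soft surrogate (mean predicted probability on positive cases per subgroup), and the weight lam are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def pairwise_equal_opportunity_penalty(logits, labels, groups):
    """Soft surrogate for marginal pairwise equal opportunity.

    For every pair of demographic subgroups, compare the mean predicted
    probability assigned to positive (diseased) cases and penalize the
    absolute gap, so no subgroup's positives are systematically under-scored.
    """
    probs = torch.sigmoid(logits).squeeze(-1)
    means = []
    for g in groups.unique():
        mask = (groups == g) & (labels == 1)
        if mask.any():
            means.append(probs[mask].mean())
    penalty = logits.new_zeros(())
    n_pairs = 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            penalty = penalty + (means[i] - means[j]).abs()
            n_pairs += 1
    return penalty / max(n_pairs, 1)

def fairness_aware_loss(logits, labels, groups, lam=1.0):
    """Standard binary diagnosis loss plus the pairwise fairness penalty."""
    bce = F.binary_cross_entropy_with_logits(logits.squeeze(-1), labels.float())
    return bce + lam * pairwise_equal_opportunity_penalty(logits, labels, groups)
```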
Award ID(s):
2019844 2212176
PAR ID:
10503125
Author(s) / Creator(s):
Publisher / Repository:
Nature Communications
Date Published:
Journal Name:
Nature Communications
Volume:
14
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Solomon, Latasha; Schwartz, Peter J. (Ed.)
    In recent years, computer vision has made significant strides in enabling machines to perform a wide range of tasks, from image classification and segmentation to image generation and video analysis. It is a rapidly evolving field that aims to enable machines to interpret and understand visual information from the environment. One key task in computer vision is image classification, where algorithms identify and categorize objects in images based on their visual features. Image classification has a wide range of applications, from image search and recommendation systems to autonomous driving and medical diagnosis. However, recent research has highlighted the presence of bias in image classification algorithms, particularly with respect to human-sensitive attributes such as gender, race, and ethnicity. For example, the label "computer programmer" may be predicted more readily for images of men than of women, and accuracy can be higher on grayscale images than on color images. Such discrepancies arise from correlations the algorithm learns between objects and their surrounding context, a phenomenon known as contextual bias. This bias can result in inaccurate decisions, with potential consequences in areas such as hiring, healthcare, and security. In this paper, we conduct an empirical study to investigate bias in the image classification domain with respect to the sensitive attribute of gender, using deep convolutional neural networks (CNNs) with transfer learning, and we mitigate bias within the image context using data augmentation to improve overall model performance (a hedged augmentation sketch appears after this list). In addition, cross-data generalization experiments are conducted to evaluate model robustness across popular open-source image datasets.
  2. Purpose: Few studies have explored concrete methods for improving model fairness in the radiology domain. Our proposed AI model utilizes supervised contrastive learning to minimize bias in CXR diagnosis. Materials and Methods: In this retrospective study, we evaluated our proposed method on two datasets: the Medical Imaging and Data Resource Center (MIDRC) dataset with 77,887 CXR images from 27,796 patients collected as of April 20, 2023 for COVID-19 diagnosis, and the NIH Chest X-ray (NIH-CXR) dataset with 112,120 CXR images from 30,805 patients collected between 1992 and 2015. In the NIH-CXR dataset, thoracic abnormalities include atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, fibrosis, pleural thickening, and hernia. Our proposed method utilizes supervised contrastive learning with carefully selected positive and negative samples to generate fair image embeddings, which are fine-tuned for subsequent tasks to reduce bias in chest X-ray (CXR) diagnosis (a sketch of this idea appears after this list). We evaluated the methods using the marginal AUC difference (δmAUC). Results: The proposed model showed a significant decrease in bias across all subgroups when compared to the baseline models, as evidenced by a paired t-test (p<0.0001). The δmAUC values obtained by our method were 0.0116 (95% CI, 0.0110-0.0123), 0.2102 (95% CI, 0.2087-0.2118), and 0.1000 (95% CI, 0.0988-0.1011) for sex, race, and age on MIDRC, and 0.0090 (95% CI, 0.0082-0.0097) for sex and 0.0512 (95% CI, 0.0512-0.0532) for age on NIH-CXR, respectively. Conclusion: Employing supervised contrastive learning can mitigate bias in CXR diagnosis, addressing concerns about the fairness and reliability of deep learning-based diagnostic methods.
  3. Automated region of interest detection in histopathological image analysis is a challenging and important topic with tremendous potential impact on clinical practice. The deep learning methods used in computational pathology may help us to reduce costs and increase the speed and accuracy of cancer diagnosis. We started with the UNC Melanocytic Tumor Dataset cohort, which contains 160 hematoxylin and eosin whole-slide images of primary melanoma (86) and nevi (74). We randomly assigned 80% (134) of the slides as a training set and built an in-house deep learning method to allow for classification, at the slide level, of nevi and melanoma. The proposed method performed well on the remaining 20% (26 slides) used as the test set: the accuracy of the slide classification task was 92.3%, and our model also performed well in predicting the regions of interest annotated by the pathologists, showing excellent performance on melanocytic skin tumors. Although we evaluated the experiments on a skin tumor dataset, our work could also be extended to other medical image detection problems to benefit the clinical evaluation and diagnosis of different tumors.
  4. While active efforts are advancing medical artificial intelligence (AI) model development and clinical translation, safety issues of these AI models are emerging, yet little research has addressed them. We perform a study to investigate the behaviors of an AI diagnosis model under adversarial images generated by Generative Adversarial Network (GAN) models and to evaluate how well human experts can visually identify potential adversarial images. Our GAN model makes intentional modifications to the diagnosis-sensitive contents of mammogram images in deep learning-based computer-aided diagnosis (CAD) of breast cancer. In our experiments, the adversarial samples fool the AI-CAD model into outputting a wrong diagnosis on 69.1% of the cases that were initially correctly classified by the AI-CAD model (the fooling-rate computation is sketched after this list). Five breast imaging radiologists visually identified 29%-71% of the adversarial samples. Our study suggests an imperative need for continuing research on the safety issues of medical AI models and for developing potential defensive solutions against adversarial attacks.
  5. We introduce an active, semi-supervised algorithm that utilizes Bayesian experimental design to address the shortage of annotated images required to train and validate Artificial Intelligence (AI) models for lung cancer screening with computed tomography (CT) scans. Our approach incorporates active learning with semi-supervised expectation maximization to emulate a human in the loop who provides additional ground-truth labels to train, evaluate, and update the neural network models. Bayesian experimental design is used to intelligently identify which unlabeled samples need ground-truth labels to enhance the model's performance (an acquisition-function sketch appears after this list). We evaluate the proposed Active Semi-supervised Expectation Maximization for Computer-aided diagnosis (CAD) tasks (ASEM-CAD) using three public CT scan datasets: the National Lung Screening Trial (NLST), the Lung Image Database Consortium (LIDC), and the Kaggle Data Science Bowl 2017, for lung cancer classification from CT scans. ASEM-CAD can accurately classify suspicious lung nodules and lung cancer cases with an area under the curve (AUC) of 0.94 (Kaggle), 0.95 (NLST), and 0.88 (LIDC) with significantly fewer labeled images than a fully supervised model. This study addresses one of the significant challenges in early lung cancer screening using low-dose computed tomography (LDCT) scans and is a valuable contribution towards the development and validation of deep learning algorithms for lung cancer screening and other diagnostic radiology examinations.
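For item 1 above, one way to read "minimize bias within the image context using data augmentation" is to perturb exactly the cues (color, orientation, how much background survives a crop) that a CNN might latch onto as spurious context. The torchvision pipeline below is a hedged illustration of that idea; the specific transforms and magnitudes are assumptions, not the study's actual recipe.

```python
from torchvision import transforms

# Augmentations chosen to weaken spurious contextual/color cues during
# transfer learning; values here are illustrative only.
bias_reducing_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),  # vary how much background context remains
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomGrayscale(p=0.2),                     # discourage reliance on color alone
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```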
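For item 2 above, a minimal sketch of subgroup-aware supervised contrastive learning and of a marginal AUC difference metric. The positive-selection rule (same diagnostic label, different demographic subgroup) and the δmAUC definition (largest AUC gap between any two subgroups) are plausible readings of the abstract, not the paper's verified formulas.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score

def fair_supcon_loss(embeddings, labels, groups, temperature=0.1):
    """Supervised contrastive loss whose positives share the diagnostic label
    but come from a *different* demographic subgroup, pushing the encoder to
    mix subgroups within each class."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))            # never contrast a sample with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)
    diff_group = groups.unsqueeze(0) != groups.unsqueeze(1)
    pos_mask = same_label & diff_group & ~self_mask

    pos_log_prob = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob))
    loss = -pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    has_pos = pos_mask.any(dim=1)
    return loss[has_pos].mean()                                 # anchors with no positives are skipped

def marginal_auc_difference(scores, labels, groups):
    """One plausible reading of δmAUC: the largest absolute AUC gap between
    any two demographic subgroups."""
    aucs = []
    for g in np.unique(groups):
        m = groups == g
        if len(np.unique(labels[m])) == 2:                      # AUC needs both classes present
            aucs.append(roc_auc_score(labels[m], scores[m]))
    return max(aucs) - min(aucs)
```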
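For item 4 above, the 69.1% figure is a fooling rate measured only over cases the AI-CAD model originally classified correctly. A small, assumed helper for computing that quantity:

```python
import numpy as np

def fooling_rate(clean_preds, adv_preds, labels):
    """Fraction of cases that were correctly classified on clean mammograms
    but receive a wrong diagnosis on their adversarial counterparts."""
    clean_preds, adv_preds, labels = map(np.asarray, (clean_preds, adv_preds, labels))
    correct = clean_preds == labels
    flipped = correct & (adv_preds != labels)
    return flipped.sum() / max(correct.sum(), 1)
```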
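For item 5 above, a hedged sketch of how Bayesian experimental design can pick which unlabeled CT scans to send to the human in the loop: score each candidate with an information-based criterion (here BALD computed from Monte Carlo predictive samples, e.g. MC dropout) and query the top-k. The criterion and array shapes are assumptions, not ASEM-CAD's exact design.

```python
import numpy as np

def predictive_entropy(probs):
    """Entropy of a predictive distribution; the class axis is last."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def bald_scores(mc_probs):
    """BALD mutual information from Monte Carlo predictive samples
    with shape (n_mc_passes, n_samples, n_classes)."""
    entropy_of_mean = predictive_entropy(mc_probs.mean(axis=0))  # total uncertainty
    mean_of_entropy = predictive_entropy(mc_probs).mean(axis=0)  # expected aleatoric part
    return entropy_of_mean - mean_of_entropy                     # epistemic (model) uncertainty

def select_queries(mc_probs, k=32):
    """Indices of the k unlabeled scans whose labels the model is expected
    to learn the most from."""
    return np.argsort(-bald_scores(mc_probs))[:k]
```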