skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On the Generalizability of Machine Learning Classification Algorithms and Their Application to the Framingham Heart Study
The use of machine learning algorithms in healthcare can amplify social injustices and health inequities. While the exacerbation of biases can occur and be compounded during problem selection, data collection, and outcome definition, this research pertains to the generalizability impediments that occur during the development and post-deployment of machine learning classification algorithms. Using the Framingham coronary heart disease data as a case study, we show how to effectively select a probability cutoff to convert a regression model for a dichotomous variable into a classifier. We then compare the sampling distribution of the predictive performance of eight machine learning classification algorithms under four stratified training/testing scenarios to test their generalizability and their potential to perpetuate biases. We show that both extreme gradient boosting and support vector machine are flawed when trained on an unbalanced dataset. We then show that the double discriminant scoring of type 1 and 2 is the most generalizable with respect to the true positive and negative rates, respectively, as it consistently outperforms the other classification algorithms, regardless of the training/testing scenario. Finally, we introduce a methodology to extract an optimal variable hierarchy for a classification algorithm and illustrate it on the overall, male and female Framingham coronary heart disease data.  more » « less
Award ID(s):
2331502
PAR ID:
10524466
Author(s) / Creator(s):
Publisher / Repository:
MDPI
Date Published:
Journal Name:
Information
Volume:
15
Issue:
5
ISSN:
2078-2489
Page Range / eLocation ID:
252
Subject(s) / Keyword(s):
machine learning classification algorithm health disparities variable selection methodology optimal variable hierarchy
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Artificial intelligence nowadays plays an increasingly prominent role in our life since decisions that were once made by humans are now delegated to automated systems. A machine learning algorithm trained based on biased data, however, tends to make unfair predictions. Developing classification algorithms that are fair with respect to protected attributes of the data thus becomes an important problem. Motivated by concerns surrounding the fairness effects of sharing and few-shot machine learning tools, such as the Model Agnostic Meta-Learning [1] framework, we propose a novel fair fast-adapted few-shot meta-learning approach that efficiently mitigates biases during meta train by ensuring controlling the decision boundary covariance that between the protected variable and the signed distance from the feature vectors to the decision boundary. Through extensive experiments on two real-world image benchmarks over three state-of-the-art meta-learning algorithms, we empirically demonstrate that our proposed approach efficiently mitigates biases on model output and generalizes both accuracy and fairness to unseen tasks with a limited amount of training samples. 
    more » « less
  2. Babulal, Ganesh (Ed.)
    Digital voice recordings can offer affordable, accessible ways to evaluate behavior and function. We assessed how combining different low-level voice descriptors can evaluate cognitive status. Using voice recordings from neuropsychological exams at the Framingham Heart Study, we developed a machine learning framework fusing spectral, prosodic, and sound quality measures early in the training cycle. The model’s area under the receiver operating characteristic curve was 0.832 (±0.034) in differentiating persons with dementia from those who had normal cognition. This offers a data-driven framework for analyzing minimally processed voice recordings for cognitive assessment, highlighting the value of digital technologies in disease detection and intervention. 
    more » « less
  3. NA (Ed.)
    Facial attribute classification algorithms frequently manifest demographic biases by obtaining differential performance across gender and racial groups. Existing bias mitigation techniques are mostly in-processing techniques, i.e., implemented during the classifier’s training stage, that often lack generalizability, require demographically annotated training sets, and exhibit a trade-off between fairness and classification accuracy. In this paper, we propose a technique to mitigate bias at the test time i.e., during the deployment stage, by harnessing prediction uncertainty and human–machine partnership. To this front, we propose to utilize those lowest percentages of test data samples identified as outliers with high prediction uncertainty. These identified uncertain samples at test-time are labeled by human analysts for decision rendering and for subsequently retraining the deep neural network in a continual learning framework. With minimal human involvement and through iterative refinement of the network with human guidance at test-time, we seek to enhance the accuracy as well as the fairness of the already deployed facial attribute classification algorithms. Extensive experiments are conducted on gender and smile attribute classification tasks using four publicly available datasets and with gender and race as the protected attributes. The obtained outcomes consistently demonstrate improved accuracy by up to 2% and 5% for the gender and smile attribute classification tasks, respectively, using our proposed approaches. Further, the demographic bias was significantly reduced, outperforming the State-of-the-Art (SOTA) bias mitigation and baseline techniques by up to 55% for both classification tasks. 
    more » « less
  4. When dealing with data from distinct locations, machine learning algorithms tend to demonstrate an implicit preference of some locations over the others, which constitutes biases that sabotage the spatial fairness of the algorithm. This unfairness can easily introduce biases in subsequent decision-making given broad adoptions of learning-based solutions in practice. However, locational biases in AI are largely understudied. To mitigate biases over locations, we propose a locational meta-referee (Meta-Ref) to oversee the few-shot meta-training and meta-testing of a deep neural network. Meta-Ref dynamically adjusts the learning rates for training samples of given locations to advocate a fair performance across locations, through an explicit consideration of locational biases and the characteristics of input data. We present a three-phase training framework to learn both a meta-learning-based predictor and an integrated Meta-Ref that governs the fairness of the model. Once trained with a distribution of spatial tasks, Meta-Ref is applied to samples from new spatial tasks (i.e., regions outside the training area) to promote fairness during the fine-tune step. We carried out experiments with two case studies on crop monitoring and transportation safety, which show Meta-Ref can improve locational fairness while keeping the overall prediction quality at a similar level. 
    more » « less
  5. null (Ed.)
    Abstract Aims Coronary artery calcium (CAC) scoring is an established tool for cardiovascular risk stratification. However, the lack of widespread availability and concerns about radiation exposure have limited the universal clinical utilization of CAC. In this study, we sought to explore whether machine learning (ML) approaches can aid cardiovascular risk stratification by predicting guideline recommended CAC score categories from clinical features and surface electrocardiograms. Methods and results In this substudy of a prospective, multicentre trial, a total of 534 subjects referred for CAC scores and electrocardiographic data were split into 80% training and 20% testing sets. Two binary outcome ML logistic regression models were developed for prediction of CAC scores equal to 0 and ≥400. Both CAC = 0 and CAC ≥400 models yielded values for the area under the curve, sensitivity, specificity, and accuracy of 84%, 92%, 70%, and 75%, and 87%, 91%, 75%, and 81%, respectively. We further tested the CAC ≥400 model to risk stratify a cohort of 87 subjects referred for invasive coronary angiography. Using an intermediate or higher pretest probability (≥15%) to predict CAC ≥400, the model predicted the presence of significant coronary artery stenosis (P = 0.025), the need for revascularization (P < 0.001), notably bypass surgery (P = 0.021), and major adverse cardiovascular events (P = 0.023) during a median follow-up period of 2 years. Conclusion ML techniques can extract information from electrocardiographic data and clinical variables to predict CAC score categories and similarly risk-stratify patients with suspected coronary artery disease. 
    more » « less