NSF PAR Search | NSF Public Access Repository

Using Large Language Models to Promote Health Equity

https://doi.org/10.1056/AIp2400889

Pierson, Emma; Shanmugam, Divya; Movva, Rajiv; Kleinberg, Jon; Agrawal, Monica; Dredze, Mark; Ferryman, Kadija; Gichoya, Judy Wawira; Jurafsky, Dan; Koh, Pang Wei; et al (January 2025, NEJM AI)

Free, publicly-accessible full text available January 23, 2026
The TRIPOD-LLM reporting guideline for studies using large language models

https://doi.org/10.1038/s41591-024-03425-5

Gallifant, Jack; Afshar, Majid; Ameen, Saleem; Aphinyanaphongs, Yindalon; Chen, Shan; Cacciamani, Giovanni; Demner-Fushman, Dina; Dligach, Dmitriy; Daneshjou, Roxana; Fernandes, Chrystinne; et al (January 2025, Nature Medicine)

Full Text Available
MedShift: Automated Identification of Shift Data for Medical Image Dataset Curation

https://doi.org/10.1109/JBHI.2023.3275104

Guo, Xiaoyuan; Gichoya, Judy Wawira; Trivedi, Hari; Purkayastha, Saptarshi; Banerjee, Imon (August 2023, IEEE Journal of Biomedical and Health Informatics)

Automated curation of noisy external data in the medical domain has long been in high demand, as AI technologies need to be validated using various sources with clean, annotated data. Identifying the variance between internal and external sources is a fundamental step in curating a high-quality dataset, as the data distributions from different sources can vary significantly and subsequently affect the performance of AI models. The primary challenges in detecting data shifts are (1) accessing private data across healthcare institutions for manual detection and (2) the lack of automated approaches to learn efficient shift-data representations without training samples. To overcome these problems, we propose an automated pipeline called MedShift to detect top-level shift samples and evaluate the significance of shift data without sharing data between internal and external organizations. MedShift employs unsupervised anomaly detectors to learn the internal distribution and identify samples showing significant shiftness for external datasets, and then compares their performance. To quantify the effects of detected shift data, we train a multi-class classifier that learns internal domain knowledge and evaluates the classification performance for each class in external domains after dropping the shift data. We also propose a data quality metric to quantify the dissimilarity between internal and external datasets. We verify the efficacy of MedShift using musculoskeletal radiographs (MURA) and chest X-ray datasets from multiple external sources. Our experiments show that our proposed shift data detection pipeline can be beneficial for medical centers to curate high-quality datasets more efficiently.
Full Text Available
AI pitfalls and what not to do: mitigating bias in AI

https://doi.org/10.1259/bjr.20230023

Gichoya, Judy Wawira; Thomas, Kaesha; Celi, Leo Anthony; Safdar, Nabile; Banerjee, Imon; Banja, John D; Seyyed-Kalantari, Laleh; Trivedi, Hari; Purkayastha, Saptarshi (October 2023, The British Journal of Radiology)

Various forms of artificial intelligence (AI) applications are being deployed and used in many healthcare systems. As the use of these applications increases, we are learning about the failures of these models and how they can perpetuate bias. With these new lessons, we need to prioritize bias evaluation and mitigation for radiology applications, without ignoring the impact of changes in the larger enterprise AI deployment, which may have a downstream impact on the performance of AI models. In this paper, we provide an updated review of known pitfalls causing AI bias and discuss strategies for mitigating these biases within the context of AI deployment in the larger healthcare enterprise. We describe these pitfalls by framing them in the larger AI lifecycle, from problem definition through data set selection and curation to model training and deployment, emphasizing that bias exists across a spectrum and is a sequela of a combination of both human and machine factors.
Full Text Available
Ability of artificial intelligence to identify self-reported race in chest x-ray using pixel intensity counts

https://doi.org/10.1117/1.JMI.10.6.061106

Burns, John Lee; Zaiman, Zachary; Vanschaik, Jack; Luo, Gaoxiang; Peng, Le; Price, Brandon; Mathias, Garric; Mittal, Vijay; Sagane, Akshay; Tignanelli, Christopher; et al (November 2023, Journal of Medical Imaging)

Purpose: Prior studies show convolutional neural networks predicting self-reported race using x-rays of chest, hand and spine, chest computed tomography, and mammogram. We seek an understanding of the mechanism that reveals race within x-ray images, investigating the possibility that race is not predicted using the physical structure in x-ray images but is embedded in the grayscale pixel intensities. Approach: Retrospective full year 2021, 298,827 AP/PA chest x-ray images from 3 academic health centers across the United States and MIMIC-CXR, labeled by self-reported race, were used in this study. The image structure is removed by summing the number of each grayscale value and scaling to percent per image (PPI). The resulting data are tested using multivariate analysis of variance (MANOVA) with Bonferroni multiple-comparison adjustment and class-balanced MANOVA. Machine learning (ML) feed-forward networks (FFN) and decision trees were built to predict race (binary Black or White and binary Black or other) using only grayscale value counts. Stratified analysis by body mass index, age, sex, gender, patient type, make/model of scanner, exposure, and kilovoltage peak setting was run to study the impact of these factors on race prediction following the same methodology. Results: MANOVA rejects the null hypothesis that classes are the same with 95% confidence (F 7.38, P < 0.0001) and balanced MANOVA (F 2.02, P < 0.0001). The best FFN performance is limited [area under the receiver operating characteristic (AUROC) of 69.18%]. Gradient boosted trees predict self-reported race using grayscale PPI (AUROC 77.24%). Conclusions: Within chest x-rays, pixel intensity value counts alone are statistically significant indicators and enough for ML classification tasks of patient self-reported race.
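The percent-per-image (PPI) step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `percent_per_image`, the `levels` parameter, and the toy image are assumptions made for illustration, and real radiographs may use a different bit depth.

```python
import numpy as np

def percent_per_image(image, levels=256):
    """Discard spatial structure: count pixels at each grayscale value
    and scale the counts to percentages of the whole image.

    `levels` (hypothetical parameter) is the number of possible
    grayscale values, e.g. 256 for 8-bit images.
    """
    pixels = np.asarray(image).ravel()
    if pixels.size == 0:
        raise ValueError("image has no pixels")
    if pixels.min() < 0 or pixels.max() >= levels:
        raise ValueError("pixel values must lie in [0, levels)")
    counts = np.bincount(pixels, minlength=levels)
    return 100.0 * counts / pixels.size

# Toy 2x2 image with 4 gray levels (illustrative data only).
toy = np.array([[0, 0], [1, 3]], dtype=np.uint8)
ppi = percent_per_image(toy, levels=4)
print(ppi)
```

The resulting vector has one entry per grayscale value and sums to 100, so two images with the same intensity distribution but different anatomy produce the same features.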
Full Text Available
Opportunistic detection of type 2 diabetes using deep learning from frontal chest radiographs

https://doi.org/10.1038/s41467-023-39631-x

Pyrros, Ayis; Borstelmann, Stephen M; Mantravadi, Ramana; Zaiman, Zachary; Thomas, Kaesha; Price, Brandon; Greenstein, Eugene; Siddiqui, Nasir; Willis, Melinda; Shulhan, Ihar; et al (December 2023, Nature Communications)

Deep learning (DL) models can harness electronic health records (EHRs) to predict diseases and extract radiologic findings for diagnosis. With ambulatory chest radiographs (CXRs) frequently ordered, we investigated detecting type 2 diabetes (T2D) by combining radiographic and EHR data using a DL model. Our model, developed from 271,065 CXRs and 160,244 patients, was tested on a prospective dataset of 9,943 CXRs. Here we show the model effectively detected T2D with a ROC AUC of 0.84 and a 16% prevalence. The algorithm flagged 1,381 cases (14%) as suspicious for T2D. External validation at a distinct institution yielded a ROC AUC of 0.77, with 5% of patients subsequently diagnosed with T2D. Explainable AI techniques revealed correlations between specific adiposity measures and high predictivity, suggesting CXRs’ potential for enhanced T2D screening.
Full Text Available
Failures Hiding in Success for Artificial Intelligence in Radiology

https://doi.org/10.1016/j.jacr.2020.11.008

Purkayastha, Saptarshi; Trivedi, Hari; Gichoya, Judy Wawira (March 2021, Journal of the American College of Radiology)
Full Text Available
A DICOM Framework for Machine Learning and Processing Pipelines Against Real-time Radiology Images

https://doi.org/10.1007/s10278-021-00491-w

Kathiravelu, Pradeeban; Sharma, Puneet; Sharma, Ashish; Banerjee, Imon; Trivedi, Hari; Purkayastha, Saptarshi; Sinha, Priyanshu; Cadrin-Chenevert, Alexandre; Safdar, Nabile; Gichoya, Judy Wawira (August 2021, Journal of Digital Imaging)
Real-time execution of machine learning (ML) pipelines on radiology images is difficult due to limited computing resources in clinical environments, whereas running them in research clusters requires efficient data transfer capabilities. We developed Niffler, an open-source Digital Imaging and Communications in Medicine (DICOM) framework that enables ML and processing pipelines in research clusters by efficiently retrieving images from the hospitals’ PACS and extracting the metadata from the images. We deployed Niffler at our institution (Emory Healthcare, the largest healthcare network in the state of Georgia) and retrieved data from 715 scanners spanning 12 sites, up to 350 GB/day continuously in real-time as a DICOM data stream over the past 2 years. We also used Niffler to retrieve images in bulk on demand, based on user-provided filters, to facilitate several research projects. This paper presents the architecture and three such use cases of Niffler. First, we executed an IVC filter detection and segmentation pipeline on abdominal radiographs in real-time, which was able to classify 989 test images with an accuracy of 96.0%. Second, we applied the Niffler Metadata Extractor to understand the operational efficiency of individual MRI systems based on calculated metrics. We benchmarked the accuracy of the calculated exam time windows by comparing Niffler against the Clinical Data Warehouse (CDW). Niffler accurately identified the scanners’ examination timeframes and idling times, whereas CDW falsely depicted several exam overlaps due to human errors. Third, with metadata extracted from the images by Niffler, we identified scanners with misconfigured time and reconfigured five scanners. Our evaluations highlight how Niffler enables real-time ML and processing pipelines in a research cluster.
Full Text Available
Patient-specific COVID-19 resource utilization prediction using fusion AI model

https://doi.org/10.1038/s41746-021-00461-0

Tariq, Amara; Celi, Leo Anthony; Newsome, Janice M.; Purkayastha, Saptarshi; Bhatia, Neal Kumar; Trivedi, Hari; Gichoya, Judy Wawira; Banerjee, Imon (June 2021, npj Digital Medicine)

The strain on healthcare resources brought forth by the recent COVID-19 pandemic has highlighted the need for efficient resource planning and allocation through the prediction of future consumption. Machine learning can predict resource utilization such as the need for hospitalization based on past medical data stored in electronic medical records (EMR). We conducted this study on 3194 patients (46% male with mean age 56.7 (±16.8), 56% African American, 7% Hispanic) flagged as COVID-19 positive cases in 12 centers under Emory Healthcare network from February 2020 to September 2020, to assess whether a COVID-19 positive patient’s need for hospitalization can be predicted at the time of RT-PCR test using the EMR data prior to the test. Five main modalities of EMR, i.e., demographics, medication, past medical procedures, comorbidities, and laboratory results, were used as features for predictive modeling, both individually and fused together using late, middle, and early fusion. Models were evaluated in terms of precision, recall, F1-score (within 95% confidence interval). The early fusion model is the most effective predictor with 84% overall F1-score [CI 82.1–86.1]. The predictive performance of the model drops by 6% when using recent clinical data while omitting the long-term medical history. Feature importance analysis indicates that history of cardiovascular disease, emergency room visits in the year prior to testing, and demographic factors are predictive of the disease trajectory. We conclude that fusion modeling using medical history and current treatment data can forecast the need for hospitalization for patients infected with COVID-19 at the time of the RT-PCR test.
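As a rough illustration of the early and late fusion mentioned in the abstract above, the sketch below uses one common reading of the terms: early fusion joins the per-modality feature vectors before a single model sees them, while late fusion combines the outputs of separate per-modality models. The abstract does not give the authors' exact formulation, so the function names, the averaging rule for late fusion, and the toy arrays are all assumptions.

```python
import numpy as np

def early_fusion(modalities):
    """Concatenate per-modality feature matrices (patients x features)
    into one matrix that a single downstream model would consume."""
    return np.concatenate(modalities, axis=1)

def late_fusion(probabilities):
    """Average the predicted probabilities of separate per-modality
    models (assumed combination rule; others such as voting exist)."""
    return np.mean(np.stack(probabilities, axis=0), axis=0)

# Toy data for two patients (illustrative values only).
demographics = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2 features
lab_results = np.array([[0.3], [0.9]])               # 1 feature
fused_features = early_fusion([demographics, lab_results])

# Hypothetical per-modality model outputs for the same two patients.
p_demographics = np.array([0.25, 1.0])
p_labs = np.array([0.75, 0.5])
fused_probability = late_fusion([p_demographics, p_labs])
print(fused_features.shape, fused_probability)
```

Middle fusion, which the abstract also names, typically merges learned intermediate representations and is not covered by this sketch.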
AI recognition of patient race in medical imaging: a modelling study

https://doi.org/10.1016/S2589-7500(22)00063-2

Gichoya, Judy Wawira; Banerjee, Imon; Bhimireddy, Ananth Reddy; Burns, John L; Celi, Leo Anthony; Chen, Li-Ching; Correa, Ramon; Dullerud, Natalie; Ghassemi, Marzyeh; Huang, Shih-Cheng; et al (June 2022, The Lancet Digital Health)

Full Text Available
