NSF PAR Search | NSF Public Access Repository

SDoH-GPT: using large language models to extract social determinants of health

https://doi.org/10.1093/jamia/ocaf094

Consoli, Bernardo; Wang, Haoyang; Wu, Xizhi; Wang, Song; Zhao, Xinyu; Wang, Yanshan; Rousseau, Justin; Hartvigsen, Tom; Shen, Li; Wu, Huanmei; et al. (June 2025, Journal of the American Medical Informatics Association)

Abstract
Objective: Extracting social determinants of health (SDoHs) from medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. Here, we introduce SDoH-GPT, a novel framework leveraging few-shot learning large language models (LLMs) to automate the extraction of SDoH from unstructured text, aiming to improve both efficiency and generalizability.
Materials and Methods: SDoH-GPT is a framework that combines few-shot learning LLM methods, which extract SDoH from medical notes, with XGBoost classifiers, which continue to classify SDoH using the annotations generated by the few-shot learning LLM methods as training datasets. This combination utilizes the strength of LLMs as great few-shot learners and the efficiency of XGBoost when the training dataset is sufficient. Therefore, SDoH-GPT can extract SDoH without relying on extensive medical annotations or costly human intervention.
Results: Our approach achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators, measured by Cohen's kappa of up to 0.92. The combination of LLM and XGBoost can ensure high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores.
Discussion: This study has verified SDoH-GPT on three datasets and highlights the potential of leveraging LLM and XGBoost to revolutionize medical note classification, demonstrating its capability to achieve highly accurate classifications with significantly reduced time and cost.
Conclusion: The key contribution of this study is the integration of LLM with XGBoost, which enables cost-effective and high-quality annotations of SDoH. This research sets the stage for SDoH extraction to become more accessible, scalable, and impactful in driving future healthcare solutions.
Free, publicly-accessible full text available June 10, 2026
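
The abstract above reports agreement with human annotators as Cohen's kappa. As a minimal sketch of how that statistic is computed, the Python below compares two label sequences. The `human` and `model` labels are hypothetical values made up for illustration, not data from the paper, and the function name `cohens_kappa` is likewise an assumption rather than anything from the authors' code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both pick the same label at random,
    # given each annotator's own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical binary labels (1 = SDoH category present, 0 = absent).
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
model = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]

print(round(cohens_kappa(human, model), 2))  # prints 0.8
```

Here the two annotators agree on 9 of 10 items (observed agreement 0.9), chance agreement is 0.5, and kappa is (0.9 - 0.5) / (1 - 0.5) = 0.8.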
Less Likely Brainstorming: Using Language Models to Generate Alternative Hypotheses

https://doi.org/10.18653/v1/2023.findings-acl.794

Tang, Liyan; Peng, Yifan; Wang, Yanshan; Ding, Ying; Durrett, Greg; Rousseau, Justin (January 2023, Association for Computational Linguistics)

Full Text Available
Attend Who is Weak: Pruning-assisted Medical Image Localization under Sophisticated and Implicit Imbalances

https://doi.org/10.1109/WACV56688.2023.00496

Jaiswal, Ajay; Chen, Tianlong; Rousseau, Justin F.; Peng, Yifan; Ding, Ying; Wang, Zhangyang (January 2023, 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

Full Text Available
RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging

https://doi.org/10.1109/ICDM54844.2022.00118

Jaiswal, Ajay; Ashutosh, Kumar; Rousseau, Justin F.; Peng, Yifan; Wang, Zhangyang; Ding, Ying (November 2022, 2022 IEEE International Conference on Data Mining (ICDM))

Full Text Available
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

https://doi.org/10.18653/v1/2023.acl-long.650

Tang, Liyan; Goyal, Tanya; Fabbri, Alex; Laban, Philippe; Xu, Jiacheng; Yavuz, Semih; Kryscinski, Wojciech; Rousseau, Justin; Durrett, Greg (January 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems’ outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error-type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.
Full Text Available
Trustworthy assertion classification through prompting

https://doi.org/10.1016/j.jbi.2022.104139

Wang, Song; Tang, Liyan; Majety, Akash; Rousseau, Justin F.; Shih, George; Ding, Ying; Peng, Yifan (August 2022, Journal of Biomedical Informatics)

Full Text Available
Evaluating large language models on medical evidence summarization

https://doi.org/10.1038/s41746-023-00896-7

Tang, Liyan; Sun, Zhaoyi; Idnay, Betina; Nestor, Jordan G.; Soroush, Ali; Elias, Pierre A.; Xu, Ziyang; Ding, Ying; Durrett, Greg; Rousseau, Justin F.; et al. (August 2023, npj Digital Medicine)

Abstract Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study demonstrates that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts.
SCALP - Supervised Contrastive Learning for Cardiopulmonary Disease Classification and Localization in Chest X-rays using Patient Metadata

https://doi.org/10.1109/icdm51629.2021.00134

Jaiswal, Ajay; Li, Tianhao; Zander, Cyprian; Han, Yan; Rousseau, Justin F.; Peng, Yifan; Ding, Ying (December 2021, IEEE Conference Proceedings)

Full Text Available