SDoH-GPT: using large language models to extract social determinants of health

Consoli, Bernardo; Wang, Haoyang; Wu, Xizhi; Wang, Song; Zhao, Xinyu; Wang, Yanshan; Rousseau, Justin; Hartvigsen, Tom; Shen, Li; Wu, Huanmei; Peng, Yifan; Long, Qi; Chen, Tianlong; Ding, Ying

doi:10.1093/jamia/ocaf094

Citation Details

This content will become publicly available on June 10, 2026

SDoH-GPT: using large language models to extract social determinants of health

Abstract ObjectiveExtracting social determinants of health (SDoHs) from medical notes depends heavily on labor-intensive annotations, which are typically task-specific, hampering reusability and limiting sharing. Here, we introduce SDoH-GPT, a novel framework leveraging few-shot learning large language models (LLMs) to automate the extraction of SDoH from unstructured text, aiming to improve both efficiency and generalizability. Materials and MethodsSDoH-GPT is a framework including the few-shot learning LLM methods to extract the SDoH from medical notes and the XGBoost classifiers which continue to classify SDoH using the annotations generated by the few-shot learning LLM methods as training datasets. The unique combination of the few-shot learning LLM methods with XGBoost utilizes the strength of LLMs as great few shot learners and the efficiency of XGBoost when the training dataset is sufficient. Therefore, SDoH-GPT can extract SDoH without relying on extensive medical annotations or costly human intervention. ResultsOur approach achieved tenfold and twentyfold reductions in time and cost, respectively, and superior consistency with human annotators measured by Cohen's kappa of up to 0.92. The innovative combination of LLM and XGBoost can ensure high accuracy and computational efficiency while consistently maintaining 0.90+ AUROC scores. DiscussionThis study has verified SDoH-GPT on three datasets and highlights the potential of leveraging LLM and XGBoost to revolutionize medical note classification, demonstrating its capability to achieve highly accurate classifications with significantly reduced time and cost. ConclusionThe key contribution of this study is the integration of LLM with XGBoost, which enables cost-effective and high quality annotations of SDoH. This research sets the stage for SDoH can be more accessible, scalable, and impactful in driving future healthcare solutions. more »

Award ID(s):: 2505865 2333703

PAR ID:: 10631913

Author(s) / Creator(s):: Consoli, Bernardo; Wang, Haoyang; Wu, Xizhi; Wang, Song; Zhao, Xinyu; Wang, Yanshan; Rousseau, Justin; Hartvigsen, Tom; Shen, Li; Wu, Huanmei; Peng, Yifan; Long, Qi; Chen, Tianlong; Ding, Ying

Publisher / Repository:: 10.1093/jamia/ocaf094

Date Published:: 2025-06-10

Journal Name:: Journal of the American Medical Informatics Association

ISSN:: 1067-5027

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
This content will become publicly available on June 10, 2026
Journal Article:
https://doi.org/10.1093/jamia/ocaf094

More Like this