NSF PAR Search | NSF Public Access Repository


Examining Imbalance Effects on Performance and Demographic Fairness of Clinical Language Models

https://doi.org/10.1109/ICHI64645.2025.00016

Jones, Precious; Liu, Weisi; Huang, I-Chan; Huang, Xiaolei (June 2025, IEEE)

Data imbalance is a fundamental challenge in applying language models to biomedical applications, particularly in ICD code prediction tasks where label and demographic distributions are uneven. While state-of-the-art language models have been increasingly adopted in biomedical tasks, few studies have systematically examined how data imbalance affects model performance and fairness across demographic groups. This study fills the gap by statistically probing the relationship between data imbalance and model performance in ICD code prediction. We analyze imbalances in a standard benchmark data across gender, age, ethnicity, and social determinants of health by state-of-the-art biomedical language models. By deploying diverse performance metrics and statistical analyses, we explore the influence of data imbalance on performance variations and demographic fairness. Our study shows that data imbalance significantly impacts model performance and fairness, but feature similarity to the majority class may be a more critical factor. We believe this study provides valuable insights for developing more equitable and robust language models in healthcare applications.
Free, publicly-accessible full text available June 18, 2026
Joint Imbalance Adaptation for Radiology Report Generation

https://doi.org/10.1007/s41666-025-00205-9

Li, Wang; Han, Guangzeng; Wu, Yuexin; Huang, I-Chan; Huang, Xiaolei (June 2025, Journal of Healthcare Informatics Research)

Radiology report generation, which translates radiological images into precise and clinically relevant descriptions, may face a data imbalance challenge: medical tokens appear less frequently than regular tokens, and normal entries significantly outnumber abnormal ones. However, very few studies consider these imbalance issues, let alone joint imbalance factors. In this study, we propose a Joint Imbalance Adaptation (JIMA) model to promote task robustness by leveraging token and label imbalance. We employ a hard-to-easy learning strategy that mitigates overfitting to frequent labels and tokens, thereby encouraging the model to focus more on infrequent labels and clinical tokens. JIMA presents notable improvements (16.75–50.50% on average) across evaluation metrics on the IU X-ray and MIMIC-CXR datasets. Our ablation analysis and human evaluations show that the improvements mainly come from enhancing performance on infrequent tokens and abnormal radiological entries, which can also lead to more clinically accurate reports. While data imbalance (e.g., infrequent tokens and abnormal labels) can lead to the underperformance of radiology report generation, our imbalance learning strategy opens promising directions on how to counter data imbalance by reducing overfitting on frequent patterns and underfitting on infrequent patterns.
Free, publicly-accessible full text available June 20, 2026
Time Matters: Examine Temporal Effects on Biomedical Language Models

Liu, Weisi; He, Zhe; Huang, Xiaolei (November 2024, AMIA Annual Symposium Proceedings)

Time is inherent in applying language models to biomedical applications: models are trained on historical data and will be deployed on new or future data, which may differ from the training data. While a growing number of biomedical tasks have employed state-of-the-art language models, very few studies have examined temporal effects on biomedical models, even though data usually shifts between development and deployment. This study fills the gap by statistically probing relations between language model performance and data shifts across three biomedical tasks. We deploy diverse metrics to evaluate model performance, distance methods to measure data drifts, and statistical methods to quantify temporal effects on biomedical language models. Our study shows that time matters for deploying biomedical language models, while the degree of performance degradation varies by biomedical task and statistical quantification approach. We believe this study can establish a solid benchmark to evaluate and assess temporal effects on deploying biomedical language models.
Free, publicly-accessible full text available November 20, 2025
Leveraging natural language processing and machine learning to characterize psychological stress and life meaning and purpose in pediatric cancer survivors: a preliminary validation study

https://doi.org/10.1093/jamiaopen/ooaf018

Sim, Jin-ah; Huang, Xiaolei; Webster, Rachel T; Srivastava, Kumar; Ness, Kirsten K; Hudson, Melissa M; Baker, Justin N; Huang, I-Chan (March 2025, JAMIA Open)

Objective: To determine if natural language processing (NLP) and machine learning (ML) techniques accurately identify interview-based psychological stress and meaning/purpose data in child/adolescent cancer survivors. Materials and Methods: Interviews were conducted with 51 survivors (aged 8-17.9 years; ≥5-years post-therapy) from St Jude Children’s Research Hospital. Two content experts coded 244 and 513 semantic units, focusing on attributes of psychological stress (anger, controllability/manageability, fear/anxiety) and attributes of meaning/purpose (goal, optimism, purpose). Content experts extracted specific attributes from the interviews, which were designated as the gold standard. Two NLP/ML methods, Word2Vec with Extreme Gradient Boosting (XGBoost), and Bidirectional Encoder Representations from Transformers Large (BERTLarge), were validated using accuracy, areas under the receiver operating characteristic curves (AUROCC), and under the precision-recall curves (AUPRC). Results: BERTLarge demonstrated higher accuracy, AUROCC, and AUPRC in identifying all attributes of psychological stress and meaning/purpose versus Word2Vec/XGBoost. BERTLarge significantly outperformed Word2Vec/XGBoost in characterizing all attributes (P <.05) except for the purpose attribute of meaning/purpose. Discussion: These findings suggest that AI tools can help healthcare providers efficiently assess emotional well-being of childhood cancer survivors, supporting future clinical interventions. Conclusions: NLP/ML effectively identifies interview-based data for child/adolescent cancer survivors.
Free, publicly-accessible full text available March 6, 2026
Length-Aware Multi-Kernel Transformer for Long Document Classification

https://doi.org/10.18653/v1/2024.starsem-1.22

Han, Guangzeng; Tsao, Jack; Huang, Xiaolei (June 2024, Association for Computational Linguistics)

Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse attention networks, these methods introduce new challenges of context fragmentation and generalizability due to sentence boundaries and varying text lengths. For example, our empirical analysis has shown that SOTA models consistently overfit one set of lengthy documents (e.g., 2000 tokens) while performing worse on texts with other lengths (e.g., 1000 or 4000). In this study, we propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address these new challenges for long document classification. LAMKIT encodes lengthy documents by diverse transformer-based kernels for bridging context boundaries and vectorizes text length by the kernels to promote model robustness over varying document lengths. Experiments on five standard benchmarks from health and law domains show LAMKIT outperforms SOTA models by up to an absolute 10.9%. We conduct extensive ablation analyses to examine model robustness and effectiveness over varying document lengths.
Full Text Available
Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts

https://doi.org/10.1109/ICHI61247.2024.00057

Han, Guangzeng; Liu, Weisi; Huang, Xiaolei; Borsari, Brian (June 2024, IEEE)

Automatically coding patient behaviors is essential to support decision making for psychotherapists during motivational interviewing (MI), a collaborative communication intervention approach to address psychiatric issues, such as alcohol and drug addiction. While the behavior coding task has rapidly adopted language models to predict patient states during MI sessions, a lack of domain-specific knowledge and the overlooking of patient-therapist interactions are major challenges in developing and deploying those models in real practice. To counter those challenges, we introduce the Chain-of-Interaction (CoI) prompting method, which aims to contextualize large language models (LLMs) for psychiatric decision support through dyadic interactions. The CoI prompting approach systematically breaks down the coding task into three key reasoning steps: extracting patient engagement, learning therapist question strategies, and integrating dyadic interactions between patients and therapists. This approach enables large language models to leverage the coding scheme, patient state, and domain knowledge for patient behavioral coding. Experiments on real-world datasets demonstrate the effectiveness and flexibility of our prompting method with multiple state-of-the-art LLMs over existing prompting baselines. We have conducted extensive ablation analysis and demonstrate the critical role of dyadic interactions in applying LLMs for psychotherapy behavior understanding.
Full Text Available
Using natural language processing to analyze unstructured patient-reported outcomes data derived from electronic health records for cancer populations: a systematic review

https://doi.org/10.1080/14737167.2024.2322664

Sim, Jin-Ah; Huang, Xiaolei; Horan, Madeline R; Baker, Justin N; Huang, I-Chan (April 2024, Expert Review of Pharmacoeconomics & Outcomes Research)

Full Text Available
Token Imbalance Adaptation for Radiology Report Generation

Wu, Yuexin; Huang, I-Chan; Huang, Xiaolei (August 2023, Proceedings of Machine Learning Research)
Mortazavi, Bobak J; Sarker, Tasmie; Beam, Andrew; Ho, Joyce C (Ed.)
Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. The token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but reflect more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology report generation. To solve the challenge, we propose the Token Imbalance Adapter (TIMER), aiming to improve generation robustness on infrequent tokens. The model automatically leverages token imbalance by an unlikelihood loss and dynamically optimizes generation processes to augment infrequent tokens. We compare our approach with multiple state-of-the-art methods on the two benchmarks. Experiments demonstrate the effectiveness of our approach in enhancing model robustness overall and infrequent tokens. Our ablation analysis shows that our reinforcement learning method has a major effect in adapting token imbalance for radiology report generation.
Full Text Available
Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review

https://doi.org/10.1016/j.artmed.2023.102701

Sim, Jin-ah; Huang, Xiaolei; Horan, Madeline R; Stewart, Christopher M; Robison, Leslie L; Hudson, Melissa M; Baker, Justin N; Huang, I-Chan (December 2023, Artificial Intelligence in Medicine)

Full Text Available
End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement

Liu, Jiachen; Xue, Yuan; Duarte, Jose; Shekhawat, Krishnendra; Zhou, Zihan; Huang, Xiaolei (October 2022, European Conference on Computer Vision)

Full Text Available
