

Search for: All records

Award ID contains: 2245920

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Radiology report generation, which translates radiological images into precise and clinically relevant descriptions, faces a data imbalance challenge: medical tokens appear less frequently than regular tokens, and normal entries significantly outnumber abnormal ones. Very few studies consider these imbalance issues, let alone the two imbalance factors jointly. In this study, we propose a Joint Imbalance Adaptation (JIMA) model to promote task robustness by leveraging token and label imbalance. We employ a hard-to-easy learning strategy that mitigates overfitting to frequent labels and tokens, encouraging the model to focus more on infrequent labels and clinical tokens. JIMA achieves notable improvements (16.75–50.50% on average) across evaluation metrics on the IU X-ray and MIMIC-CXR datasets. Our ablation analysis and human evaluations show that the improvements come mainly from better performance on infrequent tokens and abnormal radiological entries, which also leads to more clinically accurate reports. While data imbalance (e.g., infrequent tokens and abnormal labels) can cause radiology report generation to underperform, our imbalance learning strategy opens promising directions for countering data imbalance by reducing overfitting on frequent patterns and underfitting on infrequent patterns.
    Free, publicly-accessible full text available June 20, 2026
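    The JIMA implementation itself is not reproduced on this page; purely as a loose illustration of weighting learning toward infrequent clinical tokens, the Python sketch below re-weights a token-level cross-entropy loss by inverse corpus frequency. All function names, counts, and tensor shapes are hypothetical, and this is not the authors' hard-to-easy curriculum.

        # Minimal sketch, assuming a PyTorch decoder: up-weight infrequent tokens in the
        # report-generation loss so rare clinical terms contribute more to the gradient.
        import torch
        import torch.nn.functional as F

        def inverse_frequency_weights(token_counts, smoothing=1.0):
            """Turn raw corpus token counts into per-token loss weights."""
            counts = torch.tensor(token_counts, dtype=torch.float)
            weights = (counts.sum() + smoothing) / (counts + smoothing)
            return weights / weights.mean()  # normalize around 1.0

        def weighted_token_loss(logits, targets, token_weights):
            """Cross-entropy over a generated report, re-weighted per target token.
            logits: (batch, seq_len, vocab); targets: (batch, seq_len) gold token ids."""
            vocab = logits.size(-1)
            return F.cross_entropy(logits.view(-1, vocab), targets.view(-1),
                                   weight=token_weights)

        # Toy usage with a hypothetical 5-token vocabulary (counts are made up).
        w = inverse_frequency_weights([10_000, 5_000, 200, 50, 5])
        logits = torch.randn(2, 7, 5)
        targets = torch.randint(0, 5, (2, 7))
        print(weighted_token_loss(logits, targets, w))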
  2. Data imbalance is a fundamental challenge in applying language models to biomedical applications, particularly in ICD code prediction tasks where label and demographic distributions are uneven. While state-of-the-art language models have been increasingly adopted for biomedical tasks, few studies have systematically examined how data imbalance affects model performance and fairness across demographic groups. This study fills the gap by statistically probing the relationship between data imbalance and model performance in ICD code prediction. We analyze imbalances in a standard benchmark dataset across gender, age, ethnicity, and social determinants of health using state-of-the-art biomedical language models. By deploying diverse performance metrics and statistical analyses, we explore the influence of data imbalance on performance variations and demographic fairness. Our study shows that data imbalance significantly impacts model performance and fairness, but that feature similarity to the majority class may be a more critical factor. We believe this study provides valuable insights for developing more equitable and robust language models for healthcare applications.
    Free, publicly-accessible full text available June 18, 2026
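    The paper's exact probing pipeline is not shown here; as a hedged sketch of the kind of analysis the abstract describes, the snippet below correlates per-label F1 with ICD label frequency and compares macro-F1 across a demographic attribute. The data, the attribute names, and the 10% noise rate are invented for illustration.

        # Minimal sketch, assuming multi-label ICD predictions are already available:
        # probe (1) how label frequency relates to per-label F1 and (2) the gap in
        # macro-F1 between demographic groups.
        import numpy as np
        from scipy.stats import spearmanr
        from sklearn.metrics import f1_score

        rng = np.random.default_rng(0)
        n_notes, n_codes = 1000, 20

        # Hypothetical gold labels (rarer codes to the right) and noisy predictions.
        y_true = (rng.random((n_notes, n_codes)) < np.linspace(0.4, 0.01, n_codes)).astype(int)
        y_pred = np.where(rng.random((n_notes, n_codes)) < 0.1, 1 - y_true, y_true)
        group = rng.choice(["group_a", "group_b"], size=n_notes)  # hypothetical demographic attribute

        # Imbalance effect: rank correlation between code frequency and per-code F1.
        freq = y_true.sum(axis=0)
        per_code_f1 = f1_score(y_true, y_pred, average=None, zero_division=0)
        rho, p = spearmanr(freq, per_code_f1)
        print(f"code frequency vs F1: rho={rho:.2f}, p={p:.3g}")

        # Fairness gap: macro-F1 computed separately for each demographic group.
        for g in np.unique(group):
            mask = group == g
            print(g, round(f1_score(y_true[mask], y_pred[mask], average="macro", zero_division=0), 3))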
  3. Objective: To determine whether natural language processing (NLP) and machine learning (ML) techniques accurately identify interview-based psychological stress and meaning/purpose data in child/adolescent cancer survivors. Materials and Methods: Interviews were conducted with 51 survivors (aged 8-17.9 years; ≥5 years post-therapy) from St Jude Children’s Research Hospital. Two content experts coded 244 and 513 semantic units, focusing on attributes of psychological stress (anger, controllability/manageability, fear/anxiety) and attributes of meaning/purpose (goal, optimism, purpose). The attributes extracted from the interviews by the content experts were designated as the gold standard. Two NLP/ML methods, Word2Vec with Extreme Gradient Boosting (XGBoost) and Bidirectional Encoder Representations from Transformers Large (BERT-Large), were validated using accuracy, the area under the receiver operating characteristic curve (AUROC), and the area under the precision-recall curve (AUPRC). Results: BERT-Large demonstrated higher accuracy, AUROC, and AUPRC than Word2Vec/XGBoost in identifying all attributes of psychological stress and meaning/purpose, and significantly outperformed Word2Vec/XGBoost in characterizing all attributes (P < .05) except the purpose attribute of meaning/purpose. Discussion: These findings suggest that AI tools can help healthcare providers efficiently assess the emotional well-being of childhood cancer survivors, supporting future clinical interventions. Conclusions: NLP/ML effectively identifies interview-based data for child/adolescent cancer survivors.
    Free, publicly-accessible full text available March 6, 2026
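    The study's models cannot be reproduced from the abstract alone; purely as an illustration of the reported evaluation metrics, the sketch below scores a hypothetical attribute classifier (e.g., presence of fear/anxiety in a semantic unit) with accuracy, AUROC, and AUPRC using scikit-learn. The labels and probabilities are invented.

        # Minimal sketch: compute the three metrics named in the abstract for one
        # binary attribute, given gold codes and predicted probabilities.
        import numpy as np
        from sklearn.metrics import accuracy_score, roc_auc_score, average_precision_score

        y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])                      # hypothetical expert codes
        y_prob = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.2, 0.8, 0.5])  # model scores

        print("accuracy:", accuracy_score(y_true, (y_prob >= 0.5).astype(int)))
        print("AUROC:   ", roc_auc_score(y_true, y_prob))
        print("AUPRC:   ", average_precision_score(y_true, y_prob))  # area under the PR curve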
  4. Time is an inherent factor in applying language models to biomedical applications: models are trained on historical data and deployed on new or future data, which may differ from the training data. While a growing number of biomedical tasks employ state-of-the-art language models, very few studies have examined temporal effects on biomedical models when data shift between development and deployment. This study fills the gap by statistically probing the relations between language model performance and data shifts across three biomedical tasks. We deploy diverse metrics to evaluate model performance, distance methods to measure data drift, and statistical methods to quantify temporal effects on biomedical language models. Our study shows that time matters when deploying biomedical language models, while the degree of performance degradation varies by biomedical task and statistical quantification approach. We believe this study can establish a solid benchmark for evaluating and assessing temporal effects on deploying biomedical language models.
    Free, publicly-accessible full text available November 20, 2025
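    The paper's specific distance methods are not listed in the abstract; as one hedged example of measuring data drift between development and deployment corpora, the sketch below compares unigram distributions with the Jensen-Shannon distance. The toy documents and the add-one smoothing choice are assumptions, not the benchmark data or the authors' method.

        # Minimal sketch: quantify lexical drift between a "training-era" corpus and a
        # later "deployment-era" corpus; a larger distance could then be related
        # statistically to any observed performance degradation.
        from collections import Counter
        import numpy as np
        from scipy.spatial.distance import jensenshannon

        def unigram_dist(docs, vocab):
            counts = Counter(tok for d in docs for tok in d.lower().split())
            vec = np.array([counts[w] for w in vocab], dtype=float) + 1.0  # add-one smoothing
            return vec / vec.sum()

        train_docs = ["patient denies chest pain", "no acute distress noted"]
        deploy_docs = ["telehealth visit for covid symptoms", "patient reports anosmia"]

        vocab = sorted({t for d in train_docs + deploy_docs for t in d.lower().split()})
        drift = jensenshannon(unigram_dist(train_docs, vocab), unigram_dist(deploy_docs, vocab))
        print(f"Jensen-Shannon distance between corpora: {drift:.3f}")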
  5. Mortazavi, Bobak J; Sarker, Tasmie; Beam, Andrew; Ho, Joyce C (Ed.)
    Imbalanced token distributions naturally exist in text documents, leading neural language models to overfit on frequent tokens. Token imbalance may dampen the robustness of radiology report generators, as complex medical terms appear less frequently but carry more medical information. In this study, we demonstrate how current state-of-the-art models fail to generate infrequent tokens on two standard benchmark datasets for radiology report generation (IU X-ray and MIMIC-CXR). To address this challenge, we propose the Token Imbalance Adapter (TIMER), which aims to improve generation robustness on infrequent tokens. The model automatically leverages token imbalance through an unlikelihood loss and dynamically optimizes the generation process to augment infrequent tokens. We compare our approach with multiple state-of-the-art methods on the two benchmarks. Experiments demonstrate the effectiveness of our approach in enhancing model robustness overall and on infrequent tokens. Our ablation analysis shows that our reinforcement learning method has a major effect in adapting to token imbalance for radiology report generation.
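    The released TIMER code is not shown on this page; since the abstract names an unlikelihood loss, the sketch below follows the generic unlikelihood-training form (penalizing probability mass on a designated negative token set, here assumed to be over-frequent tokens) added to the usual likelihood term. The mixing weight, the negative-set choice, and the omission of the reinforcement-learning component are assumptions, not details from the paper.

        # Minimal sketch, assuming a PyTorch decoder: likelihood loss on gold tokens plus an
        # unlikelihood term that pushes probability away from a "negative" (frequent) token set.
        import torch
        import torch.nn.functional as F

        def unlikelihood_term(logits, negative_mask, eps=1e-8):
            """logits: (batch, seq_len, vocab); negative_mask: (vocab,) bool tensor,
            True for tokens the model should assign less probability to."""
            probs = logits.softmax(dim=-1)
            neg_probs = probs[..., negative_mask]  # probability mass on frequent tokens
            return -torch.log(1.0 - neg_probs + eps).sum(dim=-1).mean()

        def combined_loss(logits, targets, negative_mask, alpha=0.5):
            nll = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
            return nll + alpha * unlikelihood_term(logits, negative_mask)

        # Toy usage with a hypothetical 6-token vocabulary where tokens 0 and 1 are "frequent".
        logits = torch.randn(2, 5, 6)
        targets = torch.randint(0, 6, (2, 5))
        negative_mask = torch.tensor([True, True, False, False, False, False])
        print(combined_loss(logits, targets, negative_mask))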