skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2145640

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract ObjectivesThe predictive intensive care unit (ICU) scoring system is crucial for predicting patient outcomes, particularly mortality. Traditional scoring systems rely mainly on structured clinical data from electronic health records, which can overlook important clinical information in narratives and images. Materials and MethodsIn this work, we build a deep learning-based survival prediction model that utilizes multimodality data for ICU mortality prediction. Four sets of features are investigated: (1) physiological measurements of Simplified Acute Physiology Score (SAPS) II, (2) common thorax diseases predefined by radiologists, (3) bidirectional encoder representations from transformers-based text representations, and (4) chest X-ray image features. The model was evaluated using the Medical Information Mart for Intensive Care IV dataset. ResultsOur model achieves an average C-index of 0.7829 (95% CI, 0.7620-0.8038), surpassing the baseline using only SAPS-II features, which had a C-index of 0.7470 (95% CI: 0.7263-0.7676). Ablation studies further demonstrate the contributions of incorporating predefined labels (2.00% improvement), text features (2.44% improvement), and image features (2.82% improvement). Discussion and ConclusionThe deep learning model demonstrated superior performance to traditional machine learning methods under the same feature fusion setting for ICU mortality prediction. This study highlights the potential of integrating multimodal data into deep learning models to enhance the accuracy of ICU mortality prediction. 
    more » « less
  2. Abstract Recent advances in large language models (LLMs) have demonstrated remarkable successes in zero- and few-shot performance on various downstream tasks, paving the way for applications in high-stakes domains. In this study, we systematically examine the capabilities and limitations of LLMs, specifically GPT-3.5 and ChatGPT, in performing zero-shot medical evidence summarization across six clinical domains. We conduct both automatic and human evaluations, covering several dimensions of summary quality. Our study demonstrates that automatic metrics often do not strongly correlate with the quality of summaries. Furthermore, informed by our human evaluations, we define a terminology of error types for medical evidence summarization. Our findings reveal that LLMs could be susceptible to generating factually inconsistent summaries and making overly convincing or uncertain statements, leading to potential harm due to misinformation. Moreover, we find that models struggle to identify the salient information and are more error-prone when summarizing over longer textual contexts. 
    more » « less
  3. Abstract In 2020, the U.S. Department of Defense officially disclosed a set of ethical principles to guide the use of Artificial Intelligence (AI) technologies on future battlefields. Despite stark differences, there are core similarities between the military and medical service. Warriors on battlefields often face life-altering circumstances that require quick decision-making. Medical providers experience similar challenges in a rapidly changing healthcare environment, such as in the emergency department or during surgery treating a life-threatening condition. Generative AI, an emerging technology designed to efficiently generate valuable information, holds great promise. As computing power becomes more accessible and the abundance of health data, such as electronic health records, electrocardiograms, and medical images, increases, it is inevitable that healthcare will be revolutionized by this technology. Recently, generative AI has garnered a lot of attention in the medical research community, leading to debates about its application in the healthcare sector, mainly due to concerns about transparency and related issues. Meanwhile, questions around the potential exacerbation of health disparities due to modeling biases have raised notable ethical concerns regarding the use of this technology in healthcare. However, the ethical principles for generative AI in healthcare have been understudied. As a result, there are no clear solutions to address ethical concerns, and decision-makers often neglect to consider the significance of ethical principles before implementing generative AI in clinical practice. In an attempt to address these issues, we explore ethical principles from the military perspective and propose the “GREAT PLEA” ethical principles, namely Governability, Reliability, Equity, Accountability, Traceability, Privacy, Lawfulness, Empathy, and Autonomy for generative AI in healthcare. Furthermore, we introduce a framework for adopting and expanding these ethical principles in a practical way that has been useful in the military and can be applied to healthcare for generative AI, based on contrasting their ethical concerns and risks. Ultimately, we aim to proactively address the ethical dilemmas and challenges posed by the integration of generative AI into healthcare practice. 
    more » « less
  4. Free, publicly-accessible full text available February 18, 2026
  5. This study examined the application of GPT-4 with vision (GPT-4V), a multimodal large language model with visual recognition, in detecting radiologic findings from a set of 100 chest radiographs and suggests that GPT-4V is currently not ready for real-world diagnostic usage in interpreting chest radiographs. 
    more » « less