Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.
- Free, publicly-accessible full text available July 1, 2026
- Free, publicly-accessible full text available May 1, 2026
- Free, publicly-accessible full text available June 11, 2026
- Accurate in-hospital length-of-stay prediction is a vital quality metric for hospital leaders and health policy decision-makers. It assists with decision-making and informs hospital operations involving factors such as patient flow, elective cases, and human resources allocation, while also informing quality-of-care and risk considerations. The aim of the research reported in this paper is to use survival analysis to model General Internal Medicine (GIM) length of stay, and to use Shapley values to support interpretation of the resulting model. Survival analysis predicts the time until a specific event occurs; in our study, we predict the duration from patient admission to discharge home, i.e., in-hospital length of stay. In addition to discussing the modeling results, we also discuss how survival analysis of hospital length of stay can be used to guide improvements in the efficiency of hospital operations and support the development of quality metrics. (For a toy illustration of this kind of survival model, see the first sketch after this list.)
- Free, publicly-accessible full text available May 1, 2026
- Free, publicly-accessible full text available November 1, 2025
- Free, publicly-accessible full text available January 1, 2026
- We study the problem of fine-tuning a language model (LM) for a target task by optimally using the information from n auxiliary tasks. This problem has broad applications in NLP, such as targeted instruction tuning and data selection in chain-of-thought fine-tuning. The key challenge is that not all auxiliary tasks are useful for improving performance on the target task, so choosing the right subset of auxiliary tasks is crucial. Conventional subset selection methods, such as forward and backward selection, are unsuitable for LM fine-tuning because they require repeated training on subsets of auxiliary tasks. This paper introduces a new algorithm to estimate model fine-tuning performance without repeated training. Our algorithm first performs multitask training using the data of all the tasks to obtain a meta initialization. Then, we approximate the model fine-tuning loss of a subset using functional values and gradients from the meta initialization. Empirically, we find that this gradient-based approximation holds with remarkable accuracy for twelve transformer-based LMs, so we can estimate fine-tuning performance on CPUs within a few seconds. We conduct extensive experiments to validate our approach, delivering a 30× speedup over conventional subset selection while incurring only 1% error relative to the true fine-tuning performance. In downstream evaluations of instruction tuning and chain-of-thought fine-tuning, our approach improves over prior methods that use gradient or representation similarity for subset selection by up to 3.8%. (A toy version of the first-order estimate appears in the second sketch after this list.)
  Free, publicly-accessible full text available November 16, 2025
- Verifiable generation requires large language models (LLMs) to cite source documents supporting their outputs, thereby improving output transparency and trustworthiness. Yet previous work mainly targets the generation of sentence-level citations, lacking specificity about which part of a sentence is backed by which cited source. This work studies verifiable generation with subsentence-level fine-grained citations that locate the generated content supported by the cited sources more precisely. We first present a dataset, SCIFI, comprising 10K Wikipedia paragraphs with subsentence-level citations. Each paragraph in SCIFI is paired with a set of candidate source documents for citation and a query that triggers the generation of the paragraph content. On SCIFI, we then evaluate the performance of state-of-the-art LLMs and of long-document processing strategies designed for these models. Our experimental results reveal key factors that can enhance citation quality, including expanding the source-document context accessible to the models and applying specialized model tuning. (A hypothetical record layout appears in the third sketch after this list.)
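The first sketch below is a minimal, hypothetical illustration of survival-style length-of-stay modeling with a Cox proportional hazards model. It is not the paper's GIM pipeline: the lifelines library, the toy covariates (age, num_comorbidities), and the tiny synthetic cohort are all assumptions made for illustration, and the Shapley-value interpretation step from the abstract is omitted.

```python
# Minimal sketch of length-of-stay survival modeling, assuming the
# lifelines library; features and data are invented for illustration.
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical admissions: duration = days from admission to discharge
# home; event = 1 if discharged home, 0 if censored (e.g., transferred
# or still in hospital at the end of the observation window).
df = pd.DataFrame({
    "duration_days": [3, 7, 2, 14, 5, 9, 4, 11],
    "discharged_home": [1, 1, 1, 0, 1, 1, 1, 0],
    "age": [54, 81, 47, 76, 63, 70, 58, 85],
    "num_comorbidities": [1, 4, 0, 5, 2, 3, 1, 4],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="duration_days", event_col="discharged_home")

# Predicted median in-hospital length of stay per patient profile.
print(cph.predict_median(df))
```

In practice a Shapley-based explainer would sit on top of such a fitted model, attributing each covariate's contribution to the predicted risk for an individual admission.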
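The second sketch is a hedged reading of the gradient-based estimate described in the fine-tuning abstract: precompute each task's gradient at the meta initialization once, then score any candidate subset with a first-order Taylor expansion instead of retraining. The function name, the single averaged gradient step, and the learning rate are illustrative assumptions, not the paper's actual algorithm.

```python
# Hedged sketch: estimate post-fine-tuning loss from a meta
# initialization via a first-order approximation. Names and the
# one-step update rule are assumptions made for illustration.
import torch

def estimate_subset_loss(target_loss, target_grad, task_grads, subset, lr=1e-2):
    """Estimate the target-task loss after fine-tuning on `subset`.

    target_loss: scalar target-task loss at the meta initialization.
    target_grad: flattened target-loss gradient at the meta init.
    task_grads:  {task_id: flattened task-loss gradient at the meta
                 init}, precomputed once so no retraining is needed.
    """
    # Model one hypothetical gradient step on the averaged subset loss.
    delta = -lr * torch.stack([task_grads[t] for t in subset]).mean(dim=0)
    # First-order expansion: L(theta + delta) ~= L(theta) + <grad, delta>.
    return target_loss + torch.dot(target_grad, delta)

# Toy usage: random vectors stand in for real model gradients.
dim = 16
task_grads = {t: torch.randn(dim) for t in range(4)}
target_grad = torch.randn(dim)
candidates = [[0, 1], [1, 2], [0, 2, 3]]
best = min(candidates,
           key=lambda s: estimate_subset_loss(1.0, target_grad,
                                              task_grads, s).item())
print("estimated best auxiliary subset:", best)
```

Because the expansion only needs dot products of stored gradient vectors, ranking many subsets this way runs on a CPU in seconds, which is the speedup the abstract reports.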
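Finally, a hypothetical sketch of what a subsentence-level citation record might look like. The abstract does not specify SCIFI's schema, so every field name and the example spans below are assumptions made purely to illustrate citing at finer granularity than a sentence.

```python
# Hypothetical record layout for subsentence-level citation; this is
# NOT the actual SCIFI schema, which the abstract does not specify.
example_record = {
    "query": "Describe the aftermath of the 1906 San Francisco earthquake.",
    "candidate_docs": {
        "D1": "...full text of candidate source document 1...",
        "D2": "...full text of candidate source document 2...",
    },
    # One sentence is split into spans, each with its own citations.
    "paragraph": [
        {"span": "The earthquake ruptured the San Andreas Fault,",
         "cites": ["D1"]},
        {"span": "and the fires that followed destroyed much of the city.",
         "cites": ["D2"]},
    ],
}
```

A sentence-level scheme would attach ["D1", "D2"] to the whole sentence; the subsentence layout above records which clause each source actually supports.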