Search for: All records

Creators/Authors contains: "Wang, Lu"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Verifiable generation requires large language models (LLMs) to cite source documents supporting their outputs, thereby improving output transparency and trustworthiness. Yet previous work mainly targets the generation of sentence-level citations, lacking specificity about which part of the sentence is backed by which cited source. This work studies verifiable generation with subsentence-level fine-grained citations to locate the generated content that is supported by the cited sources more precisely. We first present a dataset, SCIFI, comprising 10K Wikipedia paragraphs with subsentence-level citations. Each paragraph in SCIFI is paired with a set of candidate source documents for citation and a query that triggers the generation of the paragraph content. On SCIFI, we then evaluate the performance of state-of-the-art LLMs and long-document processing strategies designed for these models. Our experiment results reveal key factors that can enhance the quality of citations, including expanding the source-document context accessible to the models and applying specialized model tuning. (An illustrative sketch of a fine-grained citation record follows this entry.)
    Free, publicly-accessible full text available August 12, 2025
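The entry above describes subsentence-level citations without showing their shape. Below is a minimal, hypothetical sketch of what one such record could look like; the field names, span offsets, and values are illustrative assumptions, not the actual SCIFI schema.

```python
# Hypothetical record layout for a paragraph with subsentence-level citations.
# Field names and character offsets are illustrative, not taken from SCIFI.
record = {
    "query": "Summarize the early history of the Panama Canal.",
    "paragraph": "Construction began under French leadership in 1881, "
                 "but the project was abandoned in 1889.",
    "candidate_sources": ["doc_12", "doc_47", "doc_83"],
    "citations": [
        # Each citation maps a sub-sentence character span of the paragraph
        # to the source documents that support exactly that span.
        {"span": [0, 52], "sources": ["doc_12"]},
        {"span": [53, 91], "sources": ["doc_47", "doc_83"]},
    ],
}

# A generated output can then be judged on whether each cited span is actually
# entailed by its cited sources and whether all supported spans carry citations.
```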
  2. We study the task of conducting structured reasoning as generating a reasoning graph from natural language input using large language models (LLMs). Previous approaches have explored various prompting schemes, yet they suffer from error propagation due to the autoregressive nature and single-pass-based decoding, which lack error correction capability. Additionally, relying solely on a single sample may result in the omission of true nodes and edges. To counter this, we draw inspiration from self-consistency (SC), which involves sampling a diverse set of reasoning chains and taking the majority vote as the final answer. To tackle the substantial challenge of applying SC to generated graphs, we propose MIDGARD (MInimum Description length Guided Aggregation of Reasoning in Directed acyclic graph), which leverages a Minimum Description Length (MDL)-based formulation to identify consistent properties among the different graph samples generated by an LLM. This formulation helps reject properties that appear in only a few samples, which are likely to be erroneous, while enabling the inclusion of missing elements without compromising precision. Our method outperforms comparison approaches across various structured reasoning tasks, including argument structure extraction, explanation graph generation, inferring dependency relations among actions for everyday tasks, and semantic graph generation from natural text. (A simplified aggregation sketch follows this entry.)
    Free, publicly-accessible full text available August 12, 2025
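MIDGARD aggregates multiple LLM-generated graphs with an MDL-based formulation. The sketch below captures only the simpler frequency-threshold intuition behind that idea (keep properties that recur across samples, drop rare ones); it is a deliberate simplification, not the paper's actual MDL objective.

```python
from collections import Counter

def aggregate_graph_samples(samples, keep_ratio=0.5):
    """Combine several LLM-generated graphs into one consensus graph.

    `samples` is a list of graphs, each given as a set of (head, relation, tail)
    edges. Edges appearing in at least `keep_ratio` of the samples are kept;
    rare edges, which are more likely to be erroneous, are dropped.
    """
    edge_counts = Counter(edge for graph in samples for edge in set(graph))
    threshold = keep_ratio * len(samples)
    return {edge for edge, count in edge_counts.items() if count >= threshold}

# Example: three noisy samples of the same action-dependency graph.
samples = [
    {("boil water", "before", "add pasta"), ("add pasta", "before", "drain")},
    {("boil water", "before", "add pasta"), ("add salt", "before", "drain")},
    {("boil water", "before", "add pasta"), ("add pasta", "before", "drain")},
]
print(aggregate_graph_samples(samples))
# Keeps the two edges seen in a majority of samples; drops the one-off edge.
```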
  3. Long document summarization systems are critical for domains with lengthy and jargon-laden text, yet they present significant challenges to researchers and developers with limited computing resources. Existing solutions mainly focus on efficient attentions or divide-and-conquer strategies. The former reduces theoretical time complexity but is still memory-heavy. The latter methods sacrifice global context, leading to uninformative and incoherent summaries. This work aims to leverage the memory-efficient nature of divide-and-conquer methods while preserving global context. Concretely, our framework AWESOME uses two novel mechanisms: (1) external memory mechanisms track previously encoded document segments and their corresponding summaries to enhance global document understanding and summary coherence, and (2) global salient content is identified beforehand to augment each document segment and support its summarization. Extensive experiments on diverse genres of text, including government reports, meeting transcripts, screenplays, scientific papers, and novels, show that AWESOME produces summaries with improved informativeness, faithfulness, and coherence compared to competitive baselines on longer documents, while having a smaller GPU memory footprint. (A high-level sketch of the segment-with-memory flow follows this entry.)
    Free, publicly-accessible full text available June 17, 2025
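The sketch below illustrates the general divide-and-conquer-with-memory flow described in the entry above, assuming a generic summarize(text) function. It is a rough approximation: AWESOME implements the memory and salience mechanisms inside the model rather than through prompt text.

```python
def summarize_long_document(segments, salient_sentences, summarize):
    """Segment-by-segment summarization with a running memory of prior summaries.

    `segments` is the document split into chunks that fit the model's context,
    `salient_sentences` are globally important sentences identified beforehand,
    and `summarize` is any text-to-summary function (e.g., a seq2seq model call).
    This mimics the high-level flow only; the actual framework keeps the memory
    as learned states inside the encoder-decoder rather than as plain text.
    """
    memory = ""                       # summaries of segments seen so far
    partial_summaries = []
    salient = " ".join(salient_sentences)
    for segment in segments:
        prompt = f"Context so far: {memory}\nKey content: {salient}\n\n{segment}"
        piece = summarize(prompt)
        partial_summaries.append(piece)
        memory = " ".join(partial_summaries)[-2000:]   # keep the memory bounded
    return " ".join(partial_summaries)
```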
  4. Background and Objectives: Sepsis is a leading cause of mortality in intensive care units (ICUs). The development of a robust prognostic model utilizing patients’ clinical data could significantly enhance clinicians’ ability to make informed treatment decisions, potentially improving outcomes for septic patients. This study aims to create a novel machine-learning framework for constructing prognostic tools capable of predicting patient survival or mortality outcomes. Methods: A novel dataset is created using concatenated triples of static data, temporal data, and clinical outcomes to expand data size. This structured input trains five machine learning classifiers (KNN, Logistic Regression, SVM, RF, and XGBoost) with advanced feature engineering. Models are evaluated on an independent cohort using AUROC and a new metric, 𝛾, which incorporates the F1 score, to assess discriminative power and generalizability. Results: We developed five prognostic models using the concatenated triple dataset with 10 dynamic features from patient medical records. Our analysis shows that the Extreme Gradient Boosting (XGBoost) model (AUROC = 0.777, F1 score = 0.694) and the Random Forest (RF) model (AUROC = 0.769, F1 score = 0.647), when paired with an ensemble under-sampling strategy, outperform other models. The RF model improves AUROC by 6.66% and reduces overfitting by 54.96%, while the XGBoost model shows a 0.52% increase in AUROC and a 77.72% reduction in overfitting. These results highlight our framework’s ability to enhance predictive accuracy and generalizability, particularly in sepsis prognosis. Conclusion: This study presents a novel modeling framework for predicting treatment outcomes in septic patients, designed for small, imbalanced, and high-dimensional datasets. By using temporal feature encoding, advanced sampling, and dimension reduction techniques, our approach enhances standard classifier performance. The resulting models show improved accuracy with limited data, offering valuable prognostic tools for sepsis management. This framework demonstrates the potential of machine learning in small medical datasets. (A simplified training-and-evaluation sketch on synthetic data follows this entry.)
    Free, publicly-accessible full text available October 9, 2025
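The sketch below shows a bare-bones version of the training-and-evaluation loop described above: a Random Forest trained with simple random under-sampling and scored with AUROC and F1. It runs on synthetic stand-in data; the paper's concatenated-triple construction, ensemble under-sampling strategy, and 𝛾 metric are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the patient records: 10 dynamic features per patient,
# with a heavily imbalanced outcome label (~15% positive / mortality class).
X = rng.normal(size=(1000, 10))
y = (rng.random(1000) < 0.15).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Single-shot random under-sampling of the majority class (the paper uses an
# ensemble under-sampling strategy; this is a simplification).
pos = np.flatnonzero(y_train == 1)
neg = rng.choice(np.flatnonzero(y_train == 0), size=len(pos), replace=False)
idx = np.concatenate([pos, neg])

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train[idx], y_train[idx])

prob = model.predict_proba(X_test)[:, 1]
print("AUROC:", roc_auc_score(y_test, prob))
print("F1:   ", f1_score(y_test, (prob >= 0.5).astype(int)))
```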
  5. We investigate pre-training techniques for abstractive multi-document summarization (MDS), which is much less studied than summarizing single documents. Though recent work has demonstrated the effectiveness of highlighting information salience for pre-training strategy design, these approaches struggle to generate abstractive and reflective summaries, which are critical properties for MDS. To this end, we present PELMS, a pre-trained model that uses pre-training objectives based on semantic coherence heuristics and faithfulness constraints together with unlabeled multi-document inputs to promote the generation of concise, fluent, and faithful summaries. To support the training of PELMS, we compile MultiPT, a multi-document pre-training corpus containing over 93 million documents that form more than 3 million unlabeled topic-centric document clusters, covering diverse genres such as product reviews, news, and general knowledge. We perform extensive evaluation of PELMS in low-shot settings on a wide range of MDS datasets. Our approach consistently outperforms competitive comparisons with respect to overall informativeness, abstractiveness, coherence, and faithfulness, and with minimal fine-tuning can match the performance of language models at a much larger scale (e.g., GPT-4).
    Free, publicly-accessible full text available June 1, 2025
  6. The compound 2-(((trifluoromethyl)sulfonyl)oxy)propane-1,3-diyl bis(4-methylbenzenesulfonate) (TPB) is a crucial intermediate in the synthesis of 18F radiolabeled cromolyn derivatives. In this work, we combine 1H NMR spectroscopy, X-ray crystallography, ab initio molecular dynamics, and NMR calculations to examine the structure, interactions, and solvation dynamics of the TPB molecule. In CDCl3, the -CH2 groups within its glyceryl-derived linker exhibit a single set of proton signals in the 1H NMR measurements. However, when TPB is dissolved in DMSO-d6, distinct splitting patterns emerge despite its seemingly symmetric chemical structure. Crystallographic analysis further unveils the absence of overall symmetry in its three-dimensional arrangement. To elucidate these unique NMR features, we carry out ab initio molecular dynamics simulations and characterize the solvation structures and dynamics of TPB in CHCl3 and DMSO solutions. In contrast to the predominantly non-polar nature of the CHCl3 solvents, DMSO directly participates in C-H···O hydrogen bonding interactions with the solute molecule, leading to the splitting of its -CH2 chemical shifts into two distinct distributions. The comprehensive understanding of the structure and solvation interactions of TPB provides essential insights for its application in the radiofluorination reactions of cromolyn derivatives and holds promise for the future development of radiolabeled dimeric drugs.
    Free, publicly-accessible full text available June 6, 2025
  7. A model is considered well-calibrated when its probability estimate aligns with the actual likelihood of the output being correct. Calibrating language models (LMs) is crucial, as it plays a vital role in detecting and mitigating hallucinations of LMs as well as building more trustworthy models. However, standard calibration techniques may not be suited for LM calibration. For instance, post-processing methods such as temperature scaling do not reorder the candidate generations. On the other hand, training-based methods require fine-tuning the entire model, which is impractical for LMs of large scale. We present LITCAB, a lightweight calibration mechanism consisting of a single linear layer that takes the input text representation and predicts a bias term, which is then added to the LM output logits. LITCAB improves model calibration while adding less than 2% of the original model parameters. For evaluation, we construct CAT, a benchmark consisting of eight text generation tasks, covering responses ranging from short phrases to paragraphs. We test LITCAB with Llama2-7B, where it improves calibration across all tasks, reducing the average ECE score by as much as 30%. We further conduct a comprehensive evaluation with multiple popular open-sourced LMs from the GPT and LLaMA families, yielding the following key findings: (i) Larger models within the same family exhibit better calibration on tasks with short generations, but not necessarily on longer ones. (ii) GPT-family models show superior calibration compared to LLaMA, Llama2, and Vicuna models, despite having far fewer parameters. (iii) Fine-tuning a pretrained model (e.g., LLaMA) with samples of limited purpose (e.g., conversations) may lead to worse calibration, highlighting the importance of fine-tuning setups for calibrating LMs. (A minimal sketch of the bias-layer idea follows this entry.)
    Free, publicly-accessible full text available May 7, 2025
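The sketch below illustrates the bias-layer idea described in the entry above: a single linear layer maps the LM's hidden states to a vocabulary-sized bias that is added to the frozen LM's logits. Class and variable names, as well as the toy dimensions, are illustrative assumptions rather than the released LITCAB implementation.

```python
import torch
import torch.nn as nn

class CalibrationBias(nn.Module):
    """A single linear layer that maps the LM's last hidden states to a
    per-token bias over the vocabulary; the bias is added to the frozen
    LM's output logits. Names and sizes are illustrative only."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.bias_head = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the frozen LM
        # logits:        (batch, seq_len, vocab_size) original LM logits
        return logits + self.bias_head(hidden_states)

# Toy usage with tiny dimensions; a 7B-class LM would use sizes on the order
# of hidden_size=4096 and vocab_size=32000, keeping the added parameters small
# relative to the base model.
head = CalibrationBias(hidden_size=8, vocab_size=50)
hidden = torch.randn(2, 5, 8)       # stand-in hidden states
logits = torch.randn(2, 5, 50)      # stand-in LM logits
calibrated = head(hidden, logits)   # same shape as logits
```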
  8. The active-particle number density is a key parameter for plasma material processing, space propulsion, and plasma-assisted combustion. The traditional actinometry method focuses on measuring the density of the atoms in the ground state, but there is a lack of an effective optical emission spectroscopy method to measure intra-shell excited-state densities. The latter atoms have chemical selectivity and higher energy, and they can easily change the material morphology as well as the ionization and combustion paths. In this work, we present a novel state-resolved actinometry (SRA) method, supported by a krypton line-ratio method for the electron temperature and density, to measure the number densities of nitrogen atoms in the ground and intra-shell excited states. The SRA method is based on a collisional-radiative model, considering the kinetics of atomic nitrogen and krypton including their excited states. The densities measured by our method are compared with those obtained from a dissociative model in a miniature electron cyclotron resonance (ECR) plasma source. Furthermore, the saturation effect, in which the electron density remains constant due to the microwave propagation in an ECR plasma once the power reaches a certain value, is used to verify the electron density measured by the line-ratio method. An ionization balance model is also presented to examine the measured electron temperature. All the values obtained with the different methods are in good agreement with each other, and hence a set of verified rate coefficient data used in our method can be provided. A novel concept, the ‘excited-state system’, is presented to quickly build an optical diagnostic method based on the analysis of quantum number propensity and selection rules. (A simplified form of the classical actinometry relation appears after this entry for context.)
    Free, publicly-accessible full text available May 1, 2025
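For context on the "traditional actinometry" mentioned above, the relation below gives a simplified textbook form of the classical actinometry estimate; it omits branching ratios, optical-response factors, and quenching corrections, and it is not the paper's state-resolved formulation.

```latex
% Simplified classical actinometry relation (textbook form, not the paper's
% state-resolved model). X is the species of interest, A the admixed
% actinometer gas (e.g., Kr) of known density.
n_X \;\approx\; n_A \,\frac{I_X}{I_A}\,
      \frac{k^{\mathrm{exc}}_{A}(T_e)}{k^{\mathrm{exc}}_{X}(T_e)}
% I_X, I_A          : measured emission-line intensities
% k^{exc}(T_e)      : electron-impact excitation rate coefficients
% n_A               : known actinometer number density
```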
  9. Free, publicly-accessible full text available June 1, 2025