NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees

Nie, F; Hou, X; Lin, S; Zou, J; Yao, H; Zhang, L (July 2025, Proceedings of Machine Learning Research)

The propensity of large language models (LLMs) to generate hallucinations and non-factual content undermines their reliability in high-stakes domains, where rigorous control over Type I errors (the conditional probability of incorrectly classifying hallucinations as truthful content) is essential. Despite its importance, formal verification of LLM factuality with such guarantees remains largely unexplored. In this paper, we introduce FACTTEST, a novel framework that statistically assesses whether an LLM can provide correct answers to given questions with high-probability correctness guarantees. We formulate hallucination detection as a hypothesis testing problem to enforce an upper bound on Type I errors at user-specified significance levels. Notably, we prove that FACTTEST also ensures strong Type II error control under mild conditions and can be extended to maintain its effectiveness when covariate shifts exist. FACTTEST is distribution-free and model-agnostic. It works for any number of human-annotated samples and applies to any black-box or white-box LM. Extensive experiments demonstrate that FACTTEST effectively detects hallucinations and enables LLMs to abstain from answering unknown questions, leading to an accuracy improvement of over 40%.
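The hypothesis-testing idea in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: it uses a standard order-statistic construction for distribution-free Type I error control, and the names `violation_bound`, `select_threshold`, `answer_or_abstain`, `alpha`, and `delta` are hypothetical. It assumes each question comes with a scalar certainty score (higher means more confident) and that `null_scores` are the scores on calibration questions the model answered incorrectly.

```python
from math import comb


def violation_bound(n, k, alpha):
    """Bound on P(Type I error > alpha) when the threshold is the
    k-th smallest of n i.i.d. scores from incorrectly answered questions."""
    return sum(
        comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def select_threshold(null_scores, alpha, delta):
    """Pick the smallest order statistic whose violation bound is <= delta.

    Returns None when there are too few calibration samples; the caller
    then abstains on every question, which trivially keeps Type I error at 0.
    """
    scores = sorted(null_scores)
    n = len(scores)
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return scores[k - 1]
    return None


def answer_or_abstain(score, threshold):
    """Answer (True) only when the certainty score strictly exceeds the threshold."""
    return threshold is not None and score > threshold
```

Under these assumptions, `select_threshold(scores, alpha=0.1, delta=0.05)` returns a cutoff such that, with probability at least 0.95 over the calibration draw, at most 10% of incorrectly answered questions would score above it. The guarantee relies only on the calibration scores being exchangeable with future ones, not on any particular score distribution.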
Free, publicly-accessible full text available July 17, 2026
Capturing the Temporal Dependence of Training Data Influence

Wang, J; Song, D; Zou, J; Mittal, P; Jia, R (March 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available March 31, 2026
Declarative Privacy-Preserving Inference Queries

Guan, H; Tiwari, A; Gautier, S; Ambrish, RH; Zhou, L; Wang, Y; Gupta, D; Yang, Y; Xiao, C; Chowdhury, K; et al (May 2025, DASFAA 2025)

Free, publicly-accessible full text available May 24, 2026
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Xia, P; Zhu, K; Li, H; Wang, T; Shi, W; Wang, S; Zhang, L; Zou, J; Yao, H (April 2025, ICLR)

Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools. However, these models often suffer from factual hallucination, which can lead to incorrect diagnoses. Fine-tuning and retrieval-augmented generation (RAG) have emerged as methods to address these issues. However, the amount of high-quality data and distribution shifts between training data and deployment data limit the application of fine-tuning methods. Although RAG is lightweight and effective, existing RAG-based approaches are not sufficiently general to different medical domains and can potentially cause misalignment issues, both between modalities and between the model and the ground truth. In this paper, we propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs. Our approach introduces a domain-aware retrieval mechanism, an adaptive retrieved contexts selection, and a provable RAG-based preference fine-tuning strategy. These innovations make the RAG process sufficiently general and reliable, significantly improving alignment when introducing retrieved contexts. Experimental results across five medical datasets (involving radiology, ophthalmology, and pathology) on medical VQA and report generation demonstrate that MMed-RAG can achieve an average improvement of 43.8% in the factual accuracy of Med-LVLMs.
Free, publicly-accessible full text available April 24, 2026
DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup (DOI: 10.1109/ICDE60146.2024.00008)

Zhou, L; Candan, K; Zou, J (May 2024, Proceedings of 2024 IEEE 39th International Conference on Data Engineering (ICDE 2024))

Storing tabular data to balance storage and query efficiency is a long-standing research question in the database community. In this work, we argue and show that a novel DeepMapping abstraction, which relies on the impressive memorization capabilities of deep neural networks, can provide better storage cost, better latency, and better run-time memory footprint, all at the same time. Such unique properties may benefit a broad class of use cases in capacity-limited devices. Our proposed DeepMapping abstraction transforms a dataset into multiple key-value mappings and constructs a multi-tasking neural network model that outputs the corresponding values for a given input key. To deal with memorization errors, DeepMapping couples the learned neural network with a lightweight auxiliary data structure capable of correcting mistakes. The auxiliary structure design further enables DeepMapping to efficiently deal with insertions, deletions, and updates even without retraining the mapping. We propose a multi-task search strategy for selecting the hybrid DeepMapping structures (including model architecture and auxiliary structure) with a desirable trade-off among memorization capacity, size, and efficiency. Extensive experiments with a real-world dataset, synthetic and benchmark datasets, including TPC-H and TPC-DS, demonstrated that the DeepMapping approach can better balance the retrieving speed and compression ratio against several cutting-edge competitors.
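The pairing of a learned mapping with a correcting auxiliary structure, as described in this abstract, can be sketched as follows. This is an illustration only, not the authors' implementation: `HybridMapping` is a hypothetical name, `model` is any callable standing in for the trained neural network, and the existence set and correction dictionary are assumed stand-ins for the paper's auxiliary structure.

```python
class HybridMapping:
    """Lossless key-value lookup built from an imperfect predictor.

    `model` is a hypothetical stand-in for the learned network: any
    callable mapping a key to a predicted value.
    """

    def __init__(self, data, model):
        self.model = model
        self.keys = set(data)  # which keys currently exist
        # Store only the entries the model gets wrong.
        self.aux = {k: v for k, v in data.items() if model(k) != v}

    def lookup(self, key):
        if key not in self.keys:
            return None
        # A stored correction overrides the model's prediction.
        return self.aux.get(key, self.model(key))

    def upsert(self, key, value):
        """Insert or update without retraining the model."""
        self.keys.add(key)
        if self.model(key) == value:
            self.aux.pop(key, None)  # model is already right
        else:
            self.aux[key] = value

    def delete(self, key):
        """Delete without retraining the model."""
        self.keys.discard(key)
        self.aux.pop(key, None)
```

In this sketch the storage cost is the model plus one auxiliary entry per memorization error, so the better the model memorizes the data, the smaller the correction table becomes.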
Full Text Available
Meta-Learning with Neural Bandit Scheduler

Qi, Y; Ban, Y; Wei, T; Zou, J; Yao, H; He, J (December 2023, NeurIPS 2023)

Full Text Available
Improving Adversarial Robustness via Unlabeled Out-of-Domain Data

Deng, Z; Zhang, L; Ghorbani, A; Zou, J (April 2021, Proceedings of Machine Learning Research)

Full Text Available
How Does Mixup Help With Robustness and Generalization?

Zhang, L; Deng, Z; Kawaguchi, K; Ghorbani, A; Zou, J (May 2021, International Conference on Learning Representations)

Full Text Available
Tensor Relational Algebra for Distributed Machine Learning System Design

Yuan, B; Jankov, D; Zou, J; Tang, Y; Bourgeois, D; Jermaine, C (January 2021, Proceedings of the VLDB Endowment)

Full Text Available
Making AI Forget You: Data Deletion in Machine Learning

Ginart, A; Guan, M; Valiant, G; Zou, J (January 2019, Advances in Neural Information Processing Systems)

Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used; the EU’s Right To Be Forgotten regulation is an example of this effort. In this paper we initiate a framework studying what to do when it is no longer permissible to deploy models derived from specific user data. In particular, we formulate the problem of efficiently deleting individual data points from trained machine learning models. For many standard ML models, the only way to completely remove an individual’s data is to retrain the whole model from scratch on the remaining data, which is often not computationally practical. We investigate algorithmic principles that enable efficient data deletion in ML. For the specific setting of k-means clustering, we propose two provably efficient deletion algorithms which achieve an average of over 100x improvement in deletion efficiency across 6 datasets, while producing clusters of comparable statistical quality to a canonical k-means++ baseline.
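The deletion problem formulated in this abstract can be made concrete with a deliberately simple sketch. It does not reproduce either of the paper's k-means algorithms. It covers only the degenerate case of a single centroid (k = 1), where keeping a running sum makes exact deletion cheap; the class name `DeletableMean` and the helper `retrain` are hypothetical, and the sketch assumes the deleted point is in the dataset and is not the last remaining one.

```python
class DeletableMean:
    """Single-centroid model (k-means with k = 1) that supports deletion."""

    def __init__(self, points):
        self.n = len(points)
        dim = len(points[0])
        # Sufficient statistics: per-coordinate sums over all points.
        self.total = [sum(p[i] for p in points) for i in range(dim)]

    def centroid(self):
        return [s / self.n for s in self.total]

    def delete(self, point):
        """Remove one point in O(d) time, without touching the other points."""
        self.total = [s - x for s, x in zip(self.total, point)]
        self.n -= 1


def retrain(points):
    """Baseline: rebuild the model from scratch on the remaining data, O(n * d)."""
    return DeletableMean(points).centroid()
```

After `delete`, the model matches what `retrain` would produce on the remaining points, which is the correctness condition for deletion. With more than one centroid, removing a point can change cluster assignments and the initialization, so this bookkeeping alone is no longer exact.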
Full Text Available
