

This content will become publicly available on July 1, 2026

Title: Real-time Factuality Assessment from Adversarial Feedback
We show that existing evaluations for assessing the factuality of news from conventional sources, such as claims on fact-checking websites, yield high accuracies over time for LLM-based detectors, even after their knowledge cutoffs. This suggests that recent popular false information from such sources can be easily identified, either because it is likely present in pre-training/retrieval corpora or because salient yet shallow patterns emerge in these datasets. Instead, we argue that a proper factuality evaluation dataset should test a model’s ability to reason about current events by retrieving and reading related evidence. To this end, we develop a novel pipeline that leverages natural language feedback from a RAG-based detector to iteratively modify real-time news into deceptive variants that challenge LLMs. Our iterative rewrite decreases the binary classification ROC-AUC by an absolute 17.5 percent for a strong RAG-based GPT-4o detector. Our experiments reveal the important role of RAG in both evaluating and generating challenging news examples: retrieval-free LLM detectors are vulnerable to unseen events and adversarial attacks, while feedback from RAG-based evaluation helps discover more deceitful patterns.
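The abstract describes the rewrite pipeline only at a high level. The following is a minimal Python sketch of a feedback-driven rewrite loop, not the authors' implementation. It assumes the following: `detect` and `rewrite` are hypothetical stand-ins for the RAG-based detector and the LLM rewriter (real versions would be LLM calls over retrieved evidence), and the stopping `threshold` and `max_rounds` values are illustrative. The `roc_auc` helper is a plain rank-based ROC-AUC, included to show that an "absolute" drop means the difference between two AUC values. The toy numbers are made up and do not reproduce the paper's results.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's code):
# a detector returns (probability the article is false, natural-language feedback);
# a rewriter turns (article, feedback) into a revised, harder-to-detect variant.
Detector = Callable[[str], Tuple[float, str]]
Rewriter = Callable[[str, str], str]


@dataclass
class RewriteTrace:
    variants: List[str] = field(default_factory=list)  # variants[0] is the input
    scores: List[float] = field(default_factory=list)  # detector score per variant


def adversarial_rewrite(article: str, detect: Detector, rewrite: Rewriter,
                        max_rounds: int = 3, threshold: float = 0.5) -> RewriteTrace:
    """Rewrite `article` using detector feedback until it is no longer flagged.

    Stops once the detector's probability of 'false' falls below `threshold`,
    or after `max_rounds` rewrites.
    """
    trace = RewriteTrace(variants=[article])
    current = article
    for _ in range(max_rounds):
        p_false, feedback = detect(current)
        trace.scores.append(p_false)
        if p_false < threshold:
            return trace  # detector fooled: stop early
        current = rewrite(current, feedback)
        trace.variants.append(current)
    trace.scores.append(detect(current)[0])  # score the last rewrite
    return trace


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Rank-based ROC-AUC: chance that a random positive (label 1 = false news)
    scores higher than a random negative; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy stand-ins (hypothetical): the "detector" keys on one shallow cue,
# and its feedback names that cue so the "rewriter" can remove it.
def toy_detect(text: str) -> Tuple[float, str]:
    if "shocking" in text:
        return 0.9, "The word 'shocking' signals sensationalism."
    return 0.2, "No obvious cues found."


def toy_rewrite(text: str, feedback: str) -> str:
    # Drop whichever quoted cue word the feedback names.
    if "'" in feedback:
        cue = feedback.split("'")[1]
        return text.replace(cue + " ", "")
    return text


if __name__ == "__main__":
    trace = adversarial_rewrite("A shocking report claims X.", toy_detect, toy_rewrite)
    print(trace.variants)  # ['A shocking report claims X.', 'A report claims X.']
    print(trace.scores)    # [0.9, 0.2]
    labels = [1, 1, 0, 0]  # two false articles, two true ones
    before = roc_auc([0.9, 0.8, 0.2, 0.3], labels)
    after = roc_auc([0.2, 0.8, 0.2, 0.3], labels)  # first false article rewritten
    print(before, after, before - after)  # 1.0 0.625 0.375
```

The toy detector illustrates the abstract's point about shallow patterns. A detector that relies on a surface cue is easy to evade once the feedback reveals that cue. A RAG-based detector would instead have to check the claim against retrieved evidence.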
Award ID(s):
2211526
PAR ID:
10623898
Author(s) / Creator(s):
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Location:
Vienna, Austria
Sponsoring Org:
National Science Foundation
More like this
  1. This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents, each with a distinct task, backbone large language model (LLM), and RAG strategy. We introduce an iterative approach in which the search engine generates retrieval results for the RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase. This feedback is then used to iteratively optimize the search engine with an expectation-maximization algorithm, with the goal of maximizing each agent's utility function. We also adapt this approach to an online setting, allowing the search engine to refine its behavior based on real-time feedback from individual agents to better serve each of them. Experiments on datasets from the Knowledge-Intensive Language Tasks (KILT) benchmark demonstrate that our approach on average significantly outperforms baselines across 18 RAG models. We demonstrate that our method effectively "personalizes" the retrieval for each RAG agent based on the collected feedback. Finally, we provide a comprehensive ablation study to explore various aspects of our method.
  2. Retrieval-Augmented Generation (RAG) is a recent advance for providing current and accurate data to Large Language Models (LLMs). Although RAG has been successful in reducing hallucinations within LLMs, it remains susceptible to inaccurate and maliciously manipulated data. In this paper, we present Distributed-RAG (D-RAG), a novel blockchain-based framework designed to increase the integrity of the RAG system. D-RAG addresses the risks of malicious data by replacing RAG’s traditionally centralized database with communities, each consisting of a database and a permissioned blockchain. The communities are organized by subject, and each contains experts in the field who verify data through a privacy-preserving consensus protocol before it is added to the database. A Retrieval Blockchain is also designed to communicate among the multiple communities. The miners on this Retrieval Blockchain are responsible for retrieving documents from the database for each query and ranking them using an LLM. Once these rankings are agreed upon, the top-ranked documents are provided to the LLM along with the query to generate a response. We perform experiments on our proposed D-RAG framework, and our results show that our Retrieval Blockchain is scalable and our privacy-preserving consensus protocol maintains efficiency as the number of community members increases. These results demonstrate that D-RAG can maintain data integrity at scale in a real-world application setting.
  3. Large Language Models (LLMs) have shown promise in educational applications, but challenges such as hallucinations, lack of contextual relevance, and limited personalization impede their practical adoption. To address these issues, my research introduces MerryQuery, an LLM-powered educational agent that integrates Retrieval-Augmented Generation (RAG), rule-based content control, and Reinforcement Learning from Human Feedback (RLHF). The system features a dynamic learning profile module for adaptive personalization and a multi-step verification framework that cross-checks responses against external sources to enhance trustworthiness. A functional prototype of MerryQuery is being piloted in a real-world classroom. Preliminary results demonstrate improved response reliability and student understanding. 
  4. Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo_Bosco, Giosuè; Paquette, Luc (Ed.)
    Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular for assisting human graders and reducing their workload. However, LLMs' limited domain knowledge restricts their understanding of task-specific requirements and hinders their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results on a science education dataset demonstrate that our system improves grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains.
  5. The increasing popularity of outdoor recreational activities (such as hiking and biking) has boosted demand for a conversational AI system that provides informative and personalized suggestions on outdoor trails. Two challenges arise: (1) how to provide accurate outdoor trail information via conversational AI; and (2) how to enable usable and efficient recommendation services. To address these challenges, this paper discusses the preliminary and practical lessons learned from developing Judy, an outdoor trail recommendation chatbot based on a large language model (LLM) with retrieval-augmented generation (RAG). To gain concrete system insights, we performed case studies with outdoor trails in Connecticut (CT), US. We conducted web-based data collection, outdoor trail data management, and LLM performance studies on the RAG-based recommendation. Our experimental results demonstrate the accuracy, effectiveness, and usability of Judy in recommending outdoor trails based on the LLM with RAG.
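Several of the related abstracts (D-RAG, the adaptive RAG grading framework, and Judy) build on the same basic RAG step. That step retrieves the documents most relevant to a query and places them in the LLM prompt alongside it. The following is a minimal Python sketch of that step under stated assumptions. It uses bag-of-words cosine similarity as a stand-in for whatever embedding model or vector store a real system would use. The functions `retrieve` and `build_prompt` are hypothetical names, not APIs from any of these papers, and the trail descriptions are made-up toy data.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts; a stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())  # missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: Dict[str, str], k: int = 2) -> List[Tuple[str, float]]:
    """Return the ids and scores of the k documents most similar to the query."""
    q = bag_of_words(query)
    scored = [(doc_id, cosine(q, bag_of_words(text))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(query: str, corpus: Dict[str, str], k: int = 2) -> str:
    """Prepend the retrieved passages to the question (the 'augmented' prompt)."""
    passages = [corpus[doc_id] for doc_id, _ in retrieve(query, corpus, k)]
    return "Context:\n" + "\n".join(passages) + "\n\nQuestion: " + query


if __name__ == "__main__":
    trails = {
        "t1": "Steep rocky trail with river views.",
        "t2": "Flat paved bike path along the coast.",
        "t3": "Easy hiking trail through the forest.",
    }
    print([doc_id for doc_id, _ in retrieve("easy forest hiking trail", trails)])  # ['t3', 't1']
    print(build_prompt("easy forest hiking trail", trails, k=1))
```

The systems above add their own layers on top of this step. Examples include consensus over rankings in D-RAG, curated educational sources in the grading framework, and trail data management in Judy. The sketch covers only the shared retrieve-then-prompt core.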