NSF PAR Search | NSF Public Access Repository

Understanding Synthetic Context Extension via Retrieval Heads

Zhao, Xinyu; Yin, Fangcong; Durrett, Greg (July 2025, Proceedings of the International Conference on Machine Learning)

Free, publicly-accessible full text available July 1, 2026

To CoT or not To CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Sprague, Zayne; Yin, Fangcong; Rodriguez, Juan Diego; Jiang, Dongwei; Wadhwa, Manya; Singhal, Prasann; Zhao, Xinyu; Ye, Xi; Mahowald, Kyle; Durrett, Greg (May 2025, Proceedings of the International Conference on Learning Representations)

Free, publicly-accessible full text available May 1, 2026

LOFIT: Localized Fine-tuning on LLM Representations

Yin, Fangcong; Ye, Xi; Durrett, Greg (December 2024, Advances in Neural Information Processing Systems)

Full Text Available

A Long Way to Go: Investigating Length Correlations in RLHF

Singhal, Prasann; Goyal, Tanya; Xu, Jiacheng; Durrett, Greg (October 2024, Proceedings of the Conference on Language Modeling (COLM))

Full Text Available

D2PO: Discriminator-Guided DPO with Response Evaluation Models

Singhal, Prasann; Lambert, Nathan; Niekum, Scott; Goyal, Tanya; Durrett, Greg (October 2024, Proceedings of the Conference on Language Modeling (COLM))

Full Text Available

Complex Claim Verification with Evidence Retrieved in the Wild

Chen, Jifan; Kim, Grace; Sriram, Aniruddh; Durrett, Greg; Choi, Eunsol (June 2024, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers))

Retrieving evidence to support or refute claims is a core part of automatic fact-checking. Prior work makes simplifying assumptions in retrieval that depart from real-world use cases: either no access to evidence, access to evidence curated by a human fact-checker, or access to evidence published after a claim was made. In this work, we present the first realistic pipeline to check real-world claims by retrieving raw evidence from the web. We restrict our retriever to only search documents available prior to the claim’s making, modeling the realistic scenario of emerging claims. Our pipeline includes five components: claim decomposition, raw document retrieval, fine-grained evidence retrieval, claim-focused summarization, and veracity judgment. We conduct experiments on complex political claims in the ClaimDecomp dataset and show that the aggregated evidence produced by our pipeline improves veracity judgments. Human evaluation finds the evidence summary produced by our system is reliable (it does not hallucinate information) and relevant to answering key questions about a claim, suggesting that it can assist fact-checkers even when it does not reflect a complete evidence set.
Full Text Available

MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning

Sprague, Zayne; Ye, Xi; Bostrom, Kaj; Chaudhuri, Swarat; Durrett, Greg (May 2024, Proceedings of the International Conference on Learning Representations)

While large language models (LLMs) equipped with techniques like chain-of-thought prompting have demonstrated impressive capabilities, they still fall short in their ability to reason robustly in complex settings. However, evaluating LLM reasoning is challenging because system capabilities continue to grow while benchmark datasets for tasks like logical deduction have remained static. We introduce MuSR, a dataset for evaluating language models on multistep soft reasoning tasks specified in a natural language narrative. This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm, enabling the construction of complex reasoning instances that challenge GPT-4 (e.g., murder mysteries roughly 1000 words in length) and which can be scaled further as more capable LLMs are released. Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning; this makes it simultaneously much more challenging than other synthetically-crafted benchmarks while remaining realistic and tractable for human annotators to solve with high accuracy. We evaluate a range of LLMs and prompting techniques on this dataset and characterize the gaps that remain for techniques like chain-of-thought to perform robust reasoning.
Full Text Available

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

https://doi.org/10.18653/v1/2024.emnlp-main.499

Tang, Liyan; Laban, Philippe; Durrett, Greg (January 2024, Proceedings of the Conference on Empirical Methods in Natural Language Processing (published by Association for Computational Linguistics))

Full Text Available

Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification

https://doi.org/10.18653/v1/2024.findings-emnlp.215

Gunjal, Anisha; Durrett, Greg (January 2024, Findings of the Association for Computational Linguistics: EMNLP 2024)

Full Text Available

Which questions should I answer? Salience Prediction of Inquisitive Questions

https://doi.org/10.18653/v1/2024.emnlp-main.1114

Wu, Yating; Mangla, Ritika Rajesh; Dimakis, Alex; Durrett, Greg; Li, Junyi Jessy (January 2024, Proceedings of the Conference on Empirical Methods in Natural Language Processing (published by Association for Computational Linguistics))

Full Text Available
