NSF PAR Search | NSF Public Access Repository


Understanding Synthetic Context Extension via Retrieval Heads

Zhao, Xinyu; Yin, Fangcong; Durrett, Greg (July 2025, Proceedings of the International Conference on Machine Learning)

Full Text Available
Understanding Synthetic Context Extension via Retrieval Heads

Zhao, Xinyu; Yin, Fangcong; Durrett, Greg (May 2025, ICML 2025)

Long-context LLMs are increasingly in demand for applications such as retrieval-augmented generation. To defray the cost of pretraining LLMs over long contexts, recent work takes an approach of synthetic context extension: fine-tuning LLMs with synthetically generated long-context data in a post-training stage. However, it remains unclear how and why this synthetic context extension imparts abilities for downstream long-context tasks. In this paper, we investigate fine-tuning on synthetic data for three long-context tasks that require retrieval and reasoning. We vary the realism of "needle" concepts to be retrieved and diversity of the surrounding "haystack" context, from using LLMs to construct synthetic documents to using templated relations and creating symbolic datasets. We find that models trained on synthetic data fall short of the real data, but surprisingly, the mismatch can be interpreted and even predicted in terms of a special set of attention heads that are responsible for retrieval over long context, retrieval heads (Wu et al., 2024). The retrieval heads learned on synthetic data have high overlap with retrieval heads learned on real data, and there is a strong correlation between the recall of heads learned and the downstream performance of a model. Furthermore, with attention knockout and activation patching, we mechanistically show that retrieval heads are necessary and explain model performance, although they are not totally sufficient. Our results shed light on how to interpret synthetic data fine-tuning performance and how to approach creating better data for learning real-world capabilities over long contexts.
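The "recall of heads learned" mentioned in the abstract can be read as a set-overlap measure. The sketch below is a minimal illustration under that assumption: it treats retrieval heads as (layer, head) pairs and computes the fraction of heads identified with real data that also appear among the heads identified with synthetic data. The function name `head_recall` and all head indices are hypothetical, and the paper's exact definition may differ.

```python
def head_recall(learned_heads, reference_heads):
    """Fraction of reference heads that also appear in learned_heads."""
    reference = set(reference_heads)
    if not reference:
        raise ValueError("reference_heads must not be empty")
    return len(reference & set(learned_heads)) / len(reference)


# Hypothetical (layer, head) indices, made up for illustration only.
real_heads = {(16, 3), (18, 7), (20, 1), (22, 5)}
synthetic_heads = {(16, 3), (18, 7), (21, 4)}

# Two of the four real-data heads are recovered by the synthetic-data model.
recall = head_recall(synthetic_heads, real_heads)
print(recall)
```

Under this reading, a synthetic dataset whose fine-tuned model recovers more of the real-data retrieval heads would be expected to score higher on the downstream task, which is the correlation the abstract reports.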
Full Text Available
LoFiT: Localized Fine-tuning on LLM Representations

Yin, Fangcong; Ye, Xi; Durrett, Greg (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
LoFiT: Localized Fine-tuning on LLM Representations

Yin, Fangcong; Ye, Xi; Durrett, Greg (October 2024, https://doi.org/10.48550/arXiv.2406.01563)

Recent work in interpretability shows that large language models (LLMs) can be adapted for new tasks in a learning-free way: it is possible to intervene on LLM representations to elicit desired behaviors for alignment. For instance, adding certain bias vectors to the outputs of certain attention heads is reported to boost the truthfulness of models. In this work, we show that localized fine-tuning serves as an effective alternative to such representation intervention methods. We introduce a framework called Localized Fine-Tuning on LLM Representations (LoFiT), which identifies a subset of attention heads that are most important for learning a specific task, then trains offset vectors to add to the model's hidden representations at those selected heads. LoFiT localizes to a sparse set of heads (3%-10%) and learns the offset vectors from limited training data, comparable to the settings used for representation intervention. For truthfulness and reasoning tasks, we find that LoFiT's intervention vectors are more effective for LLM adaptation than vectors from representation intervention methods such as Inference-time Intervention. We also find that the localization step is important: selecting a task-specific set of attention heads can lead to higher performance than intervening on heads selected for a different task. Finally, across 7 tasks we study, LoFiT achieves comparable performance to other parameter-efficient fine-tuning methods such as LoRA, despite modifying 20x-200x fewer parameters than these methods.
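The core operation described in the abstract, adding an offset vector to the representation at each selected attention head, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `apply_head_offsets`, the toy dimensions, and the offset values are all assumptions, and the offsets are fixed constants here, whereas in LoFiT they would be learned from training data after the head-selection step.

```python
def apply_head_offsets(head_outputs, offsets):
    """Return a copy of head_outputs with offset vectors added at selected heads.

    head_outputs: list of per-head output vectors (one list of floats per head).
    offsets: dict mapping a selected head index to its offset vector.
    """
    adjusted = [list(vec) for vec in head_outputs]
    for head, offset in offsets.items():
        if len(offset) != len(adjusted[head]):
            raise ValueError("offset must match the head dimension")
        adjusted[head] = [h + o for h, o in zip(adjusted[head], offset)]
    return adjusted


# Toy example: 4 heads with head dimension 2 (hypothetical values).
head_outputs = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0], [0.0, 0.0]]

# Only head 2 is "selected"; every other head is left untouched.
offsets = {2: [0.5, 1.0]}

adjusted = apply_head_offsets(head_outputs, offsets)
print(adjusted)
```

Because only the offset vectors at the selected heads would be trainable, the number of modified parameters in such a scheme scales with the number of selected heads times the head dimension, which is consistent with the small parameter counts the abstract reports.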
Full Text Available
To CoT or not To CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Sprague, Zayne; Yin, Fangcong; Rodriguez, Juan Diego; Jiang, Dongwei; Wadhwa, Manya; Singhal, Prasann; Zhao, Xinyu; Ye, Xi; Mahowald, Kyle; Durrett, Greg (May 2025, Proceedings of the International Conference on Learning Representations)

Full Text Available
To CoT or not To CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Sprague, Zayne Rea; Yin, Fangcong; Rodriguez, Juan Diego; Jiang, Dongwei; Wadhwa, Manya; Singhal, Prasann; Zhao, Xinyu; Ye, Xi; Mahowald, Kyle; Durrett, Greg (January 2025, ICLR 2025)

