NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization

https://doi.org/10.1145/3731120.3744584

Salemi, Alireza; Zamani, Hamed (July 2025, ACM)

This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents, each with a distinct task, backbone large language model (LLM), and RAG strategy. We introduce an iterative approach where the search engine generates retrieval results for the RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase. This feedback is then used to iteratively optimize the search engine using an expectation-maximization algorithm, with the goal of maximizing each agent's utility function. Additionally, we adapt this to an online setting, allowing the search engine to refine its behavior based on real-time individual agents feedback to better serve the results for each of them. Experiments on datasets from the Knowledge-Intensive Language Tasks (KILT) benchmark demonstrates that our approach significantly on average outperforms baselines across 18 RAG models. We demonstrate that our method effectively ''personalizes'' the retrieval for each RAG agent based on the collected feedback. Finally, we provide a comprehensive ablation study to explore various aspects of our method.
more » « less
Free, publicly-accessible full text available July 18, 2026
Accelerating Retrieval-Augmented Generation

https://doi.org/10.1145/3669940.3707264

Quinn, Derrick; Nouri, Mohammad; Patel, Neel; Salihu, John; Salemi, Alireza; Lee, Sukhan; Zamani, Hamed; Alian, Mohammad (March 2025, ACM)

An evolving solution to address hallucination and enhance accuracy in large language models (LLMs) is Retrieval-Augmented Generation (RAG), which involves augmenting LLMs with information retrieved from an external knowledge source, such as the web. This paper profiles several RAG execution pipelines and demystifies the complex interplay between their retrieval and generation phases. We demonstrate that while exact retrieval schemes are expensive, they can reduce inference time compared to approximate retrieval variants because an exact retrieval model can send a smaller but more accurate list of documents to the generative model while maintaining the same end-to-end accuracy. This observation motivates the acceleration of the exact nearest neighbor search for RAG. In this work, we design Intelligent Knowledge Store (IKS), a type-2 CXL device that implements a scale-out near-memory acceleration architecture with a novel cache-coherent interface between the host CPU and near-memory accelerators. IKS offers 13.4--27.9× faster exact nearest neighbor search over a 512GB vector database compared with executing the search on Intel Sapphire Rapids CPUs. This higher search performance translates to 1.7--26.3× lower end-to-end inference time for representative RAG applications. IKS is inherently a memory expander; its internal DRAM can be disaggregated and used for other applications running on the server to prevent DRAM -- which is the most expensive component in today's servers -- from being stranded.
more » « less
Free, publicly-accessible full text available March 30, 2026
Online and Offline Evaluation in Search Clarification

https://doi.org/10.1145/3681786

Tavakoli, Leila; Trippas, Johanne R; Zamani, Hamed; Scholer, Falk; Sanderson, Mark (January 2025, ACM Transactions on Information Systems)

The effectiveness of clarification question models in engaging users within search systems is currently constrained, casting doubt on their overall usefulness. To improve the performance of these models, it is crucial to employ assessment approaches that encompass both real-time feedback from users (online evaluation) and the characteristics of clarification questions evaluated through human assessment (offline evaluation). However, the relationship between online and offline evaluations has been debated in information retrieval. This study aims to investigate how this discordance holds in search clarification. We use user engagement as ground truth and employ several offline labels to investigate to what extent the offline ranked lists of clarification resemble the ideal ranked lists based on online user engagement. Contrary to the current understanding that offline evaluations fall short of supporting online evaluations, we indicate that when identifying the most engaging clarification questions from the user’s perspective, online and offline evaluations correspond with each other. We show that the query length does not influence the relationship between online and offline evaluations, and reducing uncertainty in online evaluation strengthens this relationship. We illustrate that an engaging clarification needs to excel from multiple perspectives, and SERP quality and characteristics of the clarification are equally important. We also investigate if human labels can enhance the performance of Large Language Models (LLMs) and Learning-to-Rank (LTR) models in identifying the most engaging clarification questions from the user’s perspective by incorporating offline evaluations as input features. Our results indicate that LTR models do not perform better than individual offline labels. However, GPT, an LLM, emerges as the standout performer, surpassing all LTR models and offline labels.
more » « less
Free, publicly-accessible full text available January 31, 2026
Pre-Trained Models for Search and Recommendation: Introduction to the Special Issue—Part 1

https://doi.org/10.1145/3709134

Wang, Wenjie; Liu, Zheng; Feng, Fuli; Dou, Zhicheng; Ai, Qingyao; Yang, Grace Hui; Lian, Defu; Hou, Lu; Sun, Aixin; Zamani, Hamed; et al (March 2025, ACM Transactions on Information Systems)

Free, publicly-accessible full text available March 31, 2026
Interactions with Generative Information Retrieval Systems

https://doi.org/10.1007/978-3-031-73147-1_3

Aliannejadi, Mohammad; Gwizdka, Jacek; Zamani, Hamed (September 2024, Springer Nature Switzerland)
Shah, Chirag; White, Ryen (Ed.)
Full Text Available
Towards a Search Engine for Machines: Unified Ranking for Multiple Retrieval-Augmented Large Language Models

https://doi.org/10.1145/3626772.3657733

Salemi, Alireza; Zamani, Hamed (July 2024, Proceedings of The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024))

Full Text Available
ProCIS: A Benchmark for Proactive Retrieval in Conversations

https://doi.org/10.1145/3626772.3657869

Samarinas, Chris; Zamani, Hamed (July 2024, Proceedings of The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24))

Full Text Available
Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization

https://doi.org/10.1145/3626772.3657923

Zamani, Hamed; Bendersky, Michael (July 2024, Proceedings of The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24))

Full Text Available
ICXML: An In-Context Learning Framework for Zero-Shot Extreme Multi-Label Classification

Zhu, Yaxin; Zamani, Hamed (June 2024, Findings of the Association for Computational Linguistics: Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024))

This paper focuses on the task of Extreme Multi-Label Classification (XMC) whose goal is to predict multiple labels for each instance from an extremely large label space. While existing research has primarily focused on fully supervised XMC, real-world scenarios often lack supervision signals, highlighting the im- portance of zero-shot settings. Given the large label space, utilizing in-context learning approaches is not trivial. We address this issue by introducing In-Context Extreme Multi-label Learning (ICXML), a two-stage framework that cuts down the search space by generating a set of candidate labels through in-context learning and then reranks them. Extensive experiments suggest that ICXML advances the state of the art on two diverse public benchmarks.
more » « less
Full Text Available
Third Workshop on Personalization and Recommendations in Search (PaRiS)

https://doi.org/10.1145/3626772.3657983

Lamkhede, Sudarshan; Zamani, Hamed; Bhattacharya, Moumita; Wang, Hongning (July 2024, Proceedings of The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24))

Full Text Available

« Prev Next »

Search for: All records