<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Conference Paper</dc:product_type><dc:title>Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization</dc:title><dc:creator>Yadav, Nishant; Monath, Nicholas; Angell, Rico; Zaheer, Manzil; McCallum, Andrew</dc:creator><dc:corporate_author/><dc:editor/><dc:description>Efficient k-nearest neighbor search is a fundamental task, foundational for many problems in NLP. When similarity is measured by a dot-product between dual-encoder vectors or by L2-distance, many scalable and efficient search methods already exist. This is not the case when similarity is measured by more accurate and expensive black-box neural similarity models, such as cross-encoders, which jointly encode the query and candidate neighbor. The cross-encoders' high computational cost typically limits their use to reranking candidates retrieved by a cheaper model, such as a dual-encoder or TF-IDF. However, the accuracy of such a two-stage approach is upper-bounded by the recall of the initial candidate set, and it potentially requires additional training to align the auxiliary retrieval model with the cross-encoder model. In this paper, we present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder. Retrieval is made efficient with CUR decomposition, a matrix decomposition approach that approximates all pairwise cross-encoder distances from a small subset of rows and columns of the distance matrix. Indexing items using our approach is computationally cheaper than training an auxiliary dual-encoder model through distillation. Empirically, for k &gt; 10, our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods that re-rank items retrieved using a dual-encoder or TF-IDF.</dc:description><dc:publisher>Association for Computational Linguistics</dc:publisher><dc:date>2022-12-01</dc:date><dc:nsf_par_id>10480872</dc:nsf_par_id><dc:journal_name>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</dc:journal_name><dc:journal_volume/><dc:journal_issue/><dc:page_range_or_elocation>2171 to 2194</dc:page_range_or_elocation><dc:issn/><dc:isbn/><dc:doi>https://doi.org/10.18653/v1/2022.emnlp-main.140</dc:doi><dcq:identifierAwardId>1763618</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location>Abu Dhabi, United Arab Emirates</dc:location><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>