Title: COILcr: Efficient semantic matching in contextualized exact match retrieval
Lexical exact match systems that use inverted lists are a fundamental text retrieval architecture. A recent advance in neural IR, COIL, extends this approach with contextualized inverted lists built from a deep language model backbone and performs retrieval by comparing contextualized query-document term representations, which is effective but computationally expensive. This paper explores the effectiveness-efficiency tradeoff in COIL-style systems, aiming to reduce the computational complexity of retrieval while preserving term semantics. It proposes COILcr, which explicitly factorizes COIL into intra-context term importance weights and cross-context semantic representations. At indexing time, COILcr further maps term semantic representations to a smaller set of canonical representations. Experiments demonstrate that canonical representations can efficiently preserve term semantics, reducing the storage and computational cost of COIL-based retrieval while maintaining model performance. The paper also discusses and compares multiple heuristics for canonical representation selection and examines its performance in different retrieval settings.
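As a rough illustration of the factorization described in the abstract, the sketch below indexes each term occurrence as a scalar importance weight plus the id of a canonical semantic vector, and scores documents by exact lexical match combined with canonical-vector similarity. It is not the authors' implementation: the crude k-means stand-in, the posting-list layout, and names such as `build_index` and `doc_term_reps` are assumptions made for illustration only.

```python
# Illustrative sketch of COILcr-style indexing and scoring (not the authors' code).
import numpy as np
from collections import defaultdict

def build_index(doc_term_reps, n_canon=8, seed=0):
    """doc_term_reps: {term: [(doc_id, weight, vec), ...]} with contextualized
    term vectors. Quantizes each term's vectors to a small set of canonical
    vectors (a crude k-means stand-in: random init plus a few Lloyd steps)."""
    rng = np.random.default_rng(seed)
    postings, canon_vecs = {}, {}
    for term, occurrences in doc_term_reps.items():
        vecs = np.stack([v for _, _, v in occurrences])
        k = min(n_canon, len(vecs))
        centers = vecs[rng.choice(len(vecs), size=k, replace=False)]
        for _ in range(5):  # a few Lloyd iterations
            assign = np.argmax(vecs @ centers.T, axis=1)
            for c in range(k):
                members = vecs[assign == c]
                if len(members):
                    centers[c] = members.mean(axis=0)
        centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)
        assign = np.argmax(vecs @ centers.T, axis=1)
        canon_vecs[term] = centers
        # each posting keeps only a scalar weight and a canonical-vector id
        postings[term] = [(doc_id, w, int(a))
                          for (doc_id, w, _), a in zip(occurrences, assign)]
    return postings, canon_vecs

def score(query_term_reps, postings, canon_vecs):
    """query_term_reps: [(term, weight, vec)]. Exact lexical match through the
    inverted lists; term scores combine importance weights with canonical similarity."""
    scores = defaultdict(float)
    for term, q_weight, q_vec in query_term_reps:
        if term not in postings:
            continue
        sims = canon_vecs[term] @ q_vec          # similarity to each canonical vector
        best = defaultdict(float)
        for doc_id, d_weight, canon_id in postings[term]:
            best[doc_id] = max(best[doc_id], d_weight * sims[canon_id])
        for doc_id, s in best.items():
            scores[doc_id] += q_weight * s
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

A real system would replace the toy quantizer with clustering over the collection's term vectors and store the postings in a compressed inverted index; the sketch only shows why storing a canonical-vector id per posting is cheaper than storing a full contextualized vector.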
Award ID(s): 1815528
PAR ID: 10479605
Author(s) / Creator(s):
Publisher / Repository: Springer Nature Switzerland
Date Published:
Journal Name: Advances in Information Retrieval – 44th European Conference on IR Research
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Classical information retrieval systems such as BM25 rely on exact lexical match and carry out search efficiently with an inverted list index. Recent neural IR models shift towards soft semantic matching of all query-document terms, but they lose the computational efficiency of exact match systems. This paper presents COIL, a contextualized exact match retrieval architecture that brings semantic lexical matching. COIL scoring is based on overlapping query-document tokens' contextualized representations. The new architecture stores contextualized token representations in inverted lists, bringing together the efficiency of exact match and the representation power of deep language models. Our experimental results show COIL outperforms classical lexical retrievers and state-of-the-art deep LM retrievers with similar or smaller latency.
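The COIL scoring rule described in item 1 can be sketched in a few lines. This is an illustrative reading of the abstract, not the released code; `query_vecs` and `doc_vecs` stand for the contextualized token representations produced by the language model backbone.

```python
# Illustrative sketch of COIL's exact-match token scoring (not the released code).
import numpy as np

def coil_score(query_tokens, query_vecs, doc_tokens, doc_vecs):
    """query_tokens/doc_tokens: lists of token ids; query_vecs/doc_vecs:
    [n, dim] arrays of contextualized token representations."""
    score = 0.0
    for i, tok in enumerate(query_tokens):
        # exact lexical match: the same token must appear in the document
        positions = [j for j, t in enumerate(doc_tokens) if t == tok]
        if not positions:
            continue
        sims = doc_vecs[positions] @ query_vecs[i]
        score += float(sims.max())  # best-matching occurrence of this query token
    return score
```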
  2. Lexical exact-match systems perform text retrieval efficiently with sparse matching signals and fast retrieval through inverted lists, but naturally suffer from the mismatch between lexical surface form and implicit term semantics. This paper proposes to directly bridge the surface form space and the term semantics space in lexical exact-match retrieval via contextualized surface forms (CSF). Each CSF pairs a lexical surface form with a context source, and is represented by a lexical form weight and a contextualized semantic vector representation. This framework is able to perform sparse lexicon-based retrieval by learning to represent each query and document as a "bag-of-CSFs", simultaneously addressing two key factors in sparse retrieval: vocabulary expansion of surface form and semantic representation of term meaning. At retrieval time, it efficiently matches CSFs through exact match of learned surface forms, and effectively scores each CSF pair via contextual semantic representations, leading to joint improvement in both term match and term scoring. Multiple experiments show that this approach successfully resolves the main mismatch issues in lexical exact-match retrieval and, as an efficient exact-match-based system, reaches accuracy comparable to lexical all-to-all soft match systems.
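A minimal sketch of the "bag-of-CSFs" scoring idea in item 2, under the simplifying assumption that each bag keeps a single weight and semantic vector per surface form; the dict layout and names are illustrative, not the paper's implementation.

```python
# Illustrative bag-of-CSFs scoring sketch (one CSF per surface form for brevity).
import numpy as np

def csf_score(query_bag, doc_bag):
    """query_bag/doc_bag: {surface_form: (weight, semantic_vec)}."""
    score = 0.0
    for form in query_bag.keys() & doc_bag.keys():  # exact surface-form match
        q_w, q_vec = query_bag[form]
        d_w, d_vec = doc_bag[form]
        # learned form weights gate a contextual semantic similarity
        score += q_w * d_w * float(np.dot(q_vec, d_vec))
    return score
```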
  3. Bag-of-words document representations play a fundamental role in modern search engines, but their power is limited by the shallow frequency-based term weighting scheme. This paper proposes HDCT, a context-aware document term weighting framework for document indexing and retrieval. It first estimates the semantic importance of a term in the context of each passage. These fine-grained term weights are then aggregated into a document-level bag-of-words representation, which can be stored in a standard inverted index for efficient retrieval. This paper also proposes two approaches that enable training HDCT without relevance labels. Experiments show that an index using HDCT weights significantly improves retrieval accuracy compared to typical term-frequency and state-of-the-art embedding-based indexes.
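The aggregation step described in item 3 might look roughly like the following. The sum over passages and the sqrt-and-scale rounding to integer weights are assumptions about one reasonable instantiation, not the released HDCT code.

```python
# Illustrative sketch: per-passage contextual term weights -> integer index weights.
import math
from collections import defaultdict

def doc_term_weights(passage_term_weights, scale=100):
    """passage_term_weights: one {term: weight in [0, 1]} dict per passage,
    e.g. predicted by a contextual model. Returns integer document-level
    weights that a standard inverted index can store as term frequencies."""
    doc = defaultdict(float)
    for passage in passage_term_weights:
        for term, w in passage.items():
            doc[term] += w                       # aggregate passage-level weights
    # squash and scale to small integers so ordinary tf-based scoring applies
    return {t: int(round(scale * math.sqrt(w))) for t, w in doc.items() if w > 0}
```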
  4. Edges in many real-world social/information networks are associated with rich text information (e.g., user-user communications or user-product reviews). However, mainstream network representation learning models focus on propagating and aggregating node attributes, lacking specific designs to utilize text semantics on edges. While there exist edge-aware graph neural networks, they directly initialize edge attributes as a feature vector, which cannot fully capture the contextualized text semantics of edges. In this paper, we propose Edgeformers, a framework built upon graph-enhanced Transformers, to perform edge and node representation learning by modeling texts on edges in a contextualized way. Specifically, in edge representation learning, we inject network information into each Transformer layer when encoding edge texts; in node representation learning, we aggregate edge representations through an attention mechanism within each node's ego-graph. On five public datasets from three different domains, Edgeformers consistently outperform state-of-the-art baselines in edge classification and link prediction, demonstrating the framework's efficacy in learning edge and node representations, respectively.
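The node-side aggregation described in item 4 (pooling incident-edge representations with attention inside a node's ego-graph) can be sketched as below; the single learned query vector and the scaled softmax are simplifying assumptions, not the Edgeformers architecture itself.

```python
# Illustrative attention pooling of edge-text representations into a node vector.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def aggregate_node(edge_reps, node_query):
    """edge_reps: [n_edges, dim] vectors for the texts on a node's incident edges;
    node_query: [dim] learned query vector for that node."""
    attn = softmax(edge_reps @ node_query / np.sqrt(edge_reps.shape[1]))
    return attn @ edge_reps  # attention-weighted node representation
```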
  5. The availability of massive data and computing, which allows for effective data-driven neural approaches, is having a major impact on AI and IR research, but these models have a basic problem with efficiency. Current neural ranking models are implemented as multistage rankers: for efficiency reasons, the neural model only re-ranks the top-ranked documents retrieved by a first-stage efficient ranker in response to a given query. Neural ranking models learn dense representations causing essentially every query term to match every document term, making it highly inefficient or intractable to rank the whole collection. The reliance on a first-stage ranker creates a dual problem: first, the interaction and combination effects are not well understood; second, the first-stage ranker serves as a "gate-keeper" or filter, effectively blocking the potential of neural models to uncover new relevant documents. In this work, we propose a standalone neural ranking model, SNRM, that introduces a sparsity property to learn a latent sparse representation for each query and document. This representation captures the semantic relationship between the query and documents, but is also sparse enough to enable constructing an inverted index for the whole collection. We parameterize the sparsity of the model to yield a retrieval model as efficient as conventional term-based models. Our model gains in efficiency without loss of effectiveness: it not only outperforms the existing term matching baselines, but also performs similarly to the recent re-ranking based neural models with dense representations. More generally, our results demonstrate the importance of sparsity in neural model learning and show that dense representations can be pruned effectively, giving new insights about essential semantic features and their distributions.
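Item 5's key point, that a sufficiently sparse latent representation makes a standard inverted index possible, can be illustrated as follows; the toy ReLU encoder and index layout are assumptions for illustration, not the SNRM model.

```python
# Illustrative sketch: sparse latent dimensions act as "terms" in an inverted index.
import numpy as np
from collections import defaultdict

def encode_sparse(x, W, b):
    """Toy encoder: a ReLU layer leaves many latent dimensions at exactly zero."""
    return np.maximum(0.0, x @ W + b)

def build_latent_index(doc_reps):
    """doc_reps: {doc_id: sparse latent vector}. One posting list per latent dim."""
    index = defaultdict(list)
    for doc_id, rep in doc_reps.items():
        for dim in np.nonzero(rep)[0]:
            index[int(dim)].append((doc_id, float(rep[dim])))
    return index

def retrieve(query_rep, index):
    """Score documents by the dot product restricted to nonzero query dimensions."""
    scores = defaultdict(float)
    for dim in np.nonzero(query_rep)[0]:
        for doc_id, w in index.get(int(dim), []):
            scores[doc_id] += float(query_rep[dim]) * w
    return sorted(scores.items(), key=lambda kv: -kv[1])
```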