Title: Context-Aware Document Term Weighting for Ad-Hoc Search
Bag-of-words document representations play a fundamental role in modern search engines, but their power is limited by their shallow, frequency-based term weighting scheme. This paper proposes HDCT, a context-aware document term weighting framework for document indexing and retrieval. It first estimates the semantic importance of a term in the context of each passage. These fine-grained term weights are then aggregated into a document-level bag-of-words representation, which can be stored in a standard inverted index for efficient retrieval. This paper also proposes two approaches that enable training HDCT without relevance labels. Experiments show that an index built with HDCT weights significantly improves retrieval accuracy compared to typical term-frequency and state-of-the-art embedding-based indexes.
Award ID(s): 1815528
PAR ID: 10170033
Author(s) / Creator(s): ;
Date Published:
Journal Name: Proceedings of The Web Conference 2020
Page Range / eLocation ID: 1897 to 1907
Format(s): Medium: X
Sponsoring Org: National Science Foundation
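The core mechanic described in the abstract above is aggregating passage-level term weights into one document-level bag-of-words that a standard inverted index can store. The following is a minimal Python sketch of that idea; the position-decayed sum and the integer quantization scale are illustrative assumptions, not necessarily HDCT's exact aggregation.

```python
from collections import defaultdict

def aggregate_passage_weights(passage_term_weights, decay=True):
    """Aggregate per-passage term weights into one document-level
    bag-of-words. `passage_term_weights` is a list, in document order,
    of {term: weight} dicts, e.g. produced by a contextual model.
    The decayed sum below is an illustrative choice (earlier passages
    count more); HDCT's exact aggregation may differ."""
    doc_weights = defaultdict(float)
    for i, term_weights in enumerate(passage_term_weights):
        pw = 1.0 / (i + 1) if decay else 1.0
        for term, w in term_weights.items():
            doc_weights[term] += pw * w
    return dict(doc_weights)

def to_index_tf(doc_weights, scale=100):
    """Quantize real-valued weights into integer pseudo term
    frequencies so they fit a standard inverted index format."""
    return {t: max(1, round(scale * w)) for t, w in doc_weights.items() if w > 0}
```

Once quantized this way, the weights drop into an ordinary indexing pipeline: the retrieval engine sees them as term frequencies, so no ranking code has to change.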
More Like this
  1. Many content-based image search and instance retrieval systems implement bag-of-visual-words strategies for candidate selection. Visual processing of an image results in hundreds of visual words that make up a document, and these words are used to build an inverted index. Query processing then consists of an initial candidate selection phase that queries the inverted index, followed by more complex reranking of the candidates using various image features. The initial phase typically uses disjunctive top-k query processing algorithms originally proposed for searching text collections. Our objective in this paper is to optimize the performance of disjunctive top-k computation for candidate selection in content-based instance retrieval systems. While there has been extensive previous work on optimizing this phase for textual search engines, we are unaware of any published work that studies this problem for instance retrieval, where both index and query data are quite different from the distributions commonly found and exploited in the textual case. Using data from a commercial large-scale instance retrieval system, we address this challenge in three steps. First, we analyze the quantitative properties of index structures and queries in the system, and discuss how they differ from the case of text retrieval. Second, we describe an optimized term-at-a-time retrieval strategy that significantly outperforms baseline term-at-a-time and document-at-a-time strategies, achieving up to 66% speed-up over the most efficient baseline. Finally, we show that due to the different properties of the data, several common safe and unsafe early termination techniques from the literature fail to provide any significant performance benefits. 
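The candidate-selection phase described in item 1 builds on disjunctive top-k query processing. Below is a minimal sketch of the baseline term-at-a-time (TAAT) skeleton that the paper's optimized strategy improves on; the posting-list format is an assumption for illustration.

```python
import heapq
from collections import defaultdict

def taat_top_k(query_terms, inverted_index, k=10):
    """Baseline term-at-a-time disjunctive top-k: process one posting
    list at a time, accumulating a partial score per document.
    `inverted_index` maps term -> list of (doc_id, term_score)
    postings. The paper's optimized strategy builds on this skeleton."""
    accumulators = defaultdict(float)
    for term in query_terms:
        for doc_id, score in inverted_index.get(term, []):
            accumulators[doc_id] += score
    # Keep only the k highest-scoring candidates for reranking.
    return heapq.nlargest(k, accumulators.items(), key=lambda x: x[1])
```

In the visual-words setting the paper studies, queries contain hundreds of terms rather than the handful typical of text search, which is why the accumulator-heavy TAAT approach behaves so differently from the textual case.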
  2. Lexical exact-match systems that use inverted lists are a fundamental text retrieval architecture. A recent advance in neural IR, COIL, extends this approach with contextualized inverted lists built from a deep language model backbone, performing retrieval by comparing contextualized query and document term representations; this is effective but computationally expensive. This paper explores the effectiveness-efficiency tradeoff in COIL-style systems, aiming to reduce the computational complexity of retrieval while preserving term semantics. It proposes COILcr, which explicitly factorizes COIL into intra-context term importance weights and cross-context semantic representations. At indexing time, COILcr further maps term semantic representations to a smaller set of canonical representations. Experiments demonstrate that canonical representations can efficiently preserve term semantics, reducing the storage and computational cost of COIL-based retrieval while maintaining model performance. The paper also discusses and compares multiple heuristics for canonical representation selection and examines their performance in different retrieval settings.
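A minimal sketch of the canonicalization step in item 2, assuming k-means is the selection heuristic (the paper compares several) and assuming a factorized weight-times-dot-product scoring form; both are illustrative, not COILcr's confirmed design.

```python
import numpy as np
from sklearn.cluster import KMeans

def canonicalize_term_vectors(vectors, n_canonical=8, seed=0):
    """Map the contextual vectors observed for one term across the
    collection onto a small set of canonical representations, here via
    k-means (one plausible heuristic among those the paper compares).
    Returns (canonical_vectors, assignment): assignment[i] is the
    canonical id stored in the inverted list instead of vectors[i]."""
    km = KMeans(n_clusters=min(n_canonical, len(vectors)),
                n_init=10, random_state=seed).fit(vectors)
    return km.cluster_centers_, km.labels_

def factorized_score(q_weight, d_weight, q_vec, canonical_vec):
    """Hypothetical factorized exact-match score: scalar importance
    weights times a semantic dot product over canonical vectors."""
    return q_weight * d_weight * float(np.dot(q_vec, canonical_vec))
```

Because each term stores only a handful of canonical vectors instead of one vector per occurrence, both the index footprint and the per-match dot-product cost shrink.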
  3. Lexical exact-match systems perform text retrieval efficiently with sparse matching signals and fast retrieval through inverted lists, but they naturally suffer from the mismatch between lexical surface forms and implicit term semantics. This paper proposes to directly bridge the surface form space and the term semantics space in lexical exact-match retrieval via contextualized surface forms (CSFs). Each CSF pairs a lexical surface form with a context source and is represented by a lexical form weight and a contextualized semantic vector representation. This framework performs sparse lexicon-based retrieval by learning to represent each query and document as a "bag-of-CSFs", simultaneously addressing two key factors in sparse retrieval: vocabulary expansion of surface forms and semantic representation of term meaning. At retrieval time, it efficiently matches CSFs through exact match of learned surface forms and effectively scores each CSF pair via contextual semantic representations, leading to joint improvement in both term matching and term scoring. Multiple experiments show that this approach successfully resolves the main mismatch issues in lexical exact-match retrieval and outperforms state-of-the-art lexical exact-match systems, reaching accuracy comparable to lexical all-to-all soft-match systems while remaining an efficient exact-match-based system.
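To make the bag-of-CSFs matching in item 3 concrete, here is a small sketch. It assumes, for simplicity, one CSF per surface form per side and a weight-scaled dot product as the pair score; both details are illustrative assumptions rather than the paper's specification.

```python
import numpy as np

def score_bag_of_csfs(query_csfs, doc_csfs):
    """Score a (query, document) pair represented as bags of
    contextualized surface forms. Each bag maps a learned surface
    form -> (weight, vector). Only exact-matching surface forms
    interact (the sparse part); matched pairs are then scored with
    their contextual vectors (the semantic part)."""
    score = 0.0
    for form, (qw, qv) in query_csfs.items():
        if form in doc_csfs:  # exact match on the learned surface form
            dw, dv = doc_csfs[form]
            score += qw * dw * float(np.dot(qv, dv))
    return score
```

The two levers the abstract names are visible here: learning which surface forms go into each bag handles vocabulary expansion, while the vectors attached to matched forms handle semantic term scoring.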
  4. Language model pre-training has attracted a great deal of attention for tasks involving natural language understanding and has been successfully applied to many downstream tasks with impressive results. Within information retrieval, many of these solutions are too costly to stand on their own, requiring multi-stage ranking architectures. Recent work has begun to consider how to “backport” salient aspects of these computationally expensive models to earlier stages of the retrieval pipeline. One such instance is DeepCT, which uses BERT to re-weight term importance in a given context at the passage level. This process, computed offline, results in an augmented inverted index with re-weighted term frequency values. In this work, we investigate query processing efficiency over DeepCT indexes. Using a number of candidate generation algorithms, we reveal how term re-weighting can impact query processing latency, and explore how DeepCT can be used as a static index pruning technique to accelerate query processing without harming search effectiveness.
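Item 4 uses DeepCT's re-weighted term frequencies as a static index pruning signal. A minimal sketch of one such pruning rule follows; the simple fixed-threshold criterion and the posting-list format are assumptions for illustration, not necessarily the paper's exact pruning strategy.

```python
def prune_index(inverted_index, threshold):
    """Static index pruning over a DeepCT-style index: postings whose
    re-weighted term frequency falls below `threshold` are dropped
    offline, before any query is seen, shrinking posting lists and
    thus candidate-generation latency.
    `inverted_index` maps term -> list of (doc_id, reweighted_tf)."""
    pruned = {}
    for term, postings in inverted_index.items():
        kept = [(doc, tf) for doc, tf in postings if tf >= threshold]
        if kept:  # drop terms whose posting lists empty out entirely
            pruned[term] = kept
    return pruned
```

The intuition is that DeepCT assigns near-zero weight to terms that are contextually unimportant, so removing those postings trims the index where it matters least for effectiveness.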
  5. Neural networks provide new possibilities for automatically learning complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but few explorations have been done on understanding the text content of a query or a document. This paper studies leveraging a recently proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural language. Combining this text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
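A minimal sketch of the kind of BERT-based relevance scoring item 5 describes, using the Hugging Face transformers library: the query and document are packed into one sequence and a classification head emits a relevance score. The model name and single-logit head are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative choice of checkpoint; the head is untrained here and
# would be fine-tuned on relevance labels before use.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1)

def bert_relevance(query, document):
    """Score one query-document pair: BERT reads the concatenated
    pair ([CLS] query [SEP] document [SEP]), so every term is
    represented in the context of the other text."""
    inputs = tokenizer(query, document, truncation=True,
                       max_length=512, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).logits.item()  # higher = more relevant
```

Contrast this with bag-of-words scoring: the cross-attention over the concatenated pair is what lets the model exploit word order and sentence structure, which the abstract credits for the large gains on natural language queries.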