Title: Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline
Pre-trained deep language models (LMs) have advanced the state of the art in text retrieval. Rerankers fine-tuned from deep LMs estimate candidate relevance based on rich contextualized matching signals. Meanwhile, deep LMs can also be leveraged to improve the search index, building retrievers with better recall. One would expect a straightforward combination of the two in a pipeline to yield additive performance gains. In this paper, we discover otherwise: the popular reranker training recipe cannot fully exploit the improved retrieval results. We therefore propose Localized Contrastive Estimation (LCE) for training rerankers and demonstrate that it significantly improves deep two-stage models (our code is open sourced at https://github.com/luyug/Reranker).
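The abstract gives no implementation details, but the core of a contrastive reranker loss of this kind can be sketched as a softmax cross-entropy over each query's candidate group, with hard negatives drawn from the first-stage retriever's own top results. A minimal PyTorch sketch (the function name and the positive-at-column-0 layout are illustrative assumptions, not the released code):

```python
import torch
import torch.nn.functional as F

def lce_loss(logits: torch.Tensor) -> torch.Tensor:
    """Contrastive loss over candidate groups, one group per query.

    logits: (num_queries, group_size) reranker scores. Column 0 holds the
    relevant passage; the remaining columns hold hard negatives sampled
    from the first-stage retriever's top-ranked candidates, so the loss is
    "localized" to the distribution the reranker will see at test time.
    """
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, targets)  # softmax contrastive loss per query
```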
Award ID(s): 1815528
PAR ID: 10273588
Journal Name: Advances in Information Retrieval – 43rd European Conference on IR Research
Sponsoring Org: National Science Foundation
More Like this
  1. Moens, Marie-Francine; Huang, Xuanjing; Specia, Lucia; Yih, Scott Wen-tau (Ed.)
    Pre-trained Transformer language models (LMs) have become the go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient text comparison and retrieval. However, dense encoders require large amounts of data and sophisticated techniques to train effectively, and they suffer in low-data situations. This paper finds that a key reason is that standard LMs' internal attention structure is not ready-to-use for dense encoders, which need to aggregate text information into the dense representation. We propose to pre-train towards dense encoding with a novel Transformer architecture, Condenser, in which LM prediction CONditions on DENSE Representation. Our experiments show Condenser improves over standard LMs by large margins on various text retrieval and similarity tasks.
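As a rough illustration of the Condenser idea (a hedged sketch; layer counts, widths, and class names are assumptions, not the paper's exact configuration), masked-token prediction is routed through the dense [CLS] vector by pairing the late-layer [CLS] state with early-layer token states in a short head:

```python
import torch
import torch.nn as nn

class CondenserHead(nn.Module):
    """Sketch of a Condenser-style head: MLM prediction conditions on the
    late-layer [CLS] (the dense representation) plus early-layer token
    states, forcing the backbone to compress the passage into [CLS]."""

    def __init__(self, hidden: int = 768, vocab: int = 30522, n_layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True)
        self.head = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlm = nn.Linear(hidden, vocab)

    def forward(self, early_tokens: torch.Tensor, late_cls: torch.Tensor):
        # Swap the late [CLS] into position 0 of the early token sequence,
        # then predict masked tokens from the short head's outputs.
        x = torch.cat([late_cls.unsqueeze(1), early_tokens[:, 1:]], dim=1)
        return self.mlm(self.head(x))
```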
  2.
    Classical information retrieval systems such as BM25 rely on exact lexical match and carry out search efficiently with an inverted list index. Recent neural IR models shift towards soft semantic matching of all query-document terms, but they lose the computational efficiency of exact-match systems. This paper presents COIL, a contextualized exact match retrieval architecture that brings semantics into lexical matching. COIL scoring is based on the contextualized representations of overlapping query-document tokens. The new architecture stores contextualized token representations in inverted lists, bringing together the efficiency of exact match and the representation power of deep language models. Our experimental results show COIL outperforms classical lexical retrievers and state-of-the-art deep LM retrievers with similar or smaller latency.
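A simplified picture of COIL scoring (a sketch with illustrative names and shapes): only query-document token pairs with the same surface form interact, and each query token takes its best contextual match among the document's occurrences of that token:

```python
import torch

def coil_score(q_tok_ids, q_vecs, d_tok_ids, d_vecs):
    """Score one (query, document) pair COIL-style.

    q_tok_ids: (Lq,) query token ids;    q_vecs: (Lq, dim) contextual vectors
    d_tok_ids: (Ld,) document token ids; d_vecs: (Ld, dim)
    Unlike all-to-all soft matching, only exact-matching tokens interact.
    """
    score = q_vecs.new_zeros(())
    for i, tid in enumerate(q_tok_ids.tolist()):
        mask = d_tok_ids == tid                 # exact lexical match
        if mask.any():
            sims = d_vecs[mask] @ q_vecs[i]     # contextual similarity
            score = score + sims.max()          # best-matching occurrence
    return score
```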
  3. Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, thereby overcoming the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each presents a set of neural network components to extract features for ranking. In this paper, we compare the models proposed in the literature along different dimensions in order to understand the major contributions and limitations of each. In our discussion of the literature, we analyze the promising neural components and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images, and videos.
  4. Recent research demonstrates the effectiveness of using fine-tuned language models (LMs) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered fine-tuning pipelines to realize their full potential. In this paper, we identify and address two underlying problems of dense retrievers: i) fragility to training data noise and ii) the need for large batches to robustly learn the embedding space. We use the recently proposed Condenser pre-training architecture, which learns to condense information into a dense vector through LM pre-training. On top of it, we propose coCondenser, which adds an unsupervised corpus-level contrastive loss to warm up the passage embedding space. Experiments on the MS-MARCO, Natural Questions, and TriviaQA datasets show that coCondenser removes the need for heavy data engineering such as augmentation, synthesis, or filtering, as well as the need for large-batch training. It achieves performance comparable to RocketQA, a state-of-the-art, heavily engineered system, using simple small-batch fine-tuning.
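The corpus-level warm-up can be pictured as an in-batch contrastive loss in which two spans sampled from the same document are positives for each other and all other spans in the batch act as negatives. A hedged sketch (the normalization and temperature here are assumptions, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def span_contrastive_loss(emb_a, emb_b, temperature=0.05):
    """In-batch contrastive warm-up over passage-span embeddings.

    emb_a, emb_b: (B, dim) embeddings of two spans per document; row i of
    emb_a and row i of emb_b come from the same document (positives),
    while all other rows in the batch serve as negatives.
    """
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    return F.cross_entropy(logits, targets)        # match each span to its pair
```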
  5. Due to the limited availability of actual large-scale datasets, realistic synthetic trajectory data play a crucial role in various research domains, including spatiotemporal data mining and data management, as well as domain-driven research related to transportation planning and urban analytics. Existing generation methods rely on predefined heuristics and cannot learn the unknown underlying generative mechanisms. This work introduces two end-to-end approaches to trajectory generation. The first comprises deep generative VAE-like models that factorize global and local semantics (habits vs. random routing changes). We further enhance this approach by developing novel inference strategies based on variational inference and constrained optimization to ensure the validity of spatiotemporal aspects; this novel deep neural network architecture implements generative and inference models with dynamic latent priors. The second approach introduces a language model (LM) inspired generation method as another benchmarking and foundational approach. It conceptualizes trajectories as sentences, predicting the likelihood of subsequent locations on a trajectory given the previous locations as context. As a result, the LM-inspired approach implicitly learns the inherent spatiotemporal structure and other embedded semantics within the trajectories. The proposed methods demonstrate substantial quantitative and qualitative improvements over existing approaches, as evidenced by extensive experimental evaluations.
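The LM-inspired approach can be pictured as next-token prediction over discretized location ids. A minimal sketch under that assumption (the class name, model size, location vocabulary, and causal-mask setup are illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class TrajectoryLM(nn.Module):
    """Treat a trajectory as a 'sentence' of discretized location ids and
    model the likelihood of the next location given the prefix."""

    def __init__(self, n_locations: int, dim: int = 256, n_layers: int = 4):
        super().__init__()
        self.embed = nn.Embedding(n_locations, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out = nn.Linear(dim, n_locations)

    def forward(self, loc_ids: torch.Tensor):      # (B, T) location ids
        T = loc_ids.size(1)
        # Causal mask so each position attends only to earlier locations.
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(loc_ids.device)
        h = self.encoder(self.embed(loc_ids), mask=causal)
        return self.out(h)                         # (B, T, n_locations) logits
```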