Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but few explorations have been done on understanding the text content of a query or a document. This paper studies leveraging a recently-proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural languages. Combining the text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation
For most queries, the set of relevant documents spans multiple
subtopics. Inspired by the neural ranking models and query-specific
neural clustering models, we develop Topic-Mono-BERT which
performs both tasks jointly. Based on text embeddings of BERT, our
model learns a shared embedding that is optimized for both tasks.
The clustering hypothesis would suggest that embeddings which
place topically similar text in close proximity will also perform
better on ranking tasks. Our model is trained with the Wikimarks
approach to obtain training signals for relevance and subtopics on
the same queries.
Our task is to identify overview passages that can be used to
construct a succinct answer to the query. Our empirical evaluation
on two publicly available passage retrieval datasets suggests that
including the clustering supervision in the ranking model leads to
an improvement of about 16% in identifying text passages that summarize
different subtopics within a query.
- Award ID(s):
- 1846017
- NSF-PAR ID:
- 10473540
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400700231
- Format(s):
- Medium: X
- Location:
- Kolkata, India
- Sponsoring Org:
- National Science Foundation
More Like this
-
Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations. Neural IR models have achieved promising results in learning query-document relevance patterns, but few explorations have been done on understanding the text content of a query or a document. This paper studies leveraging a recently-proposed contextual neural language model, BERT, to provide deeper text understanding for IR. Experimental results demonstrate that the contextual text representations from BERT are more effective than traditional word embeddings. Compared to bag-of-words retrieval models, the contextual language model can better leverage language structures, bringing large improvements on queries written in natural languages. Combining the text understanding ability with search knowledge leads to an enhanced pre-trained BERT model that can benefit related search tasks where training data are limited.
-
Knowledge Graph embeddings model semantic and structural knowledge of entities in the context of the Knowledge Graph. A nascent research direction has been to study the utilization of such graph embeddings for the IR-centric task of entity ranking. In this work, we replicate the GEEER study of Gerritse et al. [9], which demonstrated improvements of Wiki2Vec embeddings on entity ranking tasks on the DBpediaV2 dataset. We further extend the study by exploring additional state-of-the-art entity embeddings, ERNIE [27] and E-BERT [19], and by including another test collection, TREC CAR, with queries not about person, location, and organization entities. We confirm the finding that entity embeddings are beneficial for the entity ranking task. Interestingly, we find that Wiki2Vec is competitive with ERNIE and E-BERT. Our code and data to aid reproducibility and further research are available at https://github.com/poojahoza/E3R-Replicability
-
A prevalent approach in entity-oriented systems involves retrieving relevant entities by harnessing knowledge graph embeddings. These embeddings encode entity information in the context of the knowledge graph and are static in nature. Our goal is to generate entity embeddings that capture what renders them relevant for the query. This differs from entity embeddings constructed with static resources, for example, E-BERT. Previously, Dalton et al. (2014) demonstrated the benefits obtained with the Entity Context Model, a pseudo-relevance feedback approach based on entity links in relevant contexts. In this work, we reinvent the Entity Context Model (ECM) for neural graph networks and incorporate pre-trained embeddings. We introduce three entity ranking models based on fundamental principles of ECM: (1) Graph Attention Networks, (2) Simple Graph Relevance Networks, and (3) Graph Relevance Networks. Graph Attention Networks and Graph Relevance Networks are the graph neural variants of ECM, which employ an attention mechanism and relevance information of the relevant context, respectively, to ascertain entity relevance. Our experiments demonstrate that our neural variants of the ECM model significantly outperform the state-of-the-art BERT-ER (doi:10.1145/3477495.3531944) by more than 14% and exceed the performance of systems that use knowledge graph embeddings by over 101%. Notably, our findings reveal that leveraging the relevance of the relevant context is more effective at identifying relevant entities than the attention mechanism. To evaluate the efficacy of the models, we conduct experiments on two standard benchmark datasets, DBpediaV2 and TREC Complex Answer Retrieval. To aid reproducibility, our code and data are available at https://github.com/TREMA-UNH/neural-entity-context-models
-
We present a context-aware neural ranking model to exploit users' on-task search activities and enhance retrieval performance. In particular, a two-level hierarchical recurrent neural network is introduced to learn search context representation of individual queries, search tasks, and corresponding dependency structure by jointly optimizing two companion retrieval tasks: document ranking and query suggestion. To identify variable dependency structure between search context and users' ongoing search activities, attention at both levels of recurrent states is introduced. Extensive experimental comparisons against a rich set of baseline methods and an in-depth ablation analysis confirm the value of our proposed approach for modeling search context buried in search tasks.