Search for: All records

Award ID contains: 1617408

  1. Relevance feedback is an effective technique for improving retrieval performance using feedback documents. Selecting effective feedback terms and weighting them has always been challenging. Several methods based on different assumptions have been proposed so far; however, they do not directly optimize retrieval performance. Learning an effective relevance feedback model is not trivial, since the true feedback distribution is unknown. In this paper, we propose a general reinforcement learning framework for relevance feedback, called RML. Our framework directly optimizes any desired retrieval metric, including precision-oriented, recall-oriented, and even diversity metrics. RML can be easily extended to directly optimize any arbitrary user satisfaction signal. Experiments on standard TREC collections demonstrate the effectiveness of our framework. A minimal policy-gradient sketch of this idea appears after this list.
  2. We address the problem of entity extraction with very few examples using an information retrieval approach. Existing extraction approaches consider millions of features extracted from a large number of training cases. Generally, these training cases are generated by a distant supervision approach using entities from a knowledge base. After that, a model is learned and entities are extracted. However, with extremely limited data, a ranked list of relevant entities can help obtain user feedback and gather more training data. As Information Retrieval (IR) is a natural choice for ranked list generation, we explore its effectiveness in such a limited-data setting. To this end, we propose SearchIE, a hybrid IR and NLP approach that indexes documents represented using handcrafted NLP features. At query time, SearchIE samples terms from a logistic regression model trained with extremely limited data. We show that SearchIE outperforms state-of-the-art NLP models at finding civilians killed by US police officers, even with a single civilian name as the example. A term-sampling sketch follows the list.
  3. We present a variation of the corpus-based entity set expansion and entity list completion task. A user-specified query and a sentence containing one seed entity are the input to the task. The output is a list of sentences that contain other instances of the entity class indicated by the input. We construct a semantic query expansion model that leverages topical context around the seed entity and scores sentences. The proposed model finds 46% of the target entity class by retrieving 20 sentences on average, a 16% improvement over the BM25 model in terms of recall@20. An illustrative scoring-and-recall sketch follows the list.
  4. Term discrimination value is among the three basic heuristics exploited, directly or indirectly, in almost all ranking models for ad-hoc Information Retrieval (IR). Query term discrimination in monolingual IR is usually estimated from the document or collection frequency of terms. In the query translation approach for cross-lingual IR (CLIR), the discrimination value of a query term must instead be estimated from the document or collection frequencies of its translations, which is more challenging. We show that existing estimation models do not correctly estimate or adequately reflect the differences in discrimination power between query terms, which hurts retrieval performance. We then propose a new model to estimate discrimination values of query terms for CLIR and empirically demonstrate its impact on improving CLIR performance. A baseline estimation sketch follows the list.
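For item 1, the general idea can be pictured as treating the chosen retrieval metric as a reward and learning feedback-term weights with a policy-gradient update. The Python sketch below is only illustrative: the names (reinforce_feedback, retrieval_metric) and the specific parameterization are assumptions, not the RML formulation from the paper.

import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_feedback(candidate_terms, retrieval_metric, n_select=10,
                       n_iters=200, n_samples=8, lr=0.1, seed=0):
    # Learn a distribution over candidate feedback terms that maximizes an
    # arbitrary retrieval metric. The metric is treated as a black box, so it
    # can be precision-oriented, recall-oriented, or a diversity measure.
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(candidate_terms))          # unnormalized term scores
    for _ in range(n_iters):
        probs = softmax(theta)
        grads, rewards = [], []
        for _ in range(n_samples):
            # Draw expansion terms (with replacement) from the current policy.
            idx = rng.choice(len(candidate_terms), size=n_select, p=probs)
            expansion = [candidate_terms[i] for i in set(idx.tolist())]
            rewards.append(retrieval_metric(expansion))
            # REINFORCE gradient of sum_i log p(term_i) for this sample.
            counts = np.bincount(idx, minlength=len(candidate_terms))
            grads.append(counts - n_select * probs)
        baseline = float(np.mean(rewards))           # variance-reduction baseline
        for g, r in zip(grads, rewards):
            theta += lr * (r - baseline) * g
    return softmax(theta)                            # learned feedback-term weights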
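For item 2, the query-time step can be viewed as turning classifier weights into a sampling distribution over vocabulary terms. The sketch below uses invented toy sentences, labels, and plain bag-of-words features as stand-ins for SearchIE's handcrafted NLP features; it is a hedged illustration, not the paper's system.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny hand-labeled training set (hypothetical).
sentences = [
    "officer shot and killed the man during a traffic stop",   # positive
    "police fatally shot a suspect after a chase",              # positive
    "the city council approved a new budget on tuesday",        # negative
    "the team won the championship game last night",            # negative
]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(sentences)
clf = LogisticRegression().fit(X, labels)

# Turn positive-class weights into a sampling distribution over terms.
weights = clf.coef_[0]
probs = np.exp(weights - weights.max())
probs /= probs.sum()
vocab = np.array(vec.get_feature_names_out())

rng = np.random.default_rng(0)
query_terms = rng.choice(vocab, size=5, replace=False, p=probs)
print("sampled query:", " ".join(query_terms))
# The sampled terms would then be issued as a query against a sentence index
# (e.g., BM25), and the ranked list shown to the user to collect feedback.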
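For item 3, the input-to-output flow can be illustrated with a simple stand-in: expand the user query with a window of words around the seed entity, score candidate sentences, and measure recall@20. TF-IDF cosine similarity here is an assumption standing in for the paper's semantic expansion model, and the single-token seed is a simplification.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(user_query, seed_sentence, seed_entity, corpus, k=20, window=3):
    # Return the top-k corpus sentences for the entity class implied by the input.
    words = seed_sentence.split()
    try:
        pos = words.index(seed_entity)          # assumes a single-token seed entity
        context = words[max(0, pos - window): pos + window + 1]
    except ValueError:
        context = words
    expanded_query = user_query + " " + " ".join(context)

    vec = TfidfVectorizer()
    doc_vecs = vec.fit_transform(corpus)
    q_vec = vec.transform([expanded_query])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:k]]

def recall_at_k(retrieved, relevant):
    # Fraction of the target entity class found among the retrieved sentences.
    hits = sum(1 for s in retrieved if s in relevant)
    return hits / max(1, len(relevant))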
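For item 4, the estimation problem can be made concrete with a common baseline: aggregate the document frequencies of a query term's translations, weighted by translation probability, and apply an IDF-style transform. This is the kind of existing estimate the abstract critiques, not the paper's proposed model; the example numbers are hypothetical.

import math

def monolingual_idf(df, num_docs):
    # Standard IDF: rarer terms discriminate better.
    return math.log((num_docs + 1) / (df + 1))

def translated_idf(translations, df, num_docs):
    # Baseline CLIR estimate: expected document frequency over translations,
    # weighted by translation probability, then IDF. Such estimates can blur
    # the differences in discrimination power between query terms.
    expected_df = sum(p * df.get(t, 0) for t, p in translations.items())
    return math.log((num_docs + 1) / (expected_df + 1))

# Hypothetical translations of one source-language query term, with
# translation probabilities and target-collection document frequencies.
translations = {"bank": 0.6, "shore": 0.3, "embankment": 0.1}
df = {"bank": 12000, "shore": 3000, "embankment": 150}
print(translated_idf(translations, df, num_docs=1_000_000))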