Search for: All records

Award ID contains: 1617408

  1. Relevance feedback is an effective technique for improving retrieval performance using feedback documents. Selecting effective feedback terms and weighting them has always been challenging. Several methods based on different assumptions have been proposed so far; however, they do not directly optimize retrieval performance. Learning an effective relevance feedback model is not trivial, since the true feedback distribution is unknown. In this paper, we propose a general reinforcement learning framework for relevance feedback, called RML. Our framework directly optimizes any desired retrieval metric, including precision-oriented, recall-oriented, and even diversity metrics. RML can be easily extended to directly optimize any arbitrary user satisfaction signal. Experiments on standard TREC collections demonstrate the effectiveness of our framework. A minimal policy-gradient sketch of this idea appears after this list.
  2. We address the problem of entity extraction with very few examples using an information retrieval approach. Existing extraction approaches consider millions of features extracted from a large number of training cases. Generally, these training cases are generated by a distant supervision approach using entities from a knowledge base. After that, a model is learned and entities are extracted. However, with extremely limited data, a ranked list of relevant entities can help obtain user feedback and gather more training data. As Information Retrieval (IR) is a natural choice for ranked list generation, we explore its effectiveness in such a limited-data setting. To this end, we propose SearchIE, a hybrid IR and NLP approach that indexes documents represented using handcrafted NLP features. At query time, SearchIE samples terms from a logistic regression model trained with extremely limited data. We show that SearchIE outperforms state-of-the-art NLP models at finding civilians killed by US police officers, even with a single civilian name as the example. A term-sampling sketch follows the list.
  3. We present a variation of the corpus-based entity set expansion and entity list completion task. A user-specified query and a sentence containing one seed entity are the input to the task. The output is a list of sentences that contain other instances of the entity class indicated by the input. We construct a semantic query expansion model that leverages topical context around the seed entity and scores sentences. The proposed model finds 46% of the target entity class by retrieving 20 sentences on average, a 16% improvement over the BM25 model in terms of recall@20. An illustrative scoring-and-recall sketch follows the list.
  4. Term discrimination value is among the three basic heuristics exploited, directly or indirectly, in almost all ranking models for ad-hoc Information Retrieval (IR). Query term discrimination in monolingual IR is usually estimated from the document or collection frequency of terms. In the query translation approach for cross-lingual IR (CLIR), the discrimination value of a query term must instead be estimated from the document or collection frequencies of its translations, which is more challenging. We show that existing estimation models do not correctly estimate or adequately reflect the differences in discrimination power between query terms, which hurts retrieval performance. We then propose a new model to estimate discrimination values of query terms for CLIR and empirically demonstrate its impact on improving CLIR performance. A baseline estimation sketch follows the list.
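For item 1, the general idea can be pictured as treating the chosen retrieval metric as a reward and learning feedback-term weights with a policy-gradient update. The Python sketch below is only illustrative: the names (reinforce_feedback, retrieval_metric) and the specific parameterization are assumptions, not the RML formulation from the paper.

import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_feedback(candidate_terms, retrieval_metric, n_select=10,
                       n_iters=200, n_samples=8, lr=0.1, seed=0):
    # Learn a distribution over candidate feedback terms that maximizes an
    # arbitrary retrieval metric. The metric is treated as a black box, so it
    # can be precision-oriented, recall-oriented, or a diversity measure.
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(candidate_terms))          # unnormalized term scores
    for _ in range(n_iters):
        probs = softmax(theta)
        grads, rewards = [], []
        for _ in range(n_samples):
            # Draw expansion terms (with replacement) from the current policy.
            idx = rng.choice(len(candidate_terms), size=n_select, p=probs)
            expansion = [candidate_terms[i] for i in set(idx.tolist())]
            rewards.append(retrieval_metric(expansion))
            # REINFORCE gradient of sum_i log p(term_i) for this sample.
            counts = np.bincount(idx, minlength=len(candidate_terms))
            grads.append(counts - n_select * probs)
        baseline = float(np.mean(rewards))           # variance-reduction baseline
        for g, r in zip(grads, rewards):
            theta += lr * (r - baseline) * g
    return softmax(theta)                            # learned feedback-term weights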
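For item 2, the query-time step can be viewed as turning classifier weights into a sampling distribution over vocabulary terms. The sketch below uses invented toy sentences, labels, and plain bag-of-words features as stand-ins for SearchIE's handcrafted NLP features; it is a hedged illustration, not the paper's system.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny hand-labeled training set (hypothetical).
sentences = [
    "officer shot and killed the man during a traffic stop",   # positive
    "police fatally shot a suspect after a chase",              # positive
    "the city council approved a new budget on tuesday",        # negative
    "the team won the championship game last night",            # negative
]
labels = [1, 1, 0, 0]

vec = CountVectorizer()
X = vec.fit_transform(sentences)
clf = LogisticRegression().fit(X, labels)

# Turn positive-class weights into a sampling distribution over terms.
weights = clf.coef_[0]
probs = np.exp(weights - weights.max())
probs /= probs.sum()
vocab = np.array(vec.get_feature_names_out())

rng = np.random.default_rng(0)
query_terms = rng.choice(vocab, size=5, replace=False, p=probs)
print("sampled query:", " ".join(query_terms))
# The sampled terms would then be issued as a query against a sentence index
# (e.g., BM25), and the ranked list shown to the user to collect feedback.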
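For item 3, the input-to-output flow can be illustrated with a simple stand-in: expand the user query with a window of words around the seed entity, score candidate sentences, and measure recall@20. TF-IDF cosine similarity here is an assumption standing in for the paper's semantic expansion model, and the single-token seed is a simplification.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(user_query, seed_sentence, seed_entity, corpus, k=20, window=3):
    # Return the top-k corpus sentences for the entity class implied by the input.
    words = seed_sentence.split()
    try:
        pos = words.index(seed_entity)          # assumes a single-token seed entity
        context = words[max(0, pos - window): pos + window + 1]
    except ValueError:
        context = words
    expanded_query = user_query + " " + " ".join(context)

    vec = TfidfVectorizer()
    doc_vecs = vec.fit_transform(corpus)
    q_vec = vec.transform([expanded_query])
    scores = cosine_similarity(q_vec, doc_vecs)[0]
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:k]]

def recall_at_k(retrieved, relevant):
    # Fraction of the target entity class found among the retrieved sentences.
    hits = sum(1 for s in retrieved if s in relevant)
    return hits / max(1, len(relevant))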
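For item 4, the estimation problem can be made concrete with a common baseline: aggregate the document frequencies of a query term's translations, weighted by translation probability, and apply an IDF-style transform. This is the kind of existing estimate the abstract critiques, not the paper's proposed model; the example numbers are hypothetical.

import math

def monolingual_idf(df, num_docs):
    # Standard IDF: rarer terms discriminate better.
    return math.log((num_docs + 1) / (df + 1))

def translated_idf(translations, df, num_docs):
    # Baseline CLIR estimate: expected document frequency over translations,
    # weighted by translation probability, then IDF. Such estimates can blur
    # the differences in discrimination power between query terms.
    expected_df = sum(p * df.get(t, 0) for t, p in translations.items())
    return math.log((num_docs + 1) / (expected_df + 1))

# Hypothetical translations of one source-language query term, with
# translation probabilities and target-collection document frequencies.
translations = {"bank": 0.6, "shore": 0.3, "embankment": 0.1}
df = {"bank": 12000, "shore": 3000, "embankment": 150}
print(translated_idf(translations, df, num_docs=1_000_000))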