Title: COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
We present a large, challenging dataset, COUGH, for COVID-19 FAQ retrieval. Similar to a standard FAQ dataset, COUGH consists of three parts: FAQ Bank, Query Bank and Relevance Set. The FAQ Bank contains ~16K FAQ items scraped from 55 credible websites (e.g., CDC and WHO). For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query. We analyze COUGH by testing different FAQ retrieval models built on top of BM25 and BERT, among which the best model achieves 48.8 under P@5, indicating a great challenge presented by COUGH and encouraging future research for further improvement. Our COUGH dataset is available at https://github.com/sunlab-osu/covid-faq.
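As an illustration of the P@5 metric mentioned in the abstract, here is a minimal Python sketch of precision at k for a single query. The function name `precision_at_k`, the FAQ identifiers, and the example ranking are hypothetical and are not taken from the COUGH repository; the paper's exact evaluation protocol may differ.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved items that are annotated as relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    # Divide by k (not len(top_k)) so that short result lists are penalized.
    return hits / k


# Hypothetical ranking returned for one query, and a hypothetical relevance set.
ranked = ["faq_12", "faq_7", "faq_3", "faq_44", "faq_9", "faq_1"]
relevant = {"faq_7", "faq_9", "faq_30"}
score = precision_at_k(ranked, relevant, k=5)
print(score)  # 2 of the top 5 items are relevant
```

A system-level score would average this value over all queries in the Query Bank. If the reported 48.8 is a percentage, it would correspond to roughly 2.4 relevant FAQ items among the top 5 on average.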
Award ID(s):
1942980 1815674
PAR ID:
10334272
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, which uses a large-scale Web Table Corpora extracted from the Common Crawl. Since a Web table usually has rich context information such as the page title and surrounding paragraphs, we not only provide relevance judgments of query-table pairs, but also the relevance judgments of query-table context pairs with respect to a query, which are ignored by previous test collections. To facilitate future research with this benchmark, we provide details about how the dataset is pre-processed and also baseline results from both traditional and recently proposed table retrieval methods. Our experimental results show that proper usage of context labels can benefit previous table retrieval methods.
  2. Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34k manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and recently developed neural IR models. 
  3. A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match with description text. We propose a novel schema label generation model which generates possible schema labels based on dataset table content. We incorporate the generated schema labels into a mixed ranking model which not only considers the relevance between the query and dataset metadata but also the similarity between the query and generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can effectively improve the precision and NDCG scores of the dataset retrieval task compared with baseline methods. We also test on a collection of Wikipedia tables to show that the features generated from schema labels can improve the unsupervised and supervised web table retrieval task as well. 
  4. We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.
  5. Table search aims to answer a query with a ranked list of tables. Unfortunately, current test corpora have focused mostly on needle-in-the-haystack tasks, where only a few tables are expected to exactly match the query intent. Instead, table search tasks often arise in response to the need for retrieving new datasets or augmenting existing ones, e.g., for data augmentation within data science or machine learning pipelines. Existing table repositories and benchmarks are limited in their ability to test retrieval methods for table search tasks. Thus, to close this gap, we introduce a novel dataset for query-by-example Semantic Table Search. This novel dataset consists of two snapshots of the large-scale Wikipedia tables collection from 2013 and 2019 with two important additions: (1) a page and topic aware ground truth relevance judgment and (2) a large-scale DBpedia entity linking annotation. Moreover, we generate a novel set of entity-centric queries that allows testing existing methods under a novel search scenario: semantic exploratory search. The resulting resource consists of 9,296 novel queries, 610,553 query-table relevance annotations, and 238,038 entity-linked tables from the 2013 snapshot. Similarly, on the 2019 snapshot, the resource consists of 2,560 queries, 958,214 relevance annotations, and 457,714 total tables. This makes our resource the largest annotated table-search corpus to date (97 times more queries and 956 times more annotated tables than any existing benchmark). We perform a user study among domain experts and prove that these annotators agree with the automatically generated relevance annotations. As a result, we can re-evaluate some basic assumptions behind existing table search approaches identifying their shortcomings along with promising novel research directions.