Title: Search Result Diversification Using Query Aspects as Bottlenecks
We address some of the limitations of coverage-based search result diversification models, which often consist of separate components and rely on external systems for query aspects. To overcome these challenges, we introduce an end-to-end learning framework called DUB. Our approach preserves the intrinsic interpretability of coverage-based methods while enhancing diversification performance. Drawing inspiration from the information bottleneck method, we propose an aspect extractor that generates query aspect embeddings optimized as information bottlenecks for the task of diversified document re-ranking. Experimental results demonstrate that DUB outperforms state-of-the-art diversification models.
Award ID(s):
2106282
PAR ID:
10542772
Publisher / Repository:
ACM
ISBN:
9798400701245
Page Range / eLocation ID:
3040 to 3051
Format(s):
Medium: X
Location:
Birmingham United Kingdom
Sponsoring Org:
National Science Foundation
More Like this
  1. EDBT (Ed.)
    Unionable table search techniques input a query table from a user and search for data lake tables that can contribute additional rows to the query table. The definition of unionability is generally based on similarity measures which may include similarity between columns (e.g., value overlap or semantic similarity of the values in the columns) or tables (e.g., similarity of table embeddings). Due to this and the large redundancy in many data lakes (which can contain many copies and versions of the same table), the most unionable tables may be identical or nearly identical to the query table and may contain little new information. Hence, we introduce the problem of identifying unionable tuples from a data lake that are diverse with respect to the tuples already present in a query table. We perform an extensive experimental analysis of well-known diversity algorithms applied to this novel problem and identify a gap that we address with a novel, clustering-based tuple diversity algorithm called DUST. DUST uses a novel embedding model to represent unionable tuples that outperforms other tuple representation models by at least 15% when representing unionable tuples. Using real data lake benchmarks, we show that our diversification algorithm is more than six times faster than the most efficient diversification baseline. We also show that it is more effective in diversifying unionable tuples than existing diversification algorithms.
  2. Using entity aspect links, we improve upon the current state of the art in entity retrieval. Entity retrieval is the task of retrieving relevant entities for search queries, such as "Antibiotic Use In Livestock". Entity aspect linking is a new technique to refine the semantic information of entity links. For example, while passages relevant to the query above may mention the entity "USA", there are many aspects of the USA of which only a few, such as "USA/Agriculture", are relevant for this query. By using entity aspect links that indicate which aspect of an entity is being referred to in the context of the query, we obtain more specific relevance indicators for entities. We show that our approach improves upon all baseline methods, including the current state of the art, using a standard entity retrieval test collection. With this work, we release a large collection of entity-aspect-links for a large TREC corpus.
  3. Abstract. Many geoportals such as ArcGIS Online are established with the goal of improving geospatial data reusability and achieving intelligent knowledge discovery. However, according to previous research, most of the existing geoportals adopt Lucene-based techniques to achieve their core search functionality, which has a limited ability to capture the user’s search intentions. To better understand a user’s search intention, query expansion can be used to enrich the user’s query by adding semantically similar terms. In the context of geoportals and geographic information retrieval, we advocate the idea of semantically enriching a user’s query from both geospatial and thematic perspectives. In the geospatial aspect, we propose to enrich a query by using both place partonomy and distance decay. In terms of the thematic aspect, concept expansion and embedding-based document similarity are used to infer the implicit information hidden in a user’s query. This semantic query expansion framework is implemented as a semantically-enriched search engine using ArcGIS Online as a case study. A benchmark dataset is constructed to evaluate the proposed framework. Our evaluation results show that the proposed semantic query expansion framework is very effective in capturing a user’s search intention and significantly outperforms a well-established baseline – Lucene’s practical scoring function – with more than 3.0 increments in DCG@K (K=3,5,10). 
  4. Query by Example is a well-known information retrieval task in which a document is chosen by the user as the search query and the goal is to retrieve relevant documents from a large collection. However, a document often covers multiple aspects of a topic. To address this scenario, we introduce the task of faceted Query by Example, in which users can also specify a finer-grained aspect in addition to the input query document. We focus on the application of this task in scientific literature search. We envision models which are able to retrieve scientific papers analogous to a query scientific paper along specifically chosen rhetorical structure elements as one solution to this problem. In this work, the rhetorical structure elements, which we refer to as facets, indicate objectives, methods, or results of a scientific paper. We introduce and describe an expert-annotated test collection to evaluate models trained to perform this task. Our test collection consists of a diverse set of 50 query documents in English, drawn from computational linguistics and machine learning venues. We carefully follow the annotation guideline used by TREC for depth-k pooling (k = 100 or 250) and the resulting data collection consists of graded relevance scores with high annotation agreement. State-of-the-art models evaluated on our dataset show a significant gap to be closed in further work. Our dataset may be accessed here: https://github.com/iesl/CSFCube.
  5. Ruane, Sara (Ed.)
    A long-standing hypothesis in evolutionary biology is that the evolution of resource specialization can lead to an evolutionary dead end, where specialists have low diversification rates and limited ability to evolve into generalists. In recent years, advances in comparative methods investigating trait-based differences associated with diversification have enabled more robust tests of this idea and have found mixed support. We test the evolutionary dead end hypothesis by estimating net diversification rate differences associated with nest-type specialization among 3224 species of passerine birds. In particular, we test whether the adoption of hole-nesting, a nest-type specialization that decreases predation, results in reduced diversification rates relative to nesting outside of holes. Further, we examine whether evolutionary transitions to the specialist hole-nesting state have been more frequent than transitions out of hole-nesting. Using diversification models that accounted for background rate heterogeneity and different extinction rate scenarios, we found that hole-nesting specialization was not associated with diversification rate differences. Furthermore, contrary to the assumption that specialists rarely evolve into generalists, we found that transitions out of hole-nesting occur more frequently than transitions into hole-nesting. These results suggest that interspecific competition may limit adoption of hole-nesting, but that such competition does not result in limited diversification of hole-nesters. In conjunction with other recent studies using robust comparative methods, our results add to growing evidence that evolutionary dead ends are not a typical outcome of resource specialization. [Cavity nesting; diversification; hidden-state models; passerines; resource specialization.]