skip to main content

Title: Evaluating Stochastic Rankings with Expected Exposure
We introduce the concept of \emph{expected exposure} as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a \emph{distribution over rankings} instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including \emph{ad hoc} retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.
Authors:
; ; ; ;
Award ID(s):
1751278
Publication Date:
NSF-PAR ID:
10199451
Journal Name:
Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Page Range or eLocation-ID:
275 to 284
Sponsoring Org:
National Science Foundation
More Like this
  1. As the content on the Internet continues to grow, many new dynamically changing and heterogeneous sources of data constantly emerge. A conventional search engine cannot crawl and index at the same pace as the expansion of the Internet. Moreover, a large portion of the data on the Internet is not accessible to traditional search engines. Distributed Information Retrieval (DIR) is a viable solution to this as it integrates multiple shards (resources) and provides a unified access to them. Resource selection is a key component of DIR systems. There is a rich body of literature on resource selection approaches for DIR. A key limitation of the existing approaches is that they primarily use term-based statistical features and do not generally model resource-query and resource-resource relationships. In this paper, we propose a graph neural network (GNN) based approach to learning-to-rank that is capable of modeling resource-query and resource-resource relationships. Specifically, we utilize a pre-trained language model (PTLM) to obtain semantic information from queries and resources. Then, we explicitly build a heterogeneous graph to preserve structural information of query-resource relationships and employ GNN to extract structural information. In addition, the heterogeneous graph is enriched with resource-resource type of edges to further enhance themore »ranking accuracy. Extensive experiments on benchmark datasets show that our proposed approach is highly effective in resource selection. Our method outperforms the state-of-the-art by 6.4% to 42% on various performance metrics.« less
  2. Table retrieval is the task of extracting the most relevant tables to answer a user's query. Table retrieval is an important task because many domains have tables that contain useful information in a structured form. Given a user's query, the goal is to obtain a relevance ranking for query-table pairs, such that higher ranked tables should be more relevant to the query. In this paper, we present a context-aware table retrieval method that is based on a novel embedding for attribute tokens. We find that differentiated types of contexts are useful in building word embeddings. We also find that including a specialized representation of numerical cell values in our model improves table retrieval performance. We use the trained model to predict different contexts of every table. We show that the predicted contexts are useful in ranking tables against a query using a multi-field ranking approach. We evaluate our approach using public WikiTables data, and we demonstrate improvements in terms of NDCG over unsupervised baseline methods in the table retrieval task.
  3. Users often fail to formulate their complex information needs in a single query. As a consequence, they need to scan multiple result pages and/or reformulate their queries, which is a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking questions from the users to clarify their information needs. Asking clarifying questions is especially important in information-seeking conversational systems, since they can only return a limited number (often only one) of results. In this paper, we formulate the task of asking clarifying questions in open-domain information retrieval. We propose an offline evaluation methodology for the task. In this research, we create a dataset, called Qulac, through crowdsourcing. Our dataset is based on the TREC Web Track 2009-2012 data and consists of over 10K question-answer pairs for 198 TREC topics with 762 facets. Our experiments on an oracle model demonstrate that asking only one good question leads to over 100% retrieval performance improvement, which clearly demonstrates the potential impact of the task. We further propose a neural model for selecting clarifying question based on the original query and the previous question-answer interactions. Our model significantly outperforms competitive baselines. To foster research in this area, we have made Qulac publicly available.
  4. Ranking evaluation metrics play an important role in information retrieval, providing optimization objectives during development and means of assessment of deployed performance. Recently, fairness of rankings has been recognized as crucial, especially as automated systems are increasingly used for high impact decisions. While numerous fairness metrics have been proposed, a comparative analysis to understand their interrelationships is lacking. Even for fundamental statistical parity metrics which measure group advantage, it remains unclear whether metrics measure the same phenomena, or when one metric may produce different results than another. To address these open questions, we formulate a conceptual framework for analytical comparison of metrics.We prove that under reasonable assumptions, popular metrics in the literature exhibit the same behavior and that optimizing for one optimizes for all. However, our analysis also shows that the metrics vary in the degree of unfairness measured, in particular when one group has a strong majority. Based on this analysis, we design a practical statistical test to identify whether observed data is likely to exhibit predictable group bias. We provide a set of recommendations for practitioners to guide the choice of an appropriate fairness metric.
  5. In mechanism design, the firm has an advantage over its customers in its knowledge of the state of the system, which can affect the utilities of all players. This poses the question: how can the firm utilize that information (and not additional financial incentives) to persuade customers to take actions that lead to higher revenue (or other firm utility)? When the firm is constrained to "cheap talk," and cannot credibly commit to a manner of signaling, the firm cannot change customer behavior in a meaningful way. Instead, we allow firm to commit to how they will signal in advance. Customers can then trust the signals they receive and act on their realization. This thesis contains the work of three papers, each of which applies information design to service systems and online markets. We begin by examining how a firm could signal a queue's length to arriving, impatient customers in a service system. We show that the choice of an optimal signaling mechanism can be written as a infinite linear program and then show an intuitive form for its optimal solution. We show that with the optimal fixed price and optimal signaling, a firm can generate the same revenue as itmore »could with an observable queue and length-dependent variable prices. Next, we study demand and inventory signaling in online markets: customers make strategic purchasing decisions, knowing the price will decrease if an item does not sell out. The firm aims to convince customers to buy now at a higher price. We show that the optimal signaling mechanism is public, and sends all customers the same information. Finally, we consider customers whose ex ante utility is not simply their expected ex post utility, but instead a function of its distribution. We bound the number of signals needed for the firm to generate their optimal utility and provide a convex program reduction of the firm's problem.« less