Equitable Top-k Results for Long Tail Data

Islam, Md Mouinul; Asadi, Mahsa; Basu_Roy, Senjuti

doi:10.1145/3626727

For datasets exhibiting long tail phenomenon, we identify a fairness concern in existing top-k algorithms, that return a fixed set of k results for a given query. This causes a handful of popular records (products, items, etc) getting overexposed and always be returned to the user query, whereas, there exists a long tail of niche records that may be equally desirable (have similar utility). To alleviate this, we propose θ-Equiv-top-k-MMSP inside existing top-k algorithms - instead of returning a fixed top-k set, it generates all (or many) top-k sets that are equivalent in utility and creates a probability distribution over those sets. The end user will be returned one of these sets during the query time proportional to its associated probability, such that, after many draws from many end users, each record will have as equal exposure as possible (governed by uniform selection probability). θ-Equiv-top-k-MMSP is formalized with two sub-problems. (a) θ-Equiv-top-k-Sets to produce a set S of sets, each set has k records, where the sets are equivalent in utility with the top-k set; (b) MaxMinFair to produce a probability distribution over S, that is, PDF(S), such that the records in S have uniform selection probability. We formally study the hardness of θ-Equiv-top-k-MMSP. We present multiple algorithmic results - (a) An exact solution for θ-Equiv-top-k-Sets, and MaxMinFair. (b) We design highly scalable algorithms that solve θ-Equiv-top-k-Sets through a random walk and is backed by probability theory, as well as a greedy solution designed for MaxMinFair. (c) We finally present an adaptive random walk based algorithm that solves θ-Equiv-top-k-Sets and MaxMinFair at the same time. We empirically study how θ-Equiv-top-k-MMSP can alleviate a equitable exposure concerns that group fairness suffers from. We run extensive experiments using 6 datasets and design intuitive baseline algorithms that corroborate our theoretical analysis.

More Like this