Faster Algorithms for Fair Max-Min Diversification in R^d

Kurkure, Yash; Shamo, Miles; Wiseman, Joseph; Galhotra, Sainyam; Sintos, Stavros

doi:10.1145/3654940

The task of extracting a diverse subset from a dataset, often referred to as maximum diversification, plays a pivotal role in various real-world applications that have far-reaching consequences. In this work, we delve into the realm of fairness-aware data subset selection, specifically focusing on the problem of selecting a diverse set of size k from a large collection of n data points (FairDiv). The FairDiv problem is well-studied in the data management and theory community. In this work, we develop the first constant approximation algorithm for FairDiv that runs in near-linear time using only linear space. In contrast, all previously known constant approximation algorithms run in super-linear time (with respect to n or k) and use super-linear space. Our approach achieves this efficiency by employing a novel combination of the Multiplicative Weight Update method and advanced geometric data structures to implicitly and approximately solve a linear program. Furthermore, we improve the efficiency of our techniques by constructing a coreset. Using our coreset, we also propose the first efficient streaming algorithm for the FairDiv problem whose efficiency does not depend on the distribution of data points. Empirical evaluation on million-sized datasets demonstrates that our algorithm achieves the best diversity within a minute. All prior techniques are either highly inefficient or do not generate a good solution.

Award ID(s): 2348919

PAR ID: 10614544

Author(s) / Creator(s): Kurkure, Yash; Shamo, Miles; Wiseman, Joseph; Galhotra, Sainyam; Sintos, Stavros

Publisher / Repository: Association for Computing Machinery (ACM)

Date Published: 2024-05-30

Journal Name: Proceedings of the ACM on Management of Data

Volume: 2

Issue: 3

ISSN: 2836-6573

Page Range / eLocation ID: 1 to 26

Format(s): Medium: X

Sponsoring Org: National Science Foundation
