NSF PAR Search | NSF Public Access Repository

Query Refinement for Diverse Top-k Selection

https://doi.org/10.1145/3654969

Campbell, Felix S; Silberstein, Alon; Stoyanovich, Julia; Moskovitch, Yuval (May 2024, Proceedings of the ACM on Management of Data)

Database queries are often used to select and rank items as decision support for many applications. As automated decision-making tools become more prevalent, there is a growing recognition of the need to diversify their outcomes. In this paper, we define and study the problem of modifying the selection conditions of an ORDER BY query so that the result of the modified query closely fits some user-defined notion of diversity while simultaneously maintaining the intent of the original query. We show the hardness of this problem and propose a mixed-integer linear programming (MILP) based solution. We further present optimizations designed to enhance the scalability and applicability of the solution in real-life scenarios. We investigate the performance characteristics of our algorithm and show its efficiency and the usefulness of our optimizations.
Full Text Available
Query Refinement for Diversity Constraint Satisfaction

https://doi.org/10.14778/3626292.3626295

Li, Jinyang; Moskovitch, Yuval; Stoyanovich, Julia; Jagadish, H. V. (October 2023, Proceedings of the VLDB Endowment)

Diversity, group representation, and similar needs often apply to query results, which in turn require constraints on the sizes of various subgroups in the result set. Traditional relational queries only specify conditions as part of the query predicate(s), and do not support such restrictions on the output. In this paper, we study the problem of modifying queries to have the result satisfy constraints on the sizes of multiple subgroups in it. This problem, in the worst case, cannot be solved in polynomial time. Yet, with the help of provenance annotation, we are able to develop a query refinement method that works quite efficiently, as we demonstrate through extensive experiments.
Full Text Available
Dexer: Detecting and Explaining Biased Representation in Ranking

https://doi.org/10.1145/3555041.3589725

Moskovitch, Yuval; Li, Jinyang; Jagadish, H. V. (June 2023, ACM)

Full Text Available
Erica: Query Refinement for Diversity Constraint Satisfaction

https://doi.org/10.14778/3611540.3611623

Li, Jinyang; Silberstein, Alon; Moskovitch, Yuval; Stoyanovich, Julia; Jagadish, H. V. (August 2023, Proceedings of the VLDB Endowment)

Relational queries are commonly used to support decision making in critical domains like hiring and college admissions. For example, a college admissions officer may need to select a subset of the applicants for in-person interviews, who individually meet the qualification requirements (e.g., have a sufficiently high GPA) and are collectively demographically diverse (e.g., include a sufficient number of candidates of each gender and of each race). However, traditional relational queries only support selection conditions checked against each input tuple, and they do not support diversity conditions checked against multiple, possibly overlapping, groups of output tuples. To address this shortcoming, we present Erica, an interactive system that proposes minimal modifications for selection queries to have them satisfy constraints on the cardinalities of multiple groups in the result. We demonstrate the effectiveness of Erica using several real-life datasets and diversity requirements.
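The admissions scenario in this abstract can be illustrated with a small sketch. This is not Erica's method: it is a naive search over a single GPA threshold, and the applicant records, the `refine_min_gpa` helper, and the constraint values are all hypothetical, chosen only to show what "a minimal modification that satisfies cardinality constraints" might look like.

```python
# Hypothetical applicant records; names and values are illustrative only.
APPLICANTS = [
    {"name": "A", "gpa": 3.9, "gender": "F"},
    {"name": "B", "gpa": 3.8, "gender": "M"},
    {"name": "C", "gpa": 3.7, "gender": "M"},
    {"name": "D", "gpa": 3.5, "gender": "F"},
    {"name": "E", "gpa": 3.4, "gender": "M"},
    {"name": "F", "gpa": 3.2, "gender": "F"},
]

# Assumed diversity constraints: (group predicate, minimum count in the result).
CONSTRAINTS = [
    (lambda r: r["gender"] == "F", 2),
    (lambda r: r["gender"] == "M", 2),
]


def select(rows, min_gpa):
    """The selection query: keep rows whose GPA meets the threshold."""
    return [r for r in rows if r["gpa"] >= min_gpa]


def satisfies(result, constraints):
    """Check every cardinality constraint against the whole result set."""
    return all(
        sum(1 for r in result if pred(r)) >= minimum
        for pred, minimum in constraints
    )


def refine_min_gpa(rows, original, constraints):
    """Return the threshold closest to `original` whose result satisfies
    the constraints, or None if no threshold does (naive search)."""
    # Only thresholds equal to an observed GPA can change the result set.
    candidates = sorted(
        {r["gpa"] for r in rows} | {original},
        key=lambda t: (abs(t - original), t),
    )
    for threshold in candidates:
        if satisfies(select(rows, threshold), constraints):
            return threshold
    return None


refined = refine_min_gpa(APPLICANTS, 3.7, CONSTRAINTS)
```

With these made-up records, the original threshold of 3.7 admits only one female applicant, so the search moves to the nearest threshold that admits two applicants of each gender. A real system would need to handle several predicates at once and overlapping groups, which is where the hardness mentioned in the related abstracts comes from.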
Full Text Available
Detection of Groups with Biased Representation in Ranking

https://doi.org/10.1109/ICDE55515.2023.00168

Li, Jinyang; Moskovitch, Yuval; Jagadish, H. V. (April 2023, IEEE)

Full Text Available
Reliability at multiple stages in a data analysis pipeline

https://doi.org/10.1145/3500923

Moskovitch, Yuval; Jagadish, H. V. (November 2022, Communications of the ACM)

Data-centric methods designed to increase end-to-end reliability of data-driven decision systems.
Full Text Available
Bias analysis and mitigation in data-driven tools using provenance

https://doi.org/10.1145/3530800.3534528

Moskovitch, Yuval; Li, Jinyang; Jagadish, H. V. (June 2022, TaPP '22: Proceedings of the 14th International Workshop on the Theory and Practice of Provenance)

Full Text Available
DENOUNCER: detection of unfairness in classifiers

https://doi.org/10.14778/3476311.3476328

Li, Jinyang; Moskovitch, Yuval; Jagadish, H. V. (July 2021, Proceedings of the VLDB Endowment)

The use of automated data-driven tools for decision-making has gained popularity in recent years. At the same time, reported cases of algorithmic bias and discrimination have increased as well, which in turn has led to extensive study of algorithmic fairness. Numerous notions of fairness have been proposed, designed to capture different scenarios. These measures typically refer to a "protected group" in the data, defined using values of some sensitive attributes. Confirming whether a fairness definition holds for a given group is a simple task, but detecting groups that are treated unfairly by the algorithm may be computationally prohibitive as the number of possible groups is combinatorial. We present a method for detecting such groups efficiently for various fairness definitions. Our solution is implemented in DENOUNCER, an interactive system that allows users to explore different fairness measures of a (trained) classifier for given test data. We propose to demonstrate the usefulness of DENOUNCER using real-life data and illustrate the effectiveness of our method.
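To make the contrast in this abstract concrete, the sketch below checks one measure for one group (cheap) and then enumerates every group definable over the sensitive attributes (combinatorial). It is a brute-force baseline, not DENOUNCER's efficient method. The test rows, the attribute names, and the choice of per-group accuracy as the measure are assumptions made for illustration.

```python
from itertools import combinations, product

# Hypothetical test data: sensitive attributes, true label y, predicted label.
ROWS = [
    {"gender": "F", "race": "X", "y": 1, "pred": 1},
    {"gender": "F", "race": "X", "y": 0, "pred": 0},
    {"gender": "F", "race": "Y", "y": 1, "pred": 0},
    {"gender": "F", "race": "Y", "y": 0, "pred": 1},
    {"gender": "M", "race": "X", "y": 1, "pred": 1},
    {"gender": "M", "race": "X", "y": 0, "pred": 0},
    {"gender": "M", "race": "Y", "y": 1, "pred": 1},
    {"gender": "M", "race": "Y", "y": 0, "pred": 0},
]
SENSITIVE = ["gender", "race"]


def accuracy(rows):
    """Fraction of rows where the prediction matches the true label."""
    return sum(r["y"] == r["pred"] for r in rows) / len(rows)


def all_groups(rows, attrs):
    """Yield every group obtained by fixing values for a non-empty subset
    of the sensitive attributes. The count grows exponentially in len(attrs)."""
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            domains = [sorted({r[a] for r in rows}) for a in subset]
            for values in product(*domains):
                yield dict(zip(subset, values))


def low_accuracy_groups(rows, attrs, threshold):
    """Return (group, accuracy) for non-empty groups below the threshold."""
    flagged = []
    for group in all_groups(rows, attrs):
        members = [r for r in rows if all(r[a] == v for a, v in group.items())]
        if members and accuracy(members) < threshold:
            flagged.append((group, accuracy(members)))
    return flagged


flagged = low_accuracy_groups(ROWS, SENSITIVE, threshold=0.6)
```

Two binary attributes already give eight candidate groups. Each additional attribute multiplies that number, which is why exhaustive enumeration stops being practical on realistic data.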
Full Text Available
Patterns Count-Based Labels for Datasets

https://doi.org/10.1109/ICDE51399.2021.00184

Moskovitch, Yuval; Jagadish, H. V. (April 2021, 37th IEEE International Conference on Data Engineering, ICDE)
Full Text Available
COUNTATA: Dataset Labeling Using Pattern Counts

Moskovitch, Yuval; Jagadish, H. V. (January 2020, Proceedings of the VLDB Endowment)

Full Text Available