NSF PAR Search | NSF Public Access Repository

Responsible data management

https://doi.org/10.1145/3488717

Stoyanovich, Julia; Abiteboul, Serge; Howe, Bill; Jagadish, H. V.; Schelter, Sebastian (June 2022, Communications of the ACM)

Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems.
Full Text Available
Disaggregated Interventions to Reduce Inequality

https://doi.org/10.1145/3465416.3483286

Bynum, Lucius; Loftus, Joshua; Stoyanovich, Julia (October 2021, Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’21))

A significant body of research in the data sciences considers unfair discrimination against social categories such as race or gender that could occur or be amplified as a result of algorithmic decisions. Simultaneously, real-world disparities continue to exist, even before algorithmic decisions are made. In this work, we draw on insights from the social sciences brought into the realm of causal modeling and constrained optimization, and develop a novel algorithmic framework for tackling pre-existing real-world disparities. The purpose of our framework, which we call the “impact remediation framework,” is to measure real-world disparities and discover the optimal intervention policies that could help improve equity or access to opportunity for those who are underserved with respect to an outcome of interest. We develop a disaggregated approach to tackling pre-existing disparities that relaxes the typical set of assumptions required for the use of social categories in structural causal models. Our approach flexibly incorporates counterfactuals and is compatible with various ontological assumptions about the nature of social categories. We demonstrate impact remediation with a hypothetical case study and compare our disaggregated approach to an existing state-of-the-art approach in terms of structure and resulting policy recommendations. In contrast to most work on optimal policy learning, we explore disparity reduction itself as an objective, explicitly focusing the power of algorithms on reducing inequality.
Full Text Available
Teaching Responsible Data Science: Charting New Pedagogical Territory

https://doi.org/10.1007/s40593-021-00241-7

Lewis, Armanda; Stoyanovich, Julia (April 2021, International Journal of Artificial Intelligence in Education)
Full Text Available
COVID-19 Brings Data Equity Challenges to the Fore

https://doi.org/10.1145/3440889

Jagadish, H. V.; Stoyanovich, Julia; Howe, Bill (March 2021, Digital Government: Research and Practice)
The COVID-19 pandemic is compelling us to make crucial data-driven decisions quickly, bringing together diverse and unreliable sources of information without the usual quality control mechanisms we may employ. These decisions are consequential at multiple levels: They can inform local, state, and national government policy, be used to schedule access to physical resources such as elevators and workspaces within an organization, and inform contact tracing and quarantine actions for individuals. In all these cases, significant inequities are likely to arise and to be propagated and reinforced by data-driven decision systems. In this article, we propose a framework, called FIDES, for surfacing and reasoning about data equity in these systems.
Full Text Available
MLINSPECT: A Data Distribution Debugger for Machine Learning Pipelines

https://doi.org/10.1145/3448016.3452759

Grafberger, Stefan; Guha, Shubha; Stoyanovich, Julia; Schelter, Sebastian (January 2021, ACM SIGMOD: International Conference on Management of Data)
Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. While bias detection cannot be fully automated, computational tools can help pinpoint particular types of data issues. We recently proposed mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. In this demonstration, we show how mlinspect can be used to detect data distribution bugs in a representative pipeline. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines, can handle both relational and matrix data, and does not require manual code instrumentation. The library is publicly available at https://github.com/stefan-grafberger/mlinspect.
Full Text Available
Causal Intersectionality and Fair Ranking

Yang, Ke; Loftus, Joshua R.; Stoyanovich, Julia (January 2021, 2nd Symposium on Foundations of Responsible Computing (FORC))
In this paper we propose a causal modeling approach to intersectional fairness, and a flexible, task-specific method for computing intersectionally fair rankings. Rankings are used in many contexts, ranging from Web search to college admissions, but causal inference for fair rankings has received limited attention. Additionally, the growing literature on causal fairness has directed little attention to intersectionality. By bringing these issues together in a formal causal framework we make the application of intersectionality in algorithmic fairness explicit, connected to important real world effects and domain knowledge, and transparent about technical limitations. We experimentally evaluate our approach on real and synthetic datasets, exploring its behavior under different structural assumptions.
Full Text Available
Comparing Apples and Oranges: Fairness and Diversity in Ranking

Stoyanovich, Julia (January 2021, EDBT/ICDT 2021)
Full Text Available
Lightweight Inspection of Data Preprocessing in Native Machine Learning Pipelines

Grafberger, Stefan; Stoyanovich, Julia; Schelter, Sebastian (January 2021, Conference on Innovative Data Systems Research (CIDR))
Machine Learning (ML) is increasingly used to automate impactful decisions, and the risks arising from this widespread use are garnering attention from policymakers, scientists, and the media. ML applications are often very brittle with respect to their input data, which leads to concerns about their reliability, accountability, and fairness. In this paper we discuss such hard-to-identify data issues and describe mlinspect, a library that enables lightweight lineage-based inspection of ML preprocessing pipelines. The key idea is to extract a directed acyclic graph representation of the data flow from ML preprocessing pipelines in Python, and to use this representation to automatically instrument the code with predefined inspections based on a lightweight annotation propagation approach. In contrast to existing work, mlinspect operates on declarative abstractions of popular data science libraries like estimator/transformer pipelines and does not require manual code instrumentation. We discuss the design and implementation of the mlinspect prototype, and give a complex end-to-end example that illustrates its functionality.
Full Text Available
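Illustrative sketch (not part of the repository record): the abstract above mentions predefined inspections that look for hard-to-identify data issues in preprocessing pipelines. The Python below is a minimal, hand-written example of one such check, assuming the issue of interest is a change in group proportions across a single filter step. It does not use or reproduce the mlinspect API, and it does not extract a data-flow graph or propagate annotations; the function names, the `max_drop` threshold, and the toy records are all hypothetical.

```python
from collections import Counter


def group_shares(rows, column):
    """Return the share of each value of `column` among `rows`."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def check_distribution_shift(before, after, column, max_drop=0.1):
    """Return groups whose share fell by more than `max_drop` (absolute).

    `before` and `after` are the inputs and outputs of one pipeline
    operator, given as lists of dicts. The threshold is an assumption
    made for this sketch.
    """
    shares_before = group_shares(before, column)
    shares_after = group_shares(after, column)
    flagged = {}
    for value, share in shares_before.items():
        drop = share - shares_after.get(value, 0.0)
        if drop > max_drop:
            flagged[value] = round(drop, 3)
    return flagged


# Hypothetical toy records: an age filter removes every member of group "b".
patients = [
    {"id": 1, "group": "a", "age": 34},
    {"id": 2, "group": "a", "age": 51},
    {"id": 3, "group": "b", "age": 29},
    {"id": 4, "group": "b", "age": 27},
]
filtered = [row for row in patients if row["age"] > 30]

flagged = check_distribution_shift(patients, filtered, "group")
print(flagged)
```

In this toy run the filter looks harmless in isolation, yet it removes one group entirely, which is the kind of silent distribution change such a check is meant to surface.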
Fairness and Friends

Arif Khan, Falaah; Manis, Eleni; Stoyanovich, Julia (January 2021, Beyond static papers: Rethinking how we share scientific understanding in ML - ICLR 2021 workshop)

Recent interest in codifying fairness in Automated Decision Systems (ADS) has resulted in a wide range of formulations of what it means for an algorithm to be “fair.” Most of these propositions are inspired by, but inadequately grounded in, scholarship from political philosophy. This comic aims to correct that deficit. We begin by setting up a working definition of an 'Automated Decision System' (ADS) and explaining 'bias' in outputs of an ADS. We then critically evaluate different definitions of fairness as Equality of Opportunity (EOP) by contrasting their conception in political philosophy (such as Rawls’s fair EOP and formal EOP) with the proposed codification in Fair-ML (such as statistical parity, equality of odds and accuracy) to provide a clearer lens with which to view existing results and to identify future research directions. We use this framing to reinterpret the impossibility results as the incompatibility between different EOP doctrines and demonstrate how political philosophy can provide normative guidance as to which notion of fairness is applicable in which context. We conclude by highlighting justice considerations that the fair-ML literature currently overlooks or underemphasizes, such as Rawls's broader theory of justice, which supplements his EOP principle with a principle guaranteeing equal rights and liberties to all citizens in a free and democratic society.
Full Text Available
Taming Technical Bias in Machine Learning Pipelines

Schelter, Sebastian; Stoyanovich, Julia (December 2020, Bulletin of the Technical Committee on Data Engineering)
Foulds, James; Pan, Shimei (Ed.)
Machine Learning (ML) is commonly used to automate decisions in domains as varied as credit and lending, medical diagnosis, and hiring. These decisions are consequential, compelling us to carefully balance the benefits of efficiency with the potential risks. Much of the conversation about the risks centers on bias, a term that is used by the technical community ever more frequently but that is still poorly understood. In this paper we focus on technical bias, a type of bias that has so far received limited attention and that the data engineering community is well-equipped to address. We discuss dimensions of technical bias that can arise through the ML lifecycle, particularly when it’s due to preprocessing decisions or post-deployment issues. We present results of our recent work, and discuss future research directions. Our overall goal is to support the development of systems that expose the knobs of responsibility to data scientists, allowing them to detect instances of technical bias and to mitigate it when possible.
Full Text Available
