Search for: All records

Award ID contains: 1916505

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Game Of Recourse: Simulating Algorithmic Recourse over Time to Improve Its Reliability and Fairness

https://doi.org/10.1145/3626246.3654742

Bell, Andrew; Fonseca, Joao; Stoyanovich, Julia (June 2024, SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data)

Algorithmic recourse, or providing recommendations to individuals who receive an unfavorable outcome from an algorithmic system on how they can take action and change that outcome, is an important tool for giving individuals agency against algorithmic decision systems. Unfortunately, research on algorithmic recourse faces a fundamental challenge: there are no publicly available datasets on algorithmic recourse. In this work, we begin to explore a solution to this challenge by creating an agent-based simulation called The Game of Recourse (an homage to Conway's Game of Life) to synthesize realistic algorithmic recourse data. We designed The Game of Recourse with a focus on reliability and fairness, two areas of critical importance in socio-technical systems.
more » « less
Full Text Available
Responsible Model Selection with Virny and VirnyView

https://doi.org/10.1145/3626246.3654738

Herasymuk, Denys; Arif_Khan, Falaah; Stoyanovich, Julia (June 2024, SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data)

In this demonstration, we present a comprehensive software library for model auditing and responsible model selection, called Virny, along with an interactive tool called VirnyView. Our library is modular and extensible, it implements a rich set of performance and fairness metrics, including novel metrics that quantify and compare model stability and uncertainty, and enables performance analysis based on multiple sensitive attributes, and their intersections. The Virny library and the VirnyView tool are available at https://github.com/DataResponsibly/Virny and https://r-ai.co/VirnyView.
more » « less
Full Text Available
Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

https://doi.org/10.1145/3665252.3665267

Rosenblatt, Lucas; Herman, Bernease; Holovenko, Anastasia; Lee, Wonkwon; Loftus, Joshua; McKinnie, Elizabeth; Rumezhak, Taras; Stadnik, Andrii; Howe, Bill; Stoyanovich, Julia (May 2024, ACM SIGMOD Record)

Differential privacy (DP) data synthesizers are increasingly proposed to afford public release of sensitive information, offering theoretical guarantees for privacy (and, in some cases, utility), but limited empirical evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, multivariate correlations, the accuracy of trained classifiers, or performance over a query workload. The ability for these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. Our methodology consists of reproducing empirical conclusions of peer-reviewed papers on real, publicly available data, then re-running these experiments a second time on DP synthetic data and comparing the results.
more » « less
Full Text Available
Making Transparency Influencers: A Case Study of an Educational Approach to Improve Responsible AI Practices in News and Media

https://doi.org/10.1145/3613905.3637113

Bell, Andrew; Stoyanovich, Julia (May 2024, Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems)

Concerns about the risks posed by artificial intelligence (AI) have resulted in growing interest in algorithmic transparency. While algorithmic transparency is well-studied, there is evidence that many organizations do not value implementing transparency. In this case study, we test a ground-up approach to ensuring better real-world algorithmic transparency by creating transparency influencers — motivated individuals within organizations who advocate for transparency. We held an interactive online workshop on algorithmic transparency and advocacy for 15 professionals from news, media, and journalism. We reflect on workshop design choices and presents insights from participant interviews. We found positive evidence for our approach: In the days following the workshop, three participants had done pro-transparency advocacy. Notably, one of them advocated for algorithmic transparency at an organization-wide AI strategy meeting. In the words of a participant: “if you are questioning whether or not you need to tell people [about AI], you need to tell people.”
more » « less
Full Text Available
A Simple and Practical Method for Reducing the Disparate Impact of Differential Privacy

https://doi.org/10.1609/aaai.v38i19.30153

Rosenblatt, Lucas; Stoyanovich, Julia; Musco, Christopher (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Differentially private (DP) mechanisms have been deployed in a variety of high-impact social settings (perhaps most notably by the U.S. Census). Since all DP mechanisms involve adding noise to results of statistical queries, they are expected to impact our ability to accurately analyze and learn from data, in effect trading off privacy with utility. Alarmingly, the impact of DP on utility can vary significantly among different sub-populations. A simple way to reduce this disparity is with stratification. First compute an independent private estimate for each group in the data set (which may be the intersection of several protected classes), then, to compute estimates of global statistics, appropriately recombine these group estimates. Our main observation is that naive stratification often yields high-accuracy estimates of population-level statistics, without the need for additional privacy budget. We support this observation theoretically and empirically. Our theoretical results center on the private mean estimation problem, while our empirical results center on extensive experiments on private data synthesis to demonstrate the effectiveness of stratification on a variety of private mechanisms. Overall, we argue that this straightforward approach provides a strong baseline against which future work on reducing utility disparities of DP mechanisms should be compared.
more » « less
Full Text Available
Automated Data Cleaning Can Hurt Fairness in Machine Learning-based Decision Making

https://doi.org/10.1109/TKDE.2024.3365524

Guha, Shubha; Khan, Falaah Arif; Stoyanovich, Julia; Schelter, Sebastian (January 2024, IEEE Transactions on Knowledge and Data Engineering)

In this paper, we interrogate whether data quality issues track demographic group membership (based on sex, race and age) and whether automated data cleaning — of the kind commonly used in production ML systems — impacts the fairness of predictions made by these systems. To the best of our knowledge, the impact of data cleaning on fairness in downstream tasks has not been investigated in the literature. We first analyse the tuples flagged by common error detection strategies in five research datasets. We find that, while specific data quality issues, such as higher rates of missing values, are associated with membership in historically disadvantaged groups, poor data quality does not generally track demographic group membership. As a follow-up, we conduct a large-scale empirical study on the impact of automated data cleaning on fairness, involving more than 26,000 model evaluations. We observe that, while automated data cleaning is unlikely to worsen accuracy, it is more likely to worsen fairness than to improve it, especially when the cleaning techniques are not carefully chosen. Furthermore, we find that the positive or negative impact of a particular cleaning technique often depends on the choice of fairness metric and group definition (single-attribute or intersectional). We make our code and experimental results publicly available. The analysis we conducted in this paper is difficult, primarily because it requires that we think holistically about disparities in data quality, disparities in the effectiveness of data cleaning methods, and impacts of such disparities on ML model performance for different demographic groups. Such holistic analysis can and should be supported by data engineering tools, and requires substantial data engineering research. Towards this goal, we discuss open research questions, envision the development of fairness-aware data cleaning methods, and their integration into complex pipelines for ML-based decision making.
more » « less
Full Text Available
Temporal graph patterns by timed automata

https://doi.org/10.1007/s00778-023-00795-z

Aghasadeghi, Amir; Van_den_Bussche, Jan; Stoyanovich, Julia (January 2024, The VLDB Journal)

Temporal graphs represent graph evolution over time, and have been receiving considerable research attention. Work on expressing temporal graph patterns or discovering temporal motifs typically assumes relatively simple temporal constraints, such as journeys or, more generally, existential constraints, possibly with finite delays. In this paper we propose to use timed automata to express temporal constraints, leading to a general and powerful notion of temporal basic graph pattern (BGP). The new difficulty is the evaluation of the temporal constraint on a large set of matchings. An important benefit of timed automata is that they support an iterative state assignment, which can be useful for early detection of matches and pruning of non-matches. We introduce algorithms to retrieve all instances of a temporal BGP match in a graph, and present results of an extensive experimental evaluation, demonstrating interesting performance trade-offs. We show that an on-demand algorithm that processes total matchings incrementally over time is preferable when dealing with cyclic patterns on sparse graphs. On acyclic patterns or dense graphs, and when connectivity of partial matchings can be guaranteed, the best performance is achieved by maintaining partial matchings over time and allowing automaton evaluation to be fully incremental. The code and datasets used in our analysis are available athttp://github.com/amirpouya/TABGP.
more » « less
Full Text Available
Setting the Right Expectations: Algorithmic Recourse Over Time

https://doi.org/10.1145/3617694.3623251

Fonseca, João; Bell, Andrew; Abrate, Carlo; Bonchi, Francesco; Stoyanovich, Julia (October 2023, EAAMO '23: Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization)

Algorithmic systems are often called upon to assist in high-stakes decision making. In light of this, algorithmic recourse, the principle wherein individuals should be able to take action against an undesirable outcome made by an algorithmic system, is receiving growing attention. The bulk of the literature on algorithmic recourse to-date focuses primarily on how to provide recourse to a single individual, overlooking a critical element: the effects of a continuously changing context. Disregarding these effects on recourse is a significant oversight, since, in almost all cases, recourse consists of an individual making a first, unfavorable attempt, and then being given an opportunity to make one or several attempts at a later date — when the context might have changed. This can create false expectations, as initial recourse recommendations may become less reliable over time due to model drift and competition for access to the favorable outcome between individuals. In this work we propose an agent-based simulation framework for studying the effects of a continuously changing environment on algorithmic recourse. In particular, we identify two main effects that can alter the reliability of recourse for individuals represented by the agents: (1) competition with other agents acting upon recourse, and (2) competition with new agents entering the environment. Our findings highlight that only a small set of specific parameterizations result in algorithmic recourse that is reliable for agents over time. Consequently, we argue that substantial additional work is needed to understand recourse reliability over time, and to develop recourse methods that reward agents’ effort.
more » « less
Full Text Available
Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

https://doi.org/10.14778/3611479.3611517

Rosenblatt, Lucas; Herman, Bernease; Holovenko, Anastasia; Lee, Wonkwon; Loftus, Joshua; McKinnie, Elizabeth; Rumezhak, Taras; Stadnik, Andrii; Howe, Bill; Stoyanovich, Julia (July 2023, Proceedings of the VLDB Endowment)

Differential privacy (DP) data synthesizers are increasingly proposed to afford public release of sensitive information, offering theoretical guarantees for privacy (and, in some cases, utility), but limited empirical evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, multivariate correlations, the accuracy of trained classifiers, or performance over a query workload. The ability for these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. Our methodology consists of reproducing empirical conclusions of peer-reviewed papers on real, publicly available data, then re-running these experiments a second time on DP synthetic data and comparing the results. We instantiate our methodology over a benchmark of recent peer-reviewed papers that analyze public datasets in the ICPSR social science repository. We model quantitative claims computationally to automate the experimental workflow, and model qualitative claims by reproducing visualizations and comparing the results manually. We then generate DP synthetic datasets using multiple state-of-the-art mechanisms, and estimate the likelihood that these conclusions will hold. We find that, for reasonable privacy regimes, state-of-the-art DP synthesizers are able to achieve high epistemic parity for several papers in our benchmark. However, some papers, and particularly some specific findings, are difficult to reproduce for any of the synthesizers. Given these results, we advocate for a new class of mechanisms that can reorder the priorities for DP data synthesis: favor stronger guarantees for utility (as measured by epistemic parity) and offer privacy protection with a focus on application-specific threat models and risk-assessment.
more » « less
Full Text Available
Responsible AI literacy: A stakeholder-first approach

https://doi.org/10.1177/20539517231219958

Domínguez_Figaredo, Daniel; Stoyanovich, Julia (July 2023, Big Data & Society)

The need for citizens to better understand the ethical and social challenges of algorithmic systems has led to a rapid proliferation of AI literacy initiatives. After reviewing the literature on AI literacy projects, we found that most educational practices in this area are based on teaching programming fundamentals, primarily to K-12 students. This leaves out citizens and those who are primarily interested in understanding the implications of automated decision- making systems, rather than in learning to code. To address these gaps, this article explores the methodological contributions of responsible AI education practices that focus first on stakeholders when designing learning experiences for different audiences and contexts. The article examines the weaknesses identified in current AI literacy projects, explains the stakeholder-first approach, and analyzes several responsible AI education case studies, to illustrate how such an approach can help overcome the aforementioned limitations. The results suggest that the stakeholder-first approach allows to address audiences beyond the usual ones in the field of AI literacy, and to incorporate new content and methodologies depending on the needs of the respective audiences, thus opening new avenues for teaching and research in the field.
more » « less
Full Text Available

« Prev Next »