Title: Algorithmic Impact Assessments and Accountability: The Co-construction of Impacts
Algorithmic impact assessments (AIAs) are an emergent form of accountability for entities that build and deploy automated decision-support systems. These are modeled after impact assessments in other domains. Our study of the history of impact assessments shows that "impacts" are an evaluative construct that enables institutions to identify and ameliorate harms experienced because of a policy decision or system. Every domain has different expectations and norms about what constitutes impacts and harms, how potential harms are rendered as the impacts of a particular undertaking, who is responsible for conducting that assessment, and who has the authority to act on the impact assessment to demand changes to that undertaking. By examining proposals for AIAs in relation to other domains, we find that there is a distinct risk of constructing algorithmic impacts as organizationally understandable metrics that are nonetheless inappropriately distant from the harms experienced by people, and which fall short of building the relationships required for effective accountability. To address this challenge of algorithmic accountability, and as impact assessments become a commonplace process for evaluating harms, the FAccT community should A) understand impacts as objects constructed for evaluative purposes, B) attempt to construct impacts as close as possible to actual harms, and C) recognize that accountability governance requires the input of various types of expertise and affected communities. We conclude with lessons for assembling cross-expertise consensus for the co-construction of impacts and to build robust accountability relationships.
Award ID(s):
1704369
PAR ID:
10283954
Journal Name:
ACM Conference on Fairness, Accountability, and Transparency (FAccT '21)
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Algorithmic impact assessments (AIAs) are increasingly being proposed as a mechanism for algorithmic accountability. These assessments are seen as potentially useful for anticipating, avoiding, and mitigating the negative consequences of algorithmic decision-making systems (ADS). At the same time, what an AIA would entail remains under-specified. While promising, AIAs raise as many questions as they answer. Choices about the methods, scope, and purpose of impact assessments structure the possible governance outcomes. Decisions about what type of effects count as an impact, when impacts are assessed, whose interests are considered, who is invited to participate, who conducts the assessment, the public availability of the assessment, and what the outputs of the assessment might be all shape the forms of accountability that AIA proponents seek to encourage. These considerations remain open, and will determine whether and how AIAs can function as a viable governance mechanism in the broader algorithmic accountability toolkit, especially with regard to furthering the public interest. Because AIAs are still an incipient governance strategy, approaching them as social constructions that do not require a single or universal approach offers a chance to produce interventions that emerge from careful deliberation.
  2. In widely used sociological descriptions of how accountability is structured through institutions, an “actor” (e.g., the developer) is accountable to a “forum” (e.g., regulatory agencies) empowered to pass judgments on and demand changes from the actor or enforce sanctions. However, questions about structuring accountability persist: why and how is a forum compelled to keep making demands of the actor when such demands are called for? To whom is a forum accountable in the performance of its responsibilities, and how can its practices and decisions be contested? In the context of algorithmic accountability, we contend that a robust accountability regime requires a triadic relationship, wherein the forum is also accountable to another entity: the public(s). Typically, as is the case with environmental impact assessments, public(s) make demands upon the forum's judgments and procedures through the courts, thereby establishing a minimum standard of due diligence. However, core challenges relating to (1) lack of documentation, (2) difficulties in claiming standing, and (3) struggles around the admissibility of expert evidence on, and achieving consensus over, the workings of algorithmic systems in adversarial proceedings prevent the public from approaching the courts when faced with algorithmic harms. In this paper, we demonstrate that the courts are the primary route—and the primary roadblock—in the pursuit of redress for algorithmic harms. Courts often find algorithmic harms non-cognizable and rarely require developers to address material claims of harm. To address the core challenges of taking algorithms to court, we develop a relational approach to algorithmic accountability that emphasizes not what the actors do nor the results of their actions, but rather how interlocking relationships of accountability are constituted in a triadic relationship between actors, forums, and public(s).
As is the case in other regulatory domains, we believe that impact assessments (and similar accountability documentation) can provide the grounds for contestation between these parties, but only when that triad is structured such that the public(s) are able to cohere around shared experiences and interests, contest the outcomes of algorithmic systems that affect their lives, and make demands upon the other parties. Where courts now find algorithmic harms non-cognizable, an impact assessment regime can potentially create procedural rights to protect substantive rights of the public(s). This would require algorithmic accountability policies currently under consideration to provide the public(s) with adequate standing in courts, and opportunities to access and contest the actor's documentation and the forum's judgments. 
  3. Exposing students to low-quality assessments such as multiple-choice questions (MCQs) and short answer questions (SAQs) is detrimental to their learning, making it essential to accurately evaluate these assessments. Existing evaluation methods are often challenging to scale and fail to consider their pedagogical value within course materials. Online crowds offer a scalable and cost-effective source of intelligence, but often lack necessary domain expertise. Advancements in Large Language Models (LLMs) offer automation and scalability, but may also lack precise domain knowledge. To explore these trade-offs, we compare the effectiveness and reliability of crowdsourced and LLM-based methods for assessing the quality of 30 MCQs and SAQs across six educational domains using two standardized evaluation rubrics. We analyzed the performance of 84 crowdworkers from Amazon's Mechanical Turk and Prolific, comparing their quality evaluations to those made by the three LLMs: GPT-4, Gemini 1.5 Pro, and Claude 3 Opus. We found that crowdworkers on Prolific consistently delivered the highest-quality assessments, and GPT-4 emerged as the most effective LLM for this task. Our study reveals that while traditional crowdsourced methods often yield more accurate assessments, LLMs can match this accuracy in specific evaluative criteria. These results provide evidence for a hybrid approach to educational content evaluation, integrating the scalability of AI with the nuanced judgment of humans. We offer feasibility considerations in using AI to supplement human judgment in educational assessment. 
  4. Assessing the ecological and economic impacts of non-native species is crucial to providing managers and policymakers with the information necessary to respond effectively. Most non-native species have minimal impacts on the environment in which they are introduced, but a small fraction are highly deleterious. The definition of ‘damaging’ or ‘high-impact’ varies based on the factors determined to be valuable by an individual or group, but interpretations of whether non-native species meet particular definitions can be influenced by the interpreter’s bias or level of expertise, or lack of group consensus. Uncertainty or disagreement about an impact classification may delay or otherwise adversely affect policymaking on management strategies. One way to prevent these issues would be to have a detailed, nine-point impact scale that would leave little room for interpretation and then divide the scale into agreed upon categories, such as low, medium, and high impact. Following a previously conducted, exhaustive search regarding non-native, conifer-specialist insects, the authors independently read the same sources and scored the impact of 41 conifer-specialist insects to determine if any variation among assessors existed when using a detailed impact scale. Each of the authors, who were selected to participate in the working group associated with this study because of their diverse backgrounds, also provided their level of expertise and uncertainty for each insect evaluated. We observed 85% congruence in impact rating among assessors, with 27% of the insects having perfect inter-rater agreement. Variance in assessment peaked in insects with a moderate impact level, perhaps due to ambiguous information or prior assessor perceptions of these specific insect species. The authors also participated in a joint fact-finding discussion of two insects with the most divergent impact scores to isolate potential sources of variation in assessor impact scores. 
We identified four themes that could be experienced by impact assessors: ambiguous information, discounted details, observed versus potential impact, and prior knowledge. To improve consistency in impact decision-making, we encourage groups to establish a detailed scale that would allow all observed and published impacts to fall under a particular score, provide clear, reproducible guidelines and training, and use consensus-building techniques when necessary. 
  5. This qualitative study draws on interviews and observations with nurses working in a virtual intensive care unit and using algorithms to track patient progress. It provides an overview of how health practitioners navigate algorithmic systems to build relationships with other providers and patients, with attention to strategies for accountability and advocacy in virtual healthcare contexts.