Title: Rediscovering the human in AI design for fairness
This paper is an initial report on our fair AI design project, conducted by a small research team of anthropologists and computer scientists. Our collaborative project was developed in response to recent debates on AI's ethical and social issues (Elish and boyd 2018). We share the understanding that "numbers don't speak for themselves" and that data enters research projects already "fully cooked" (D'Ignazio and Klein 2020). Therefore, we take an anthropological approach to observing, recording, understanding, and reflecting upon the process of machine learning algorithm design, from the first steps of choosing and coding datasets for training and building algorithms. We tease apart the encoding of social-cultural paradigms in the generation and use of datasets in algorithm design and testing. By doing so, we rediscover the human in data in order to challenge the methodological and social assumptions in data use and then to adjust the model and parameters of our algorithms. This paper centers on tracing the social trajectory of the Correctional Offender Management Profiling for Alternative Sanctions dataset, known as the COMPAS dataset. This dataset contains data on over 10,000 criminal defendants in Broward County, Florida, U.S. Since its publication, it has become a benchmark dataset in the study of algorithmic fairness, and it was also used to design and train our algorithm for recidivism prediction. This paper presents our observation that data results from a complex set of social, political, and historical assumptions and circumstances, and demonstrates how the social trajectory of data can be taken into account in the design of AI as automated systems become more intricately woven into our daily lives.
Award ID(s):
1927564
PAR ID:
10383819
Journal Name:
SSSS Newsletter of the Society for Social Studies of Science
ISSN:
0146-1427
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Artificial intelligence (AI) technologies are widely deployed in smartphone photography, and prompt-based image synthesis models have rapidly become commonplace. In this paper, we describe a Research-through-Design (RtD) project which explores this shift in the means and modes of image production via the creation and use of the Entoptic Field Camera. Entoptic phenomena usually refer to perceptions of floaters or bright blue dots stemming from the physiological interplay of the eye and brain. We use the term entoptic as a metaphor to investigate how the material interplay of data and models in AI technologies shapes human experiences of reality. Through our case study using first-person design and a field study, we offer implications for critical, reflective, more-than-human and ludic design to engage AI technologies; the conceptualisation of an RtD research space which contributes to AI literacy discourses; and an outline of a research trajectory concerning materiality and design affordances of AI technologies.
  2. Monitoring and analysis of wildlife are key to conservation planning and conflict management. The widespread use of camera traps coupled with AI-based analysis tools serves as an excellent example of successful and non-invasive use of technology for design, planning, and evaluation of conservation policies. As opposed to the typical use of camera traps that capture still images or short videos, in this project, we propose to analyze longer-term videos monitoring a large flock of birds. This project, which is part of the NSF-TIH Indo-US joint R&D partnership, focuses on solving challenges associated with the analysis of long-term videos captured at feeding grounds and nesting sites, among other such locations that host large flocks of migratory birds. We foresee that the objectives of this project will lead to datasets and benchmarking tools as well as novel algorithms that will be instrumental in developing automated video analysis tools, which could in turn help understand the individual and social behavior of birds. The first of the key outcomes of this research will be the curation of challenging, real-world datasets for benchmarking various image and video analytics algorithms for tasks such as counting, detection, segmentation, and tracking. Our recent effort towards this outcome is a curated dataset of 812 high-resolution, point-annotated images (4K - 32MP) of a flock of Demoiselle cranes (Anthropoides virgo) taken from their feeding site at Khichan, Rajasthan, India. The average number of birds in each image is about 207, with a maximum count of 1500. The benchmark experiments show that state-of-the-art vision techniques struggle with tasks such as segmentation, detection, localization, and density estimation for the proposed dataset. Over the execution of this open science research, we will be scaling this dataset for segmentation and tracking in videos, as well as developing novel techniques for video analytics for wildlife monitoring.
  3. With the rise of AI, algorithms have become better at learning underlying patterns from the training data, including ingrained social biases based on gender, race, etc. Deployment of such algorithms to domains such as hiring, healthcare, and law enforcement has raised serious concerns about fairness, accountability, trust, and interpretability in machine learning algorithms. To alleviate this problem, we propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases in tabular datasets. It uses a graphical causal model to represent causal relationships among different features in the dataset and as a medium to inject domain knowledge. A user can detect the presence of bias against a group, say females, or a subgroup, say black females, by identifying unfair causal relationships in the causal network and using an array of fairness metrics. Thereafter, the user can mitigate bias by refining the causal model and acting on the unfair causal edges. For each interaction, say weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset based on the current causal model while ensuring a minimal change from the original dataset. Users can visually assess the impact of their interactions on different fairness metrics, utility metrics, data distortion, and the underlying data distribution. Once satisfied, they can download the debiased dataset and use it for any downstream application for fairer predictions. We evaluate D-BIAS by conducting experiments on 3 datasets and also a formal user study. We found that D-BIAS helps reduce bias significantly compared to the baseline debiasing approach across different fairness metrics while incurring little data distortion and a small loss in utility. Moreover, our human-in-the-loop approach significantly outperforms an automated approach on trust, interpretability, and accountability.
  4. Artificial Intelligence (AI) technologies have become increasingly pervasive in our daily lives. Recent breakthroughs such as large language models (LLMs) are increasingly used around the world to enhance work methods and boost productivity. However, the advent of these technologies has also brought forth new challenges in the critical area of social cybersecurity. While AI has opened new frontiers in addressing social issues such as cyberharassment and cyberbullying, it has also worsened existing social issues such as the generation of hateful content, bias, and demographic prejudices. Although the interplay between AI and social cybersecurity has gained much attention from the research community, very few educational materials have been designed to engage students by integrating AI and socially relevant cybersecurity through an interdisciplinary approach. In this paper, we present our newly designed open-learning platform, which can be used to meet the ever-increasing demand for advanced training at the intersection of AI and social cybersecurity. The platform, which consists of hands-on labs and education materials, incorporates the latest research results in AI-based social cybersecurity, such as cyberharassment detection, AI bias and prejudice, and adversarial attacks on AI-powered systems, and is implemented using Jupyter Notebook, an open-source interactive computing platform for effective hands-on learning. Through a user study of 201 students from two universities, we demonstrate that students have a better understanding of AI-based social cybersecurity issues and mitigation after doing the labs, and that they are enthusiastic about learning to use AI algorithms in addressing social cybersecurity challenges for social good.
  5. Analyzing individual human trajectory data helps our understanding of human mobility and finds many commercial and academic applications. There are two main approaches to accessing trajectory data for research: one involves using real-world datasets like GeoLife, while the other employs simulations to synthesize data. Real-world data provides insights from real human activities, but such data is generally sparse due to voluntary participation. Conversely, simulated data can be more comprehensive but may capture unrealistic human behavior. In this Data and Resource paper, we combine the benefits of both by leveraging the statistical features of real-world data and the comprehensiveness of simulated data. Specifically, we extract features from the real-world GeoLife dataset such as the average number of individual daily trips, average radius of gyration, and maximum and minimum trip distances. We calibrate the Pattern of Life Simulation, a realistic simulation of human mobility, to reproduce these features. To this end, we use a genetic algorithm to calibrate the parameters of the simulation to mimic the GeoLife features. For this calibration, we simulated numerous random simulation settings, measured the similarity of generated trajectories to GeoLife, and iteratively (over many generations) combined parameter settings of trajectory datasets most similar to GeoLife. Using the calibrated simulation, we simulate large trajectory datasets that we call GeoLife+, where + denotes the Kleene Plus, indicating unlimited replication with at least one occurrence. We provide simulated GeoLife+ data with 182, 1k, and 5k users over 5 years; 10k and 50k users over one year; and 100k users over 6 months of simulation lifetime.
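The calibration loop summarized in item 5 (sample random settings, score each by similarity to target features, then repeatedly combine the best settings) can be illustrated with a minimal genetic-algorithm sketch. Everything here is an assumption made for illustration: `simulate` is a toy stand-in, not the Pattern of Life Simulation; the two parameters, the `TARGET` values, and the function names are hypothetical and are not taken from GeoLife or from the paper.

```python
import random

# Hypothetical target features standing in for statistics extracted from
# real data (average daily trips, average radius of gyration in km).
# These are illustrative numbers, not GeoLife values.
TARGET = (3.0, 5.0)


def simulate(params):
    """Toy stand-in for a mobility simulation: two parameters -> two features."""
    trip_rate, step_km = params
    avg_daily_trips = 2.0 * trip_rate
    avg_radius_km = step_km * (1.0 + 0.5 * trip_rate)
    return (avg_daily_trips, avg_radius_km)


def error(params):
    """Squared distance between simulated and target features (lower is better)."""
    return sum((s - t) ** 2 for s, t in zip(simulate(params), TARGET))


def calibrate(generations=40, pop_size=30, elite=6, seed=0):
    """Return the best parameter setting and the best error per generation."""
    rng = random.Random(seed)
    # Start from random simulation settings.
    population = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 10.0))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        parents = population[:elite]  # keep the settings closest to the target
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            # Crossover: take each parameter from one parent, then mutate it
            # with small Gaussian noise (clamped to stay non-negative).
            child = tuple(max(0.0, rng.choice(pair) + rng.gauss(0.0, 0.1))
                          for pair in zip(a, b))
            children.append(child)
        population = parents + children
    population.sort(key=error)
    history.append(error(population[0]))
    return population[0], history


if __name__ == "__main__":
    best, history = calibrate()
    print("best parameters:", best)
    print("initial error:", history[0], "final error:", history[-1])
```

Because the best settings are carried over unchanged into each new generation, the best error in this sketch never increases from one generation to the next; a real calibration would replace `simulate` and `error` with a full simulation run and a similarity measure over trajectory features.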