

Title: From Human to Data to Dataset: Mapping the Traceability of Human Subjects in Computer Vision Datasets
Computer vision is a data-hungry field. Researchers and practitioners who work on human-centric computer vision, such as facial recognition, emphasize the necessity of vast amounts of data for more robust and accurate models. Humans are treated as a data resource that can be converted into datasets. This need for data has led to a proliferation of gathering from easily available sources, including public data from the web. Yet the use of public data has significant ethical implications for the human subjects in datasets. We bridge academic conversations on the ethics of using publicly obtained data with concerns about privacy and agency associated with computer vision applications. Specifically, we examine how practices of dataset construction from public data (not only from websites, but also from public settings and public records) make it extremely difficult for human subjects to trace their images as they are collected, converted into datasets, distributed for use, and, in some cases, retracted. We discuss two interconnected barriers that current data practices present to an ethics of traceability for human subjects: awareness and control. We conclude with key intervention points for enabling traceability for data subjects, and offer suggestions for an improved ethics of traceability that enables both awareness and control for individual subjects in dataset curation practices.
Award ID(s):
1704369 1704303
PAR ID:
10601988
Author(s) / Creator(s):
Publisher / Repository:
Association for Computing Machinery (ACM)
Date Published:
Journal Name:
Proceedings of the ACM on Human-Computer Interaction
Volume:
7
Issue:
CSCW1
ISSN:
2573-0142
Format(s):
Medium: X
Size(s):
p. 1-33
Sponsoring Org:
National Science Foundation
More Like this
  1. Li, Changsheng (Ed.)
    An autonomous household robot passed a self-awareness test in 2015, proving that the cognitive capabilities of robots are heading towards those of humans. While this is a milestone in AI, it raises questions about legal implications. If robots are progressively developing cognition, it is important to discuss whether they are entitled to justice pursuant to conventional notions of human rights. This paper offers a comprehensive discussion of this complex question through cross-disciplinary scholarly sources from computer science, ethics, and law. The computer science perspective dissects hardware and software of robots to unveil whether human behavior can be efficiently replicated. The ethics perspective utilizes insights from robot ethics scholars to help decide whether robots can act morally enough to be endowed with human rights. The legal perspective provides an in-depth discussion of human rights with an emphasis on eligibility. The article concludes with recommendations including open research issues. 
  2. The study of complex human interactions and group activities has become a focal point in human-centric computer vision. However, progress in related tasks is often hindered by the challenges of obtaining large-scale labeled datasets from real-world scenarios. To address this limitation, we introduce M3Act, a synthetic data generator for multi-view multi-group multi-person human atomic actions and group activities. Powered by Unity Engine, M3Act features multiple semantic groups, highly diverse and photorealistic images, and a comprehensive set of annotations, which facilitates the learning of human-centered tasks across single-person, multi-person, and multi-group conditions. We demonstrate the advantages of M3Act across three core experiments. The results suggest our synthetic dataset can significantly improve the performance of several downstream methods and replace real-world datasets to reduce cost. Notably, M3Act improves the state-of-the-art MOTRv2 on the DanceTrack dataset, leading to a jump from 10th to 2nd place on the leaderboard. Moreover, M3Act enables new research on controllable 3D group activity generation. We define multiple metrics and propose a competitive baseline for the novel task. Our code and data are available at our project page: http://cjerry1243.github.io/M3Act.
  3. This research investigated human performance in response to task demands that may be used to convey information about the human to an artificial agent. We performed an experiment with a dynamic time-sharing task to investigate participants' development of temporal awareness of task events unfolding in time. Temporal awareness, as an extension or a special case of situation awareness, may provide useful measures of covert mental models applicable to numerous tasks and for input to human-aware AI agents. Temporal awareness measures may be used to classify human performance into the control modes of the contextual control model (COCOM): scrambled, opportunistic, tactical, and strategic. Twenty-one participants completed a within-subjects experiment with an abstract task of resetting four independent timers within their respective windows of opportunity. The results show that temporal measures of task performance are sensitive to changes in task disruptions and difficulty and therefore hold promise for human-aware AI.
  4. We present a virtual reality (VR) framework for the analysis of whole human body surface area. Usual methods for determining the whole body surface area (WBSA) are based on well-known formulae characterized by large errors when the subject is obese or belongs to certain subgroups. For these situations, we believe that a computer vision approach can overcome these problems and provide a better estimate of this important body indicator. Unfortunately, using machine learning techniques to design a computer vision system able to provide a new body indicator that goes beyond the use of only body weight and height entails a long and expensive data acquisition process. A more viable solution is to use a dataset composed of virtual subjects. Generating a virtual dataset allowed us to build a population with different characteristics (obese, underweight, age, gender). However, synthetic data might differ from a real scenario, typical of the physician's clinic. For this reason we develop a new virtual environment to facilitate the analysis of human subjects in 3D. This framework can simulate the acquisition process of a real camera, making it easy to analyze and to create training data for machine learning algorithms. With this virtual environment, we can easily simulate the real setup of a clinic, where a subject is standing in front of a camera, or may assume a different pose with respect to the camera. We use this newly designed environment to analyze the whole body surface area (WBSA). In particular, we show that we can obtain accurate WBSA estimations with just one view, virtually enabling the possibility of using inexpensive depth sensors (e.g., the Kinect) for large-scale quantification of the WBSA from a single-view 3D map.
  5. In this paper, we introduce a creative pipeline to incorporate physiological and behavioral data from contemporary marine mammal research into data-driven animations, leveraging functionality from industry tools and custom scripts to promote scientific insights, public awareness, and conservation outcomes. Our framework can flexibly transform data describing animals’ orientation, position, heart rate, and swimming stroke rate to control the position, rotation, and behavior of 3D models, to render animations, and to drive data sonification. Additionally, we explore the challenges of unifying disparate datasets gathered by an interdisciplinary team of researchers, and outline our design process for creating meaningful data visualization tools and animations. As part of our pipeline, we clean and process raw acceleration and electrophysiological signals to expedite complex multi-stream data analysis and the identification of critical foraging and escape behaviors. We provide details about four animation projects illustrating marine mammal datasets. These animations, commissioned by scientists to achieve outreach and conservation outcomes, have successfully increased the reach and engagement of the scientific projects they describe. These impactful visualizations help scientists identify behavioral responses to disturbance, increase public awareness of human-caused disturbance, and help build momentum for targeted conservation efforts backed by scientific evidence. 