We explore the bi-directional relationship between human and machine learning in citizen science. Theoretically, the study draws on the zone of proximal development (ZPD) concept, which allows us to describe AI augmentation of human learning, human augmentation of machine learning, and how tasks can be designed to facilitate co-learning. The study takes a design-science approach to explore the design, deployment, and evaluations of the Gravity Spy citizen science project. The findings highlight the challenges and opportunities of co-learning, where both humans and machines contribute to each other’s learning and capabilities. The study takes its point of departure in the literature on co-learning and develops a framework for designing projects where humans and machines mutually enhance each other’s learning. The research contributes to the existing literature by developing a dynamic approach to human-AI augmentation, by emphasizing that the ZPD supports ongoing learning for volunteers and keeps machine learning aligned with evolving data. The approach offers potential benefits for project scalability, participant engagement, and automation considerations while acknowledging the importance of tutorials, community access, and expert involvement in supporting learning.
Citizen science frontiers: Efficiency, engagement, and serendipitous discovery with human–machine systems
Citizen science has proved to be a unique and effective tool in helping science and society cope with the ever-growing data rates and volumes that characterize the modern research landscape. It also serves a critical role in engaging the public with research in a direct, authentic fashion and by doing so promotes a better understanding of the processes of science. To take full advantage of the onslaught of data being experienced across the disciplines, it is essential that citizen science platforms leverage the complementary strengths of humans and machines. This Perspectives piece explores the issues encountered in designing human–machine systems optimized for both efficiency and volunteer engagement, while striving to safeguard and encourage opportunities for serendipitous discovery. We discuss case studies from Zooniverse, a large online citizen science platform, and show that combining human and machine classifications can efficiently produce results superior to those of either one alone and how smart task allocation can lead to further efficiencies in the system. While these examples make clear the promise of human–machine integration within an online citizen science system, we then explore in detail how system design choices can inadvertently lower volunteer engagement, create exclusionary practices, and reduce opportunity for serendipitous discovery. Throughout we investigate the tensions that arise when designing a human–machine system serving the dual goals of carrying out research in the most efficient manner possible while empowering a broad community to authentically engage in this research.
- PAR ID: 10084954
- Publisher / Repository: Proceedings of the National Academy of Sciences
- Date Published:
- Journal Name: Proceedings of the National Academy of Sciences
- Volume: 116
- Issue: 6
- ISSN: 0027-8424
- Page Range / eLocation ID: p. 1902-1909
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Citizen science and artificial intelligence (AI) complement each other by harnessing the strengths of both human and machine capabilities. Citizen science generates terabytes of raw numerical, text, and image data, the analysis of which requires automated techniques to process in an efficient manner. Conversely, AI computer vision technology can require tens of thousands of images during the training process, and citizen science projects are well suited to provide large libraries of data. Herein, we describe how AI tools are being applied across the GLOBE Observer citizen science data ecosystem, where image recognition algorithms are supporting data ingest processes, protecting user privacy and improving data fidelity. GLOBE citizen science data has been used to develop automated data classification routines that enable information discovery of mosquito larvae and land cover labels. These advances position GLOBE citizen scientist data for discovery and use in environmental and health research, as well as by machine learning scientists working in the general field of GeoAI.
The bulk of research on citizen science participants is project centric, based on an assumption that volunteers experience a single project. Contrary to this assumption, survey responses (n = 3894) and digital trace data (n = 3649) from volunteers, who collectively engaged in 1126 unique projects, revealed that multiproject participation was the norm. Only 23% of volunteers were singletons (who participated in only one project). The remaining multiproject participants were split evenly between discipline specialists (39%) and discipline spanners (38% joined projects with different disciplinary topics) and unevenly between mode specialists (52%) and mode spanners (25% participated in online and offline projects). Public engagement was narrow: The multiproject participants were eight times more likely to be White and five times more likely to hold advanced degrees than the general population. We propose a volunteer-centric framework that explores how the dynamic accumulation of experiences in a project ecosystem can support broad learning objectives and inclusive citizen science.
Fortson, Lucy; Crowston, Kevin; Kloetzer, Laure; Ponti, Marisa (Ed.) Citizen science has become a valuable and reliable method for interpreting and processing big datasets, and is vital in the era of ever-growing data volumes. However, there are inherent difficulties in generating labels from citizen scientists, because variability between the members of the crowd leads to variability in the results. Sometimes this variability is useful, such as with serendipitous discoveries, which correspond to rare or unknown classes in the data, but it might also be due to ambiguity between classes. The primary issue is then to distinguish between the intrinsic variability in the dataset and the uncertainty in the citizen scientists’ responses, and to leverage that distinction to extract scientifically useful relationships. In this paper, we explore using a neural network to interpret volunteer confusion across the dataset, to increase the purity of the downstream analysis. We focus on the use of learned features from the network to disentangle feature similarity across the classes, and on the ability of the machine’s “attention” to identify features that lead to confusion. We use data from Jovian Vortex Hunter, a citizen science project to study vortices in Jupiter’s atmosphere, and find that the latent space from the model helps effectively identify different sources of image-level features that lead to low volunteer consensus. Furthermore, the machine’s attention highlights features corresponding to specific classes. This provides meaningful image-level feature-class relationships, which are useful in our analysis for identifying vortex-specific features to better understand vortex evolution mechanisms. Finally, we discuss the applicability of this method to other citizen science projects.
Fortson, Lucy; Crowston, Kevin; Kloetzer, Laure; Ponti, Marisa (Ed.) In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images, processed by an ML-based anomaly detection model, to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests low-consensus choices should not be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score, which indicates an image’s low-level feature anomalousness, was a better predictor of the volunteers’ consensus choice. Combining a locus of high volunteer-consensus images within the ML-learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay out important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.
