Title: Through the Citizen Scientists’ Eyes: Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data
In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images, processed by an ML-based anomaly detection model, to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests low-consensus choices shouldn’t be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score that indicates images’ low-level feature anomalousness was a better predictor of the volunteers’ consensus choice. Combining a locus of high volunteer-consensus images within the ML-learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.
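As a rough illustration of the decision boundary described in the abstract, the sketch below keeps an image when its anomaly score is high and its feature vector lies near the centroid of the high-consensus images. This is not the authors' method: the record layout, the function names, the centroid-plus-radius rule, and all three thresholds are assumptions made for this example only.

```python
from math import dist


def consensus(votes_for, votes_total):
    """Fraction of volunteers who flagged an image as interesting."""
    return votes_for / votes_total if votes_total else 0.0


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def select_candidates(images, consensus_cut=0.6, score_cut=0.5, radius=1.0):
    """Return ids of images inside a hypothetical decision boundary.

    Each image is a dict with the assumed keys "id", "features",
    "anomaly_score", "votes_for" and "votes_total". All cut-offs are
    placeholder values, not numbers taken from the study.
    """
    # Locus of high volunteer consensus in the feature space.
    high = [img["features"] for img in images
            if consensus(img["votes_for"], img["votes_total"]) >= consensus_cut]
    if not high:
        return []
    centre = centroid(high)
    # Keep images that score high AND sit close to that locus.
    return [img["id"] for img in images
            if img["anomaly_score"] >= score_cut
            and dist(img["features"], centre) <= radius]
```

Under these assumptions, an image that no volunteer has inspected can still be selected if it lies close to the high-consensus locus and scores high, while a high-scoring image far from the locus (an image artifact, for instance) is excluded.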
Award ID(s):
2006894
PAR ID:
10569660
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ;
Editor(s):
Fortson, Lucy; Crowston, Kevin; Kloetzer, Laure; Ponti, Marisa
Publisher / Repository:
Ubiquity Press
Date Published:
Journal Name:
Citizen Science: Theory and Practice
Volume:
9
Issue:
1
ISSN:
2057-4991
Page Range / eLocation ID:
40
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Citizen science projects that rely on human computation can attempt to solicit volunteers or use paid microwork platforms such as Amazon Mechanical Turk. To better understand these approaches, this paper analyzes crowdsourced image label data sourced from an environmental justice project looking at wetland loss off the coast of Louisiana. This retrospective analysis identifies key differences between the two populations: while Mechanical Turk workers are accessible, cost-efficient, and rate more images than volunteers (on average), their labels are of lower quality, whereas volunteers can achieve high accuracy with comparably few votes. Volunteer organizations can also interface with the educational or outreach goals of an organization in ways that the limited context of microwork prevents. 
  2. We present the Citizen Science program Active Asteroids and describe discoveries stemming from our ongoing project. Our NASA Partner program is hosted on the Zooniverse online platform and launched on 2021 August 31, with the goal of engaging the community in the search for active asteroids: asteroids with comet-like tails or comae. We also set out to identify other unusual active solar system objects, such as active Centaurs, active quasi-Hilda asteroids (QHAs), and Jupiter-family comets (JFCs). Active objects are rare in large part because they are difficult to identify, so we ask volunteers to assist us in searching for active bodies in our collection of millions of images of known minor planets. We produced these cutout images with our project pipeline that makes use of publicly available Dark Energy Camera data. Since the project launch, roughly 8300 volunteers have scrutinized some 430,000 images to great effect, which we describe in this work. In total, we have identified previously unknown activity on 15 asteroids, plus one Centaur, that were thought to be asteroidal (i.e., inactive). Of the asteroids, we classify four as active QHAs, seven as JFCs, and four as active asteroids, consisting of one main-belt comet (MBC) and three MBC candidates. We also include our findings concerning known active objects that our program facilitated, an unanticipated avenue of scientific discovery. These include discovering activity occurring during an orbital epoch for which objects were not known to be active, and the reclassification of objects based on our dynamical analyses.
  3. The Gravity Spy project aims to uncover the origins of glitches, transient bursts of noise that hamper analysis of gravitational-wave data. By using both the work of citizen-science volunteers and machine learning algorithms, the Gravity Spy project enables reliable classification of glitches. Citizen science and machine learning are intrinsically coupled within the Gravity Spy framework, with machine learning classifications providing a rapid first-pass classification of the dataset and enabling tiered volunteer training, and volunteer-based classifications verifying the machine classifications, bolstering the machine learning training set and identifying new morphological classes of glitches. These classifications are now routinely used in studies characterizing the performance of the LIGO gravitational-wave detectors. Providing the volunteers with a training framework that teaches them to classify a wide range of glitches, as well as additional tools to aid their investigations of interesting glitches, empowers them to make discoveries of new classes of glitches. This demonstrates that, when given suitable support, volunteers can go beyond simple classification tasks to identify new features in data at a level comparable to domain experts. The Gravity Spy project is now providing volunteers with more complicated data that includes auxiliary monitors of the detector to identify the root cause of glitches.
  4. We present Galaxy Zoo DECaLS: detailed visual morphological classifications for Dark Energy Camera Legacy Survey images of galaxies within the SDSS DR8 footprint. Deeper DECaLS images (r = 23.6 versus r = 22.2 from SDSS) reveal spiral arms, weak bars, and tidal features not previously visible in SDSS imaging. To best exploit the greater depth of DECaLS images, volunteers select from a new set of answers designed to improve our sensitivity to mergers and bars. Galaxy Zoo volunteers provide 7.5 million individual classifications over 314 000 galaxies. 140 000 galaxies receive at least 30 classifications, sufficient to accurately measure detailed morphology like bars, and the remainder receive approximately 5. All classifications are used to train an ensemble of Bayesian convolutional neural networks (a state-of-the-art deep learning method) to predict posteriors for the detailed morphology of all 314 000 galaxies. We use active learning to focus our volunteer effort on the galaxies which, if labelled, would be most informative for training our ensemble. When measured against confident volunteer classifications, the trained networks are approximately 99 per cent accurate on every question. Morphology is a fundamental feature of every galaxy; our human and machine classifications are an accurate and detailed resource for understanding how galaxies evolve.
  5. Citizen science has helped astronomers comb through large data sets to identify patterns and objects that are not easily found through automated processes. The Milky Way Project (MWP), a citizen science initiative on the Zooniverse platform, presents internet users with infrared (IR) images from Spitzer Space Telescope Galactic plane surveys. MWP volunteers make classification drawings on the images to identify targeted classes of astronomical objects. We present the MWP second data release (DR2) and an updated data reduction pipeline written in Python. We aggregate ∼3 million classifications made by MWP volunteers during the years 2012–2017 to produce the DR2 catalogue, which contains 2600 IR bubbles and 599 candidate bow shock driving stars. The reliability of bubble identifications, as assessed by comparison to visual identifications by trained experts and scoring by a machine-learning algorithm, is found to be a significant improvement over DR1. We assess the reliability of IR bow shocks via comparison to expert identifications and the colours of candidate bow shock driving stars in the 2MASS point-source catalogue. We hence identify highly reliable subsets of 1394 DR2 bubbles and 453 bow shock driving stars. Uncertainties on object coordinates and bubble size/shape parameters are included in the DR2 catalogue. Compared with DR1, the DR2 bubbles catalogue provides more accurate shapes and sizes. The DR2 catalogue identifies 311 new bow shock driving star candidates, including three associated with the giant H ii regions NGC 3603 and RCW 49.