
Title: Gravity Spy Machine Learning Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b
This data set contains all classifications that the Gravity Spy Machine Learning model made for LIGO glitches from the first three observing runs (O1, O2 and O3, where O3 is…
Publication Year:
Gravitational Waves LIGO Gravity Spy Citizen Science
Award ID(s):
2106865 1547880 2106882
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract
    This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper.

    When a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently, it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are:

      1. A glitch must be classified by at least 2 registered volunteers.
      2. Based on both the initial machine learning classification and volunteer classifications, the glitch has more than a 90% probability of residing in a particular class.
      3. Each volunteer classification (weighted by that volunteer's confusion matrix) contains a weight equal to the initial machine learning score when determining the final probability.

    The choice of these and other…
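The retirement rules listed in the first abstract can be illustrated with a minimal sketch. It assumes a simple Bayesian combination: the machine learning scores act as a prior over classes, and each volunteer vote multiplies in one column of that volunteer's confusion matrix. This combination rule, the function names `posterior` and `is_retired`, and the data layout are all illustrative assumptions, not the project's actual implementation; only the thresholds (at least 2 volunteers, more than 90% probability) come from the abstract.

```python
def posterior(ml_scores, votes):
    """Combine ML scores with volunteer votes (assumed Bayesian update).

    ml_scores: list of class probabilities from the ML model, used as the prior.
    votes: list of (confusion, label) pairs, one per volunteer, where
           confusion[t][l] is the assumed probability that this volunteer
           picks class l when the true class is t.
    """
    weights = list(ml_scores)
    for confusion, label in votes:
        # Each vote contributes exactly once, on the same footing as the prior.
        weights = [w * confusion[t][label] for t, w in enumerate(weights)]
    total = sum(weights)
    if total == 0:
        raise ValueError("votes are incompatible with every class")
    return [w / total for w in weights]


def is_retired(ml_scores, votes, min_volunteers=2, threshold=0.9):
    """Apply the two default thresholds quoted in the abstract."""
    if len(votes) < min_volunteers:
        return False
    return max(posterior(ml_scores, votes)) > threshold


if __name__ == "__main__":
    # Hypothetical two-class example with one shared confusion matrix.
    ml = [0.6, 0.4]
    conf = [[0.9, 0.1],
            [0.2, 0.8]]
    print(is_retired(ml, [(conf, 0)]))             # one volunteer only
    print(is_retired(ml, [(conf, 0), (conf, 0)]))  # two agreeing volunteers
    print(is_retired(ml, [(conf, 0), (conf, 1)]))  # two disagreeing volunteers
```

In this sketch a single vote can never retire a glitch, however confident the combined probability is, because the volunteer-count check runs first.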
  2. Abstract

    Understanding the noise in gravitational-wave detectors is central to detecting and interpreting gravitational-wave signals. Glitches are transient, non-Gaussian noise features that can have a range of environmental and instrumental origins. The Gravity Spy project uses a machine-learning algorithm to classify glitches based upon their time–frequency morphology. The resulting set of classified glitches can be used as input to detector-characterisation investigations of how to mitigate glitches, or data-analysis studies of how to ameliorate the impact of glitches. Here we present the results of the Gravity Spy analysis of data up to the end of the third observing run of Advanced LIGO. We classify 233981 glitches from LIGO Hanford and 379805 glitches from LIGO Livingston into morphological classes. We find that the distribution of glitches differs between the two LIGO sites. This highlights the potential need for studies of data quality to be individually tailored to each gravitational-wave observatory.

  3. Gravity Spy is a citizen science project that draws on the contributions of both humans and machines to achieve its scientific goals. The system supports the Laser Interferometer Gravitational-Wave Observatory (LIGO) by classifying “glitches” that interfere with observations. The system makes three advances on the current state of the art: explicit training for new volunteers, synergy between machine and human classification, and support for discovery of new classes of glitch. It also provides a platform for human-centred computing research on motivation, learning and collaboration. The system has been launched and is currently in operation.
  4. Abstract
    The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE, [1]).

    DataONE is a distributed infrastructure that provides information about earth observation data. This dataset was derived from the DataONE network using Preston [2] between 17 October 2018 and 6 November 2018, resolving 335,213 URLs at an average retrieval rate of about 5 seconds per URL, or 720 files per hour, resulting in a gzip-compressed data tar archive of 837.3 MB.

    The archive associates 325,757 unique metadata URLs [3] to 202,063 unique ecological metadata files [4]. Also, the DataONE search index was captured to establish provenance of how the dataset descriptors were found and acquired. During the creation of the snapshot (or crawl), 15,389 URLs [5], or 4.7% of URLs, did not successfully resolve.

    To facilitate discovery, the record of the Preston snapshot crawl is included in the preston-ls-* files. These files are derived from the rdf/nquad file with hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f . This file can also be found in the data.tar.gz at data/8c/67/e0/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f/data . For more information about concepts and format, please see [2].

    To extract all EML files from the included Preston archive, first extract the hashes…
  5. Abstract
    A biodiversity dataset graph: GBIF, iDigBio, BioCASe

    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.

    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2020-05-02 using "preston update -u,,".

    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit or .

    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston [2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar ls --remote > /dev/null

    Optionally, you can retrieve all associated data (>500GB) files using:

    $ java -jar preston.jar clone --remote,,

    Please note and are Preston remotes that provided access to GBIF, iDigBio, BioCASe data files…