

Title: Gravity Spy Machine Learning Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b

This data set contains all classifications made by the Gravity Spy machine learning model for LIGO glitches from the first three observing runs (O1, O2 and O3, where O3 is split into O3a and O3b). Gravity Spy classified all noise events identified by the Omicron trigger pipeline for which Omicron reported a signal-to-noise ratio above 7.5 and a peak frequency between 10 Hz and 2048 Hz. To classify noise events, Gravity Spy made Omega scans of every glitch at 4 different durations, which helps capture the morphology of both short- and long-duration noise events.
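The selection cuts described above can be written as a simple predicate. The sketch below is illustrative only: the function name is hypothetical, the triggers are made up, and treating the frequency bounds as inclusive is an assumption, since the description only says "between".

```python
# Thresholds quoted in the dataset description.
SNR_THRESHOLD = 7.5
MIN_PEAK_FREQUENCY_HZ = 10.0
MAX_PEAK_FREQUENCY_HZ = 2048.0


def passes_selection(snr, peak_frequency):
    """Return True if a trigger meets the stated selection cuts.

    Assumption: the frequency bounds are inclusive.
    """
    return (
        snr > SNR_THRESHOLD
        and MIN_PEAK_FREQUENCY_HZ <= peak_frequency <= MAX_PEAK_FREQUENCY_HZ
    )


# Made-up triggers, for illustration only: (snr, peak_frequency in Hz)
triggers = [(8.2, 120.0), (7.0, 300.0), (15.0, 4000.0)]
selected = [t for t in triggers if passes_selection(*t)]
print(selected)
```

Only the first made-up trigger survives: the second is too quiet and the third peaks above the frequency band.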

There are 22 classes used for O1 and O2 data (including No_Glitch and None_of_the_Above), while there are two additional classes used to classify O3 data.

For O1 and O2, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle

For O3, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Blip_Low_Frequency, Chirp, Extremely_Loud, Fast_Scattering, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle
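As a quick cross-check of the two lists above, the following sketch compares them as Python sets to find the classes introduced for O3. The variable names are hypothetical; the class names are copied from the lists.

```python
# Class names used for O1 and O2, copied from the list above.
O1_O2_CLASSES = {
    "1080Lines", "1400Ripples", "Air_Compressor", "Blip", "Chirp",
    "Extremely_Loud", "Helix", "Koi_Fish", "Light_Modulation",
    "Low_Frequency_Burst", "Low_Frequency_Lines", "No_Glitch",
    "None_of_the_Above", "Paired_Doves", "Power_Line", "Repeating_Blips",
    "Scattered_Light", "Scratchy", "Tomte", "Violin_Mode",
    "Wandering_Line", "Whistle",
}

# Class names used for O3, copied from the list above.
O3_CLASSES = {
    "1080Lines", "1400Ripples", "Air_Compressor", "Blip",
    "Blip_Low_Frequency", "Chirp", "Extremely_Loud", "Fast_Scattering",
    "Helix", "Koi_Fish", "Light_Modulation", "Low_Frequency_Burst",
    "Low_Frequency_Lines", "No_Glitch", "None_of_the_Above", "Paired_Doves",
    "Power_Line", "Repeating_Blips", "Scattered_Light", "Scratchy", "Tomte",
    "Violin_Mode", "Wandering_Line", "Whistle",
}

# Classes present for O3 but not for O1 and O2.
added_for_o3 = sorted(O3_CLASSES - O1_O2_CLASSES)
print(len(O1_O2_CLASSES), len(O3_CLASSES), added_for_o3)
```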

If you would like to download the Omega scans associated with each glitch, you can use the gravitational-wave data-analysis tool GWpy. To use this tool, please install Anaconda if you have not already done so and create a virtual environment using the following command:

```
conda create --name gravityspy-py38 -c conda-forge python=3.8 gwpy pandas psycopg2 sqlalchemy
```

After downloading one of the CSV files for a specific era and interferometer, please run the following Python script if you would like to download the data associated with the metadata in the CSV file. We recommend not trying to download too many images at one time. For example, the script below will read data on Hanford glitches from O2 that were classified by Gravity Spy and filter for only glitches that were labelled as Blips with 90% confidence or higher, and then download the first 4 rows of the filtered table.

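```

from gwpy.table import GravitySpyTable

# Read the Gravity Spy metadata for Hanford (H1) glitches from O2
H1_O2 = GravitySpyTable.read('H1_O2.csv')

# Keep only glitches labelled as Blip with confidence above 0.9
blips = H1_O2[(H1_O2["ml_label"] == "Blip") & (H1_O2["ml_confidence"] > 0.9)]

# Download the Omega scans for the first 4 rows of the filtered table
blips[0:4].download(nproc=1)

```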

The columns in the CSV files are taken from several different sources: 

[‘event_time’, ‘ifo’, ‘peak_time’, ‘peak_time_ns’, ‘start_time’, ‘start_time_ns’, ‘duration’, ‘peak_frequency’, ‘central_freq’, ‘bandwidth’, ‘channel’, ‘amplitude’, ‘snr’, ‘q_value’] contain metadata about the signal from the Omicron pipeline. 

[‘gravityspy_id’] is the unique identifier for each glitch in the dataset. 

[‘1400Ripples’, ‘1080Lines’, ‘Air_Compressor’, ‘Blip’, ‘Chirp’, ‘Extremely_Loud’, ‘Helix’, ‘Koi_Fish’, ‘Light_Modulation’, ‘Low_Frequency_Burst’, ‘Low_Frequency_Lines’, ‘No_Glitch’, ‘None_of_the_Above’, ‘Paired_Doves’, ‘Power_Line’, ‘Repeating_Blips’, ‘Scattered_Light’, ‘Scratchy’, ‘Tomte’, ‘Violin_Mode’, ‘Wandering_Line’, ‘Whistle’] contain the machine learning confidence for a glitch being in a particular Gravity Spy class (the confidence in all these columns should sum to unity). 

[‘ml_label’, ‘ml_confidence’] provide the machine-learning predicted label for each glitch, and the machine learning confidence in its classification. 

[‘url1’, ‘url2’, ‘url3’, ‘url4’] are the links to the publicly-available Omega scans for each glitch. ‘url1’ shows the glitch for a duration of 0.5 seconds, ‘url2’ for 1 second, ‘url3’ for 2 seconds, and ‘url4’ for 4 seconds.
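The column descriptions above suggest a simple consistency check: the class confidences of a row should sum to unity, and ‘ml_label’ and ‘ml_confidence’ should presumably correspond to the highest-scoring class. The sketch below checks this using only the Python standard library. The rows, the identifiers, and the three-class subset are made up for illustration (a real check would use all class columns of the file), the function name is hypothetical, and the assumption that ‘ml_label’ is the highest-scoring class should be verified against the actual data.

```python
import csv
import io

# A made-up three-class subset, for illustration only.
CLASS_COLUMNS = ["Blip", "Koi_Fish", "Whistle"]

# Made-up rows in the same layout as the CSV files (most columns omitted).
SAMPLE_CSV = """gravityspy_id,Blip,Koi_Fish,Whistle,ml_label,ml_confidence
example001,0.95,0.04,0.01,Blip,0.95
example002,0.10,0.20,0.70,Whistle,0.70
"""


def check_row(row, class_columns, tol=1e-6):
    """Check that the class scores sum to 1 and match ml_label/ml_confidence."""
    scores = {name: float(row[name]) for name in class_columns}
    best_label = max(scores, key=scores.get)
    return (
        abs(sum(scores.values()) - 1.0) <= tol
        and row["ml_label"] == best_label
        and abs(float(row["ml_confidence"]) - scores[best_label]) <= tol
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
results = [check_row(row, CLASS_COLUMNS) for row in rows]
print(results)
```

The same check could be written with pandas or with the GWpy table shown earlier; the standard library is used here only to keep the sketch free of dependencies.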

```

For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. 

 
Award ID(s):
2106865 1547880 2106882
NSF-PAR ID:
10347723
Publisher / Repository:
Zenodo
Edition / Version:
v1.0.0
Subject(s) / Keyword(s):
["Gravitational Waves","LIGO","Gravity Spy","Citizen Science"]
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper. 

    When a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are: 

    1. A glitch must be classified by at least 2 registered volunteers
    2. Based on both the initial machine learning classification and volunteer classifications, the glitch has more than a 90% probability of residing in a particular class
    3. Each volunteer classification (weighted by that volunteer's confusion matrix) contains a weight equal to the initial machine learning score when determining the final probability

    The choice of these and other parameterizations will affect the accuracy of the retired dataset as well as the number of glitches that are retired, and will be explored in detail in an upcoming publication (Zevin et al. in prep). 

    The dataset can be read in using e.g. Pandas: 
    ```
    import pandas as pd
    dataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')
    ```
    Each row in the dataframe contains information about a particular glitch in the Gravity Spy dataset. 

    Description of series in dataframe

    • ['1080Lines', '1400Ripples', 'Air_Compressor', 'Blip', 'Chirp', 'Extremely_Loud', 'Helix', 'Koi_Fish', 'Light_Modulation', 'Low_Frequency_Burst', 'Low_Frequency_Lines', 'No_Glitch', 'None_of_the_Above', 'Paired_Doves', 'Power_Line', 'Repeating_Blips', 'Scattered_Light', 'Scratchy', 'Tomte', 'Violin_Mode', 'Wandering_Line', 'Whistle']
      • Machine learning scores for each glitch class in the trained model, which for a particular glitch will sum to unity
    • ['ml_confidence', 'ml_label']
      • Highest machine learning confidence score across all classes for a particular glitch, and the class associated with this score
    • ['gravityspy_id', 'id']
      • Unique identifier for each glitch on the Zooniverse platform ('gravityspy_id') and in the Gravity Spy project ('id'), which can be used to link a particular glitch to the full Gravity Spy dataset (which contains GPS times among many other descriptors)
    • ['retired']
      • Marks whether the glitch is retired using our default set of retirement parameters (1=retired, 0=not retired)
    • ['Nclassifications']
      • The total number of classifications performed by registered volunteers on this glitch
    • ['final_score', 'final_label']
      • The final score (weighted combination of machine learning and volunteer classifications) and the most probable type of glitch
    • ['tracks']
      • Array of classification weights that were added to each glitch category due to each volunteer's classification

     

    ```
    For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo.

    For the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.

    For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. 
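
The first two default retirement parameters listed above (at least 2 volunteer classifications and a probability above 90%) can be sketched as a small predicate over the 'Nclassifications' and 'final_score' series. This is a simplified illustration, not the project's retirement code: the function name and the rows are made up, it assumes 'final_score' already holds the combined probability, and it does not reproduce the confusion-matrix weighting of volunteer classifications.

```python
def meets_default_retirement(n_classifications, final_score,
                             min_classifications=2, threshold=0.9):
    """Simplified check of the first two default retirement parameters.

    Assumption: final_score is the combined machine learning and
    volunteer probability for the most probable class.
    """
    return n_classifications >= min_classifications and final_score > threshold


# Made-up rows, for illustration only.
glitches = [
    {"gravityspy_id": "example_a", "Nclassifications": 3, "final_score": 0.97},
    {"gravityspy_id": "example_b", "Nclassifications": 1, "final_score": 0.99},
    {"gravityspy_id": "example_c", "Nclassifications": 5, "final_score": 0.62},
]

retired_ids = [
    g["gravityspy_id"]
    for g in glitches
    if meets_default_retirement(g["Nclassifications"], g["final_score"])
]
print(retired_ids)
```

In practice the 'retired' series in the dataframe already records the outcome, so a check like this is mainly useful for exploring how other thresholds would change the retired set.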

     
  2. Abstract

    The NeonTreeCrowns dataset is a set of individual-level crown estimates for 100 million trees at 37 geographic sites across the United States surveyed by the National Ecological Observatory Network’s Airborne Observation Platform. Each rectangular bounding box crown prediction includes height, crown area, and spatial location. 

    How can I see the data?

    A web server to look through predictions is available through idtrees.org

    Dataset Organization

    The shapefiles.zip archive contains 11,000 shapefiles, each corresponding to a 1 km^2 RGB tile from NEON (ID: DP3.30010.001). For example, "2019_SOAP_4_302000_4100000_image.shp" contains the predictions from "2019_SOAP_4_302000_4100000_image.tif", available from the NEON data portal: https://data.neonscience.org/data-products/explore?search=camera. NEON's file convention refers to the year of data collection (2019), the four-letter site code (SOAP), the sampling event (4), and the UTM coordinate of the top left corner (302000_4100000). For NEON site abbreviations and UTM zones see https://www.neonscience.org/field-sites/field-sites-map. 
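
The file naming convention described above can be unpacked with a regular expression. The sketch below is an illustration only: the function and field names are hypothetical, and the pattern is inferred from the single example file name given in the text, so other file names may need a looser pattern.

```python
import re

# Pattern inferred from the example "2019_SOAP_4_302000_4100000_image.shp".
TILE_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<site>[A-Z]{4})_(?P<event>\d+)_"
    r"(?P<utm_x>\d+)_(?P<utm_y>\d+)_image\.(?P<ext>shp|tif)$"
)


def parse_tile_name(name):
    """Split a tile file name into year, site, event, and UTM coordinate."""
    match = TILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised tile name: {name!r}")
    fields = match.groupdict()
    return {
        "year": int(fields["year"]),
        "site": fields["site"],
        "event": int(fields["event"]),
        "utm_x": int(fields["utm_x"]),
        "utm_y": int(fields["utm_y"]),
        "extension": fields["ext"],
    }


info = parse_tile_name("2019_SOAP_4_302000_4100000_image.shp")
print(info)
```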

    The predictions are also available as a single CSV file for each site and year, in which all available tiles are combined into one large file. These data are not projected, but contain the UTM coordinates for each bounding box (left, bottom, right, top). For both file types the following fields are available:

    Height: The crown height measured in meters. Crown height is defined as the 99th percentile of all canopy height pixels from a LiDAR height model (ID: DP3.30015.001)

    Area: The crown area in m^2 of the rectangular bounding box.

    Label: All data in this release are "Tree".

    Score: The confidence score from the DeepForest deep learning algorithm. The score ranges from 0 (low confidence) to 1 (high confidence)

    How were predictions made?

    The DeepForest algorithm is available as a python package: https://deepforest.readthedocs.io/. Predictions were overlaid on the LiDAR-derived canopy height model. Predictions with heights less than 3m were removed.
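
The height filter and the bounding-box area described above can be sketched as follows. This is not the DeepForest API; the function name and the rows are made up, and the field names simply follow the descriptions in the text (left, bottom, right, top in UTM meters, height in meters).

```python
MIN_HEIGHT_M = 3.0  # predictions with heights below this were removed


def box_area(left, bottom, right, top):
    """Area in m^2 of a rectangular bounding box given in UTM meters."""
    return (right - left) * (top - bottom)


# Made-up predictions, for illustration only.
predictions = [
    {"left": 0.0, "bottom": 0.0, "right": 4.0, "top": 5.0, "height": 12.5},
    {"left": 10.0, "bottom": 2.0, "right": 12.0, "top": 3.5, "height": 2.1},
]

# Keep only crowns that are at least 3 m tall, then compute their areas.
kept = [p for p in predictions if p["height"] >= MIN_HEIGHT_M]
areas = [box_area(p["left"], p["bottom"], p["right"], p["top"]) for p in kept]
print(len(kept), areas)
```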

    How were predictions validated?

    Please see

    Weinstein, B. G., Marconi, S., Bohlman, S. A., Zare, A., & White, E. P. (2020). Cross-site learning in deep learning RGB tree crown detection. Ecological Informatics, 56, 101061.

    Weinstein, B., Marconi, S., Aubry-Kientz, M., Vincent, G., Senyondo, H., & White, E. (2020). DeepForest: A Python package for RGB deep learning tree crown delineation. bioRxiv.

    Weinstein, Ben G., et al. "Individual tree-crown detection in RGB imagery using semi-supervised deep learning neural networks." Remote Sensing 11.11 (2019): 1309.

    Were any sites removed?

    Several sites were removed due to poor NEON data quality. GRSM and PUUM both had lower quality RGB data that made them unsuitable for prediction. NEON surveys are updated annually and we expect future flights to correct these errors. We removed the GUIL Puerto Rico site due to its very steep topography and poor sun angle during data collection. The DeepForest algorithm performed poorly when predicting crowns in intensely shaded areas where there was very little sun penetration. We are happy to make these data available upon request.

    Contact

    We welcome questions, ideas and general inquiries. The data can be used for many applications and we look forward to hearing from you. Contact ben.weinstein@weecology.org. 

    Gordon and Betty Moore Foundation: GBMF4563 
  3. Transient noise, called "glitches," can mimic and obscure real gravitational waves in the strain data channel. One machine learning software package used to classify these glitches and identify their sources, GravitySpy, is successful when the spectrogram of the glitch has a very distinct and unique shape. However, one of the most common types of glitches, called a "blip," has an indistinct shape due to so few cycles being in-band, and tends to ring off template signals of binary black hole mergers, making it especially necessary to eliminate blips for future observing runs. Here we examine blip glitches in a Q-transform spectrogram with different parameters than those used by GravitySpy to determine if there are sub-classifications of blips that might have identifiable sources, and then use Convolutional Neural Networks to sub-classify these blips. The implementation of Convolutional Neural Networks has provided compelling evidence of distinguishable differences between these hypothesized sub-classes. 
  4. Abstract

    The Gravity Spy project aims to uncover the origins of glitches, transient bursts of noise that hamper analysis of gravitational-wave data. By using both the work of citizen-science volunteers and machine learning algorithms, the Gravity Spy project enables reliable classification of glitches. Citizen science and machine learning are intrinsically coupled within the Gravity Spy framework, with machine learning classifications providing a rapid first-pass classification of the dataset and enabling tiered volunteer training, and volunteer-based classifications verifying the machine classifications, bolstering the machine learning training set and identifying new morphological classes of glitches. These classifications are now routinely used in studies characterizing the performance of the LIGO gravitational-wave detectors. Providing the volunteers with a training framework that teaches them to classify a wide range of glitches, as well as additional tools to aid their investigations of interesting glitches, empowers them to make discoveries of new classes of glitches. This demonstrates that, when given suitable support, volunteers can go beyond simple classification tasks to identify new features in data at a level comparable to domain experts. The Gravity Spy project is now providing volunteers with more complicated data that includes auxiliary monitors of the detector to identify the root cause of glitches.

     
  5. A biodiversity dataset graph: GBIF, iDigBio, BioCASe

    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.

    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2020-05-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org".

    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .  

    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar ls --remote https://zenodo.org/record/3852671/files > /dev/null

    Optionally, you can retrieve all associated data (>500GB) files using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3852671/files,https://archive.org/download/biodiversity-dataset-archives/data.zip/data/,https://deeplinker.bio

    Please note https://archive.org/download/biodiversity-dataset-archives/data.zip/data/ and https://deeplinker.bio are Preston remotes that provided access to GBIF, iDigBio, BioCASe data files at time of writing (25 May 2020). These remotes can be replaced with any other Preston remote(s) if needed. This may take a while depending on network speed and hardware constraints. See also https://archive.org/details/biodiversity-dataset-archives .

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history

    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> <http://purl.org/pav/previousVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> <http://purl.org/pav/previousVersion> <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> .
    <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> <http://purl.org/pav/previousVersion> <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> .
    <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> <http://purl.org/pav/previousVersion> <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> .
    <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> <http://purl.org/pav/previousVersion> <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> .
    <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> <http://purl.org/pav/previousVersion> <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> .
    <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> <http://purl.org/pav/previousVersion> <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> .
    <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> <http://purl.org/pav/previousVersion> <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> .
    <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> <http://purl.org/pav/previousVersion> <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> .
    <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> <http://purl.org/pav/previousVersion> <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> .
    <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> <http://purl.org/pav/previousVersion> <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> .
    <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> <http://purl.org/pav/previousVersion> <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> .
    <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> <http://purl.org/pav/previousVersion> <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> .
    <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> <http://purl.org/pav/previousVersion> <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> .
    <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> <http://purl.org/pav/previousVersion> <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> .
    <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> <http://purl.org/pav/previousVersion> <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> .
    <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> <http://purl.org/pav/previousVersion> <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> .
    <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> <http://purl.org/pav/previousVersion> <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> .
    <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> <http://purl.org/pav/previousVersion> <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> .
    <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> <http://purl.org/pav/previousVersion> <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> .
    <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> <http://purl.org/pav/previousVersion> <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> .
    <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> <http://purl.org/pav/previousVersion> <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> .
    <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> <http://purl.org/pav/previousVersion> <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> .
    <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> <http://purl.org/pav/previousVersion> <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> .
    <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> <http://purl.org/pav/previousVersion> <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> .
    <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> <http://purl.org/pav/previousVersion> <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> .
    <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> <http://purl.org/pav/previousVersion> <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> .
    <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> <http://purl.org/pav/previousVersion> <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> .
    <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> <http://purl.org/pav/previousVersion> <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> .
    <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> <http://purl.org/pav/previousVersion> <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> .
    <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> <http://purl.org/pav/previousVersion> <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> .
    <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> <http://purl.org/pav/previousVersion> <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> .
    <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> <http://purl.org/pav/previousVersion> <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> .
    <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> <http://purl.org/pav/previousVersion> <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> .
    <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> <http://purl.org/pav/previousVersion> <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> .
    <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> <http://purl.org/pav/previousVersion> <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> .
    <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> <http://purl.org/pav/previousVersion> <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> .
    <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> <http://purl.org/pav/previousVersion> <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> .
    <hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> <http://purl.org/pav/previousVersion> <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> .
    <hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> <http://purl.org/pav/previousVersion> <hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> .
    <hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> <http://purl.org/pav/previousVersion> <hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> .
    <hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> <http://purl.org/pav/previousVersion> <hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> .
    <hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> <http://purl.org/pav/previousVersion> <hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> .
    <hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> <http://purl.org/pav/previousVersion> <hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> .
    <hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> <http://purl.org/pav/previousVersion> <hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> .
    <hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> <http://purl.org/pav/previousVersion> <hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> .
    <hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> <http://purl.org/pav/previousVersion> <hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> .
    <hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> <http://purl.org/pav/previousVersion> <hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> .
    <hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b> <http://purl.org/pav/previousVersion> <hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> .
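
The history listed above forms a linked chain: the first statement names the initial version, and each following statement points from a newer version back to the one before it. The sketch below checks that structure for a saved copy of the log. It is an illustration, not part of Preston: the function names are hypothetical and the sample uses shortened placeholder identifiers rather than real hashes. It only inspects the text of the log and does not replace running the preston commands.

```python
import re

TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.$")
HAS_VERSION = "http://purl.org/pav/hasVersion"
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def parse_triples(text):
    """Parse '<subject> <predicate> <object> .' lines into tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a triple: {line!r}")
        triples.append(match.groups())
    return triples


def is_linear_history(triples):
    """Check that each version points back to the one listed before it."""
    if not triples or triples[0][1] != HAS_VERSION:
        return False
    current = triples[0][2]
    for subject, predicate, obj in triples[1:]:
        if predicate != PREVIOUS_VERSION or obj != current:
            return False
        current = subject
    return True


# Placeholder identifiers, for illustration only (not real hashes).
SAMPLE = """
<example-archive> <http://purl.org/pav/hasVersion> <hash://sha256/v1> .
<hash://sha256/v2> <http://purl.org/pav/previousVersion> <hash://sha256/v1> .
<hash://sha256/v3> <http://purl.org/pav/previousVersion> <hash://sha256/v2> .
"""

triples = parse_triples(SAMPLE)
print(is_linear_history(triples))
```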


    If you retrieved data files, you can check the integrity of the extracted archive by confirming that each line produced by the command "preston verify" has the form shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    file:/home/preston/preston-archive/data/3e/ff/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    OK    CONTENT_PRESENT_VALID_HASH    89931
    hash://sha256/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    file:/home/preston/preston-archive/data/18/48/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    file:/home/preston/preston-archive/data/18/46/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    file:/home/preston/preston-archive/data/55/4f/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    OK    CONTENT_PRESENT_VALID_HASH    202701
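
For a large archive, the check described above is easier to automate than to read by eye. The sketch below scans saved verify output and confirms that every line reports "CONTENT_PRESENT_VALID_HASH". It assumes five whitespace-separated columns, as in the listing above (content identifier, file location, state, status, size in bytes); the function names, the sample lines, and the failing status string are made up for illustration.

```python
EXPECTED_STATUS = "CONTENT_PRESENT_VALID_HASH"


def parse_verify_line(line):
    """Split one verify output line into its five assumed columns."""
    content_id, location, state, status, size = line.split()
    return {
        "content_id": content_id,
        "location": location,
        "state": state,
        "status": status,
        "size": int(size),
    }


def all_content_valid(lines):
    """Return True if there is at least one line and all report a valid hash."""
    records = [parse_verify_line(line) for line in lines if line.strip()]
    return bool(records) and all(
        r["state"] == "OK" and r["status"] == EXPECTED_STATUS for r in records
    )


# Made-up identifier and paths, for illustration only.
fake_hash = "3eff" + "0" * 60
good_line = (
    f"hash://sha256/{fake_hash}\t"
    f"file:/tmp/data/3e/ff/{fake_hash}\tOK\t{EXPECTED_STATUS}\t89931"
)
# The failing status name below is a hypothetical placeholder.
bad_line = good_line.replace("OK", "FAIL").replace(EXPECTED_STATUS, "SOME_OTHER_STATUS")

print(all_content_valid([good_line]), all_content_valid([good_line, bad_line]))
```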

    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston[2] v0.1.15. --
    preston.jar

    -- individual provenance index files --

    049b0eb995b484c1e64184f582f51b3c608dcade70c4aefc2d53f903bae45098
    073315c32d7fd19868449bef1b11b15a86981dee53a31f7f5c882f7e3be413c3
    1172c6927e58113db668409d36b6a2cd84cf1a93e85b50d65d0bd008a5d8aaa4
    1707cb11cd9f696f1a86fd06742c1e14fad856747be88791f79f6fc7c979d5a6
    272ff1f12a573c667634d934d06b8bab0dd9cc6558795287ea99fab87620d005
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    2bbbe11bb1932c6c8fbbc2ed16dde182f53c4cecbe0dd4f779c32f527a61bc62
    37b8b636e939072d0df7246bf077ead4279f9dd33929be322e631104b0641308
    3901b6af522d535fb164823704686e72f73b7798a2a64eaeb817134552c69e2c
    395ed0c95a624f8853116442690965acf69151acd6b33cc4fc710f567828f784
    460c14ed0129c1469c9149ed1030cdc133f110fb32048748323982cb88dd7eda
    477b6c4e9ecf5c8cd1b5502e0245c8622fa4b358f6710f97db39b473ed3d8235
    52b7274f5d795e4987964bb1a327dd6d6e4f65870e6a7aac172481d0ba3013d4
    54786bde04751bc31bf38c9e89c010cfee7de91760e1f5f31218ff11acff8a70
    6135b237a49b37b857801836494f2c36bcb1526bdacf001a9d11727fff6bf1f1
    674937568c0572bc2873f502dca2fe691ba230869f0aba73f5938422654c05cc
    69b4d5ca9643c14501a48a2b1eb24971a6da68da5033c304f7f00b94e16a11d9
    6df3363a236d4f026154ef86b34d9672b111333d0c2be179c43db146864f6ed3
    70066ea7c6a9dd6c2193cdc90b3b1ff7664af235ab245f6c03d1dd497b376570
    7084702f8025c99a6608a3355ccad5ff5e644ad544121f5d524961f7fe29ceb6
    7e9934a1fc580c3f591c295306ab364c2e7a589e91590ab6334514e4b5c28062
    7ebb008412baaac3afcc8af68b796bf4ca98f367cfd61a815eee82cdffeab196
    886edb8d22973bb04fe3b42d12106029a00b9deab3fb77d8787123327b77ae3b
    8a2426eb4b38af30c6ee764463b8684e0dec400e4472a2a53e6eedf246dab178
    8a6d7e2ab026ff56380235fd9696f5e538e5e426b9374f2ddf3a705e186a7788
    8d44c9e36a505e5c3f125e1702ef7473280bf5bcfa624fe5d3998694b67e0887
    94290680edef0f8ac81d5d4d5b8b680ba5ce821df17c4de62464429552c3360e
    95f88f27ed3448534206406738dfb5c5030fe3d6883c6dda261649357600883f
    9d12cae409e8ea0a546f7945cc629d622400000c3338e4710d9c6084fca9274d
    9fa9ea50db419c75251026708183add8973d9e68a79062f7808b110bef21006e
    a24abbe089556f51fe9c2a51febdcaf893b419556312bcc63515713fc4a52922
    a3b0477fe46f09b0f51c0f651691665c149bc341f5c19996675d849252e86453
    a486474333f05884580dd10c54c95999063c7d1bc22e2cbe3bead604aca0a183
    a524b9af3f172793998e1f9c5c0e9c949cc935624a17ed3364d32bc0391c9382
    aa0e508aeb96f240b551fe92ff4224325ddcdf66f97eef95ac78aec62e53a169
    ab34300942ec02cca7adf2744f6fbc1ab7587060bea09ef92b65b66f89d1ddcd
    b05d4a17d9a02180669d7eb017102dd1a739fb4615759cba94baf944b2aee29c
    b37c79f95c22fc4d657cc89dedd7a870923285da690ad4f5121962492484a142
    bc699639e5515a5fc9da9d442357cc8a9ff310a177e54f1646e002723de49f1d
    be6d8cd5f1405a5e3e8aa492fb8dab41f6521608834d746e6cbc58d2f550f918
    c06f4413a97a5540fbdd40bdbfb194435c154533df7fe388dfdd378084e19c3d
    c585b8addfb7f7991ad74c0bae158aecefc6be5b11c28b020135e0f13040e187
    c66587e9730a6f68e961240038892df656ea99a1a25f4ff8ce556c07b09a4878
    ca289dce66c8b9955c223fe3e906b8f26c12cf53506cebe651b004961f7964af
    cea1aab236de5de8da8954797d846c225bf2ad4f8fe3cd413e60ab029f9e1b3e
    da05cc27a47e755ebe912fafae434df5bd31a5d92658fe1943acc0a2023fab32
    ee473aeda889fd12ac2c76aae06314e5f279cce5f1a736d39bfc097657a82060
    fcb2ee4d630a9a1440417b0c46da5bc1578a388d6aedd12189a23283b60dde7d
    fef548489bd7bea43ae1c2b7755d38a87f4a8b038a466bf7e7b4ac64d665fd62
    ff32a7cbc99eaf6b67695fd94284a9b1b47a76497ef4d10ffc4dae199cc0d7c3

    -- individual provenance logs --

    05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd
    09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2
    0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781
    102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9
    1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e
    20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17
    24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43
    261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12
    39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c
    3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1
    3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7
    3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee
    480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6
    58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70
    5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7

     