Search for: All records

Award ID contains: 1839201


A biodiversity dataset graph: DataONE

https://doi.org/10.5281/zenodo.1486279

Poelen, Jorrit H. (January 2018, Zenodo)

Abstract: The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE, [1]).

DataONE is a distributed infrastructure that provides information about earth observation data. This dataset was derived from the DataONE network using Preston [2] between 17 October 2018 and 6 November 2018, resolving 335,213 URLs at an average retrieval rate of about 5 seconds per URL, or 720 files per hour, resulting in a gzip-compressed tar data archive of 837.3 MB.

The archive associates 325,757 unique metadata URLs [3] with 202,063 unique ecological metadata files [4]. Also, the DataONE search index was captured to establish provenance of how the dataset descriptors were found and acquired. During the creation of the snapshot (or crawl), 15,389 URLs [5], or 4.7% of URLs, did not successfully resolve.

To facilitate discovery, the record of the Preston snapshot crawl is included in the preston-ls-* files. These files are derived from the rdf/nquad file with hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f . This file can also be found in the data.tar.gz at data/8c/67/e0/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f/data . For more information about concepts and format, please see [2].

To extract all EML files from the included Preston archive, first extract the hashes associated with EML files using:

cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep "hash://" | sort | uniq > eml-hashes.txt

Extract data.tar.gz using:

~/preston-archive$ tar xzf data.tar.gz

Then use Preston to extract each hash using something like:

~/preston-archive$ preston get hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml_1.1" packageId="doi:10.18739/A24P9Q" system="https://arcticdata.io" scope="system" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.1.1 ~/development/eml/eml.xsd">
 <dataset>
 <alternateIdentifier>urn:x-wmo:md:org.aoncadis.www::d76bc3b5-7b19-11e4-8526-00c0f03d5b7c</alternateIdentifier>
 <alternateIdentifier>d76bc3b5-7b19-11e4-8526-00c0f03d5b7c</alternateIdentifier>
 <title>Airglow Image Data 2011 4 of 5</title>
...

Alternatively, without using Preston, you can extract the data using the naming convention:

data/[x]/[y]/[z]/[hash]/data

where x is the first 2 characters of the hash, y the second 2 characters, z the third 2 characters, and hash the full sha256 content hash of the EML file.

For example, the hash hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa can be found in the file: data/00/00/2d/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa/data . For more information, see [2].

The intended use of this archive is to facilitate meta-analysis of the DataONE dataset network.

[1] DataONE, https://www.dataone.org
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 . DataONE was crawled via Preston with "preston update -u https://dataone.org".
[3] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep -v "hash://" | sort | uniq | wc -l
[4] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep -v "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep "hash://" | sort | uniq | wc -l
[5] cat preston-ls.tsv.gz | gunzip | grep "Version" | grep "deeplinker" | grep -v "query/solr" | cut -f1,3 | tr '\t' '\n' | grep -v "hash://" | sort | uniq | wc -l

This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
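The naming convention described in the abstract above (data/[x]/[y]/[z]/[hash]/data) can be sketched in a few lines of Python. This is only an illustration of the stated convention, not part of Preston; the function name `content_path` and the `root` parameter are hypothetical.

```python
from pathlib import PurePosixPath

PREFIX = "hash://sha256/"
HEX_DIGITS = set("0123456789abcdef")


def content_path(content_id: str, root: str = "data") -> PurePosixPath:
    """Map a hash://sha256/... identifier to data/[x]/[y]/[z]/[hash]/data.

    x, y and z are the first, second and third pair of characters of the
    hex digest, as described in the abstract. Hypothetical helper.
    """
    if not content_id.startswith(PREFIX):
        raise ValueError(f"not a sha256 content identifier: {content_id!r}")
    digest = content_id[len(PREFIX):]
    if len(digest) != 64 or not set(digest) <= HEX_DIGITS:
        raise ValueError(f"not a 64-character lowercase hex digest: {digest!r}")
    return PurePosixPath(root, digest[0:2], digest[2:4], digest[4:6], digest, "data")


# The example identifier quoted in the abstract.
example = "hash://sha256/00002d0fc9e35a9194da7dd3d8ce25eddee40740533f5af2397d6708542b9baa"
print(content_path(example))
```

Applied to each line of eml-hashes.txt, a helper like this would give the location of every EML file inside the extracted data.tar.gz.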
Tracking Biodiversity Data (Re)use, https://doi.org/10.5281/zenodo.4707908

Poelen, J. H. (April 2021, iDigBio Communications Luncheon, 12 April 2021)
Abstract: As the web of biodiversity knowledge continues to grow and become more complex, practical questions arise: How do we publish and review works that use big and complex datasets? How do we keep track of data use across biodiversity data networks? How do we keep our digital data available for the next 50 years? In this iDigBio lunch seminar, Jorrit Poelen works towards answering these questions through use cases taken from Global Biotic Interactions (GloBI, https://globalbioticinteractions.org), Terrestrial Parasite Tracker TCN (TPT, https://parasitetracker.org) and Preston (https://preston.guoda.bio), a biodiversity data tracker.
Full Text Available
UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC) Data Archive and Biodiversity Dataset Graph

https://doi.org/10.5281/zenodo.5557670

Cheadle Center for Biodiversity and Ecological Restoration, University of (January 2021, Zenodo)

Abstract: A biodiversity dataset graph: UCSB-IZC

The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at the Cheadle Center for Biodiversity and Ecological Restoration, University of California Santa Barbara.

This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] on 2021-10-08 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].

This archive contains 14137 images related to 33730 occurrence/specimen records. See the included sample-image.jpg and its associated metadata, sample-image.json [4].

The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included, and these have been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism.

To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:

$ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files

After that, verify the index of the archive by reproducing the following provenance log history:

$ java -jar preston.jar history
<urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .

To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" resembles the lines shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

$ java -jar preston.jar verify
hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c OK CONTENT_PRESENT_VALID_HASH 66438 hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef OK CONTENT_PRESENT_VALID_HASH 5746 hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b OK CONTENT_PRESENT_VALID_HASH 6147 hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b

Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

Files in this data publication:

--- start of file descriptions ---

-- description of archive and its contents (this file) --
README

-- executable java jar containing preston [2,3] v0.3.1. --
preston.jar

-- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz

-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a

-- example image and meta-data --
sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)

--- end of file descriptions ---

References

[1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08 as indexed by the Global Biodiversity Information Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
[3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
[4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c

This work is funded in part by grants NSF OAC 1839201 and NSF DBI 2102006 from the National Science Foundation.
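The integrity check described in the abstract above (every line of "preston verify" output should include "CONTENT_PRESENT_VALID_HASH") could be automated along these lines. This is a sketch under assumptions: the output is taken to have six whitespace-separated fields, as in the sample lines, the last field is read as the recomputed hash, and the field names in `VerifyRecord` are invented for illustration.

```python
from typing import NamedTuple


class VerifyRecord(NamedTuple):
    # Field names are hypothetical labels for the six columns seen in the sample output.
    content_id: str
    location: str
    status: str
    state: str
    size: int
    computed_id: str


def parse_verify_line(line: str) -> VerifyRecord:
    """Split one line of verify output into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    content_id, location, status, state, size, computed_id = fields
    return VerifyRecord(content_id, location, status, state, int(size), computed_id)


def is_valid(record: VerifyRecord) -> bool:
    """True when the line reports valid content and both hash columns agree."""
    return (
        record.status == "OK"
        and record.state == "CONTENT_PRESENT_VALID_HASH"
        and record.content_id == record.computed_id
    )


# One of the sample lines from the abstract, rebuilt from its digest.
digest = "f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844"
sample = (
    f"hash://sha256/{digest} "
    f"file:/home/jhpoelen/ucsb-izc/data/f6/8d/{digest} "
    f"OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/{digest}"
)
record = parse_verify_line(sample)
print(record.size, is_valid(record))
```

File locations containing spaces would break the simple split used here; a real check would need to know the actual field separator.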
UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC) Data Archive and Biodiversity Dataset Graph

https://doi.org/10.5281/zenodo.5660088

Cheadle Center for Biodiversity and Ecological Restoration, University of (January 2021, Zenodo)

Abstract: A biodiversity dataset graph: UCSB-IZC

The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at the Cheadle Center for Biodiversity and Ecological Restoration, University of California Santa Barbara.

This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].

This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated metadata, sample-image.json [4].

The images were counted using:

$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
 | grep -o -P ".*depict"\
 | sort\
 | uniq\
 | wc -l

And the occurrences were counted using:

$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
 | grep -o -P "occurrence/([0-9])+"\
 | sort\
 | uniq\
 | wc -l

The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included, and these have been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism.

To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:

$ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5557670/files/5660088

After that, verify the index of the archive by reproducing the following provenance log history:

$ java -jar preston.jar history
<urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .
<hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c> <http://purl.org/pav/previousVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .

To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" resembles the lines shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

$ java -jar preston.jar verify
hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c OK CONTENT_PRESENT_VALID_HASH 66438 hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 OK CONTENT_PRESENT_VALID_HASH 4093 hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef OK CONTENT_PRESENT_VALID_HASH 5746 hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b OK CONTENT_PRESENT_VALID_HASH 6147 hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b

Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

Files in this data publication:

--- start of file descriptions ---

-- description of archive and its contents (this file) --
README

-- executable java jar containing preston [2,3] v0.3.1. --
preston.jar

-- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz

-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a

-- example image and meta-data --
sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)

--- end of file descriptions ---

References

[1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Information Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .
[3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132
[4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c

This work is funded in part by grants NSF OAC 1839201 and NSF DBI 2102006 from the National Science Foundation.
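The two counting pipelines in the abstract above (grep -o, sort, uniq, wc -l) amount to counting distinct regular-expression matches. A rough Python equivalent is sketched below; the two patterns come from the abstract (with the capture group dropped, which does not change what is matched), while the sample text is invented for illustration and is not real provenance data.

```python
import re

# Patterns taken from the grep commands in the abstract.
OCCURRENCE = re.compile(r"occurrence/[0-9]+")
DEPICTION = re.compile(r".*depict")


def count_unique_matches(pattern: re.Pattern, text: str) -> int:
    """Count distinct matches, like `grep -o PATTERN | sort | uniq | wc -l`.

    `.` does not match a newline by default, so matches never span lines,
    mirroring grep's line-by-line behaviour.
    """
    return len({match.group(0) for match in pattern.finditer(text)})


# Invented sample input (hypothetical, for illustration only).
sample = (
    "<img1> depicts https://example.org/occurrence/101\n"
    "<img1> depicts https://example.org/occurrence/101\n"
    "<img2> depicts https://example.org/occurrence/202\n"
)
print(count_unique_matches(OCCURRENCE, sample))
print(count_unique_matches(DEPICTION, sample))
```

Collecting matches in a set replaces the sort and uniq steps, so the input does not need to be sorted first.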
Toward reliable biodiversity dataset references

https://doi.org/10.1016/j.ecoinf.2020.101132

Elliott, Michael J.; Poelen, Jorrit H.; Fortes, José A.B. (September 2020, Ecological Informatics)

Full Text Available
Reliable Biodiversity Dataset References, https://doi.org/10.17605/OSF.IO/FTZ9B

Elliott, M. J. (February 2020, iDigBio Communications Luncheon)
No systematic approach has yet been adopted to reliably reference and provide access to digital biodiversity datasets. Based on accumulated evidence, we argue that location-based identifiers such as URLs are not sufficient to ensure long-term data access. We introduce a method that uses dedicated data observatories to evaluate long-term URL reliability. From March 2019 through May 2020, we took periodic inventories of the data provided to major biodiversity aggregators, including GBIF, iDigBio, DataONE, and BHL by accessing the URL-based dataset references from which the aggregators retrieve data. Over the period of observation, we found that, for the URL-based dataset references available in each of the aggregators' data provider registries, 5% to 70% of URLs were intermittently or consistently unresponsive, 0% to 66% produced unstable content, and 20% to 75% became either unresponsive or unstable. We propose the use of cryptographic hashing to generate content-based identifiers that can reliably reference datasets. We show that content-based identifiers facilitate decentralized archival and reliable distribution of biodiversity datasets to enable long-term accessibility of the referenced datasets.
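As a rough illustration of the content-based identifiers proposed in the abstract above, the sketch below derives an identifier from a dataset's bytes using SHA-256. The `hash://sha256/` form follows the identifiers shown elsewhere in this listing; the function name `content_id` and the example URLs are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a content-based identifier derived only from the bytes themselves."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


# Two hypothetical locations serving identical bytes, and one serving changed bytes.
copies = {
    "https://example.org/dataset.csv": b"id,name\n1,Apis mellifera\n",
    "https://mirror.example.net/dataset.csv": b"id,name\n1,Apis mellifera\n",
    "https://example.org/dataset-v2.csv": b"id,name\n1,Apis mellifera\n2,Bombus\n",
}
for url, data in copies.items():
    print(url, content_id(data))
```

The first two locations yield the same identifier and the third a different one: the identifier follows the content rather than the URL, which is the property the abstract relies on for reliable referencing.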
Full Text Available
A biodiversity dataset graph: Biodiversity Heritage Library (BHL)

https://doi.org/10.5281/zenodo.3849560

Poelen, Jorrit (January 2020, Zenodo)

{"Abstract":["A biodiversity dataset graph: BHL<\/p>\n\nThe intended use of this archive is to facilitate (meta-)analysis of the Biodiversity Heritage Library (BHL). The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.<\/p>\n\nThis dataset provides versioned snapshots of the BHL network as tracked by Preston [2] between 2019-05-19 and 2020-05-09 using "preston update -u https://biodiversitylibrary.org".<\/p>\n\nThe archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the BHL content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 . <\/p>\n\nTo retrieve and verify the downloaded BHL biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. 
Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:<\/p>\n\n$$ java -jar preston.jar clone --remote https://zenodo.org/record/3849560/files<\/p>\n\nAfter that, verify the index of the archive by reproducing the following provenance log history:<\/p>\n\n$$ java -jar preston.jar history\n<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .\n<hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> <http://purl.org/pav/previousVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .\n<hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> <http://purl.org/pav/previousVersion> <hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> .\n<hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> <http://purl.org/pav/previousVersion> <hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> .\n<hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> <http://purl.org/pav/previousVersion> <hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> .\n<hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> <http://purl.org/pav/previousVersion> <hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> .\n<hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> <http://purl.org/pav/previousVersion> <hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> .\n<hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> <http://purl.org/pav/previousVersion> <hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> .\n<hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> 
<http://purl.org/pav/previousVersion> <hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> .\n<hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> <http://purl.org/pav/previousVersion> <hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> .\n<hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> <http://purl.org/pav/previousVersion> <hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> .\n<hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1> <http://purl.org/pav/previousVersion> <hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> .<\/p>\n\nTo check the integrity of the extracted archive, confirm that each line produce by the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.<\/p>\n\n$ java -jar preston.jar verify\nhash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca file:/home/preston/preston-bhl/data/e0/c1/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca OK CONTENT_PRESENT_VALID_HASH 49458087 hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca\nhash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99 file:/home/preston/preston-bhl/data/1a/57/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99 OK CONTENT_PRESENT_VALID_HASH 25745 hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99\nhash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c file:/home/preston/preston-bhl/data/85/ef/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c OK CONTENT_PRESENT_VALID_HASH 519892 
hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c\nhash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743 file:/home/preston/preston-bhl/data/25/1e/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743 OK CONTENT_PRESENT_VALID_HASH 787414 hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743<\/p>\n\nNote that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".<\/p>\n\nFiles in this data publication:<\/p>\n\n--- start of file descriptions ---<\/p>\n\n-- description of archive and its contents (this file) --\nREADME<\/p>\n\n-- executable java jar containing preston[2] v0.1.15. --\npreston.jar<\/p>\n\n-- preston archives containing BHL data files, associated provenance logs and a provenance index --\npreston-[00-ff].tar.gz<\/p>\n\n-- individual provenance index files --\n2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a\n2b1104cb7749e818c9afca78391b2d0099bbb0a32f2b348860a335cd2f8f6800\n4081bc59dff58d63f6a86c623cb770f01e9a355a42495b205bcb538cd526190f\n47a2816f8b5600b24487093adcddfea12434cc4f270f3ab09d9215fbdd546cd2\n6f99a1388823fca745c9e22ac21e2da909a219aa1ace55170fa9248c0276903c\n7ae46d7cd9b5a0f5889ba38bac53c82e591b0bdf8b605f5e48c0dce8fb7b717f\n82903464889fea7c53f53daedf4e41fa31092f82619edeb3415eb2b473f74af3\n9e8c86243df39dd4fe82a3f814710eccf73aa9291d050415408e346fa2b09e70\na8308fbf4530e287927c471d881ce0fc852f16543d46e1ee26f1caba48815f3a\nbcec6df2ea7f74e9a6e2830d0072e6b2fbe65323d9ddb022dd6e1349c23996e2\ncfe47c25ec0210ac73c06b407beb20d9c58355cb15bae427fdc7541870ca2e4e\nf73fc9e70bce8f21f0c96b8ef0903749d8f223f71343ab5a8910968f99c9b8b6<\/p>\n\n--- end of file descriptions ---<\/p>\n\n\nReferences<\/p>\n\n[1] Biodiversity Heritage Library (BHL, https://biodiversitylibrary.org) accessed from 2019-05-19 to 2020-05-09 with provenance 
hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1.\n[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .<\/p>\n\n\nThis work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.<\/p>"]}
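The provenance log history shown in the abstract above links versions through hasVersion and previousVersion statements. The sketch below reconstructs the version order from such lines. It assumes each line has the form `<subject> <predicate> <object> .` as in the listing, uses shortened placeholder identifiers rather than real hashes, and the function name `version_chain` is hypothetical.

```python
import re

TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.$")
HAS_VERSION = "http://purl.org/pav/hasVersion"
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def version_chain(lines):
    """Return version identifiers ordered from oldest to newest."""
    first = None
    successor = {}  # older version -> newer version
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # ignore lines that are not simple triples
        subject, predicate, obj = match.groups()
        if predicate == HAS_VERSION:
            first = obj
        elif predicate == PREVIOUS_VERSION:
            successor[obj] = subject
    if first is None:
        raise ValueError("no hasVersion statement found")
    chain, seen = [first], {first}
    while chain[-1] in successor:
        newer = successor[chain[-1]]
        if newer in seen:
            raise ValueError("cycle in version history")
        chain.append(newer)
        seen.add(newer)
    return chain


# Placeholder identifiers (shortened, not real hashes), deliberately out of order.
history = [
    "<hash://sha256/ccc> <http://purl.org/pav/previousVersion> <hash://sha256/bbb> .",
    "<urn:example:archive> <http://purl.org/pav/hasVersion> <hash://sha256/aaa> .",
    "<hash://sha256/bbb> <http://purl.org/pav/previousVersion> <hash://sha256/aaa> .",
]
print(version_chain(history))
```

The last element of the returned list would correspond to the most recent provenance log, such as the hash cited in reference [1] of the abstract.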
A biodiversity dataset graph: DataONE

https://doi.org/10.5281/zenodo.3849494

Poelen, Jorrit H. (January 2020, Zenodo)

{"Abstract":["A biodiversity dataset graph: DataONE<\/p>\n\nThe intended use of this archive is to facilitate (meta-)analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data.<\/p>\n\nThis dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-11-06 and 2020-05-07 using "preston update -u https://dataone.org".<\/p>\n\nThe archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 . <\/p>\n\nTo retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. 
Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:<\/p>\n\n$$ java -jar preston.jar clone --remote https://zenodo.org/record/3849494/files<\/p>\n\nAfter that, verify the index of the archive by reproducing the following provenance log history:<\/p>\n\n$$ java -jar preston.jar history\n<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .\n<hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> <http://purl.org/pav/previousVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .\n<hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> <http://purl.org/pav/previousVersion> <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> .\n<hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> <http://purl.org/pav/previousVersion> <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> .\n<hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> <http://purl.org/pav/previousVersion> <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> .\n<hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> <http://purl.org/pav/previousVersion> <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> .\n<hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> <http://purl.org/pav/previousVersion> <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> .\n<hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> <http://purl.org/pav/previousVersion> <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> .\n<hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> 
<http://purl.org/pav/previousVersion> <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> .\n<hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> <http://purl.org/pav/previousVersion> <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> .\n<hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> <http://purl.org/pav/previousVersion> <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> .\n<hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> <http://purl.org/pav/previousVersion> <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> .\n<hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> <http://purl.org/pav/previousVersion> <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> .\n<hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> <http://purl.org/pav/previousVersion> <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> .\n<hash://sha256/79ec3ee370a0d38311bc352af07a36380cd3aa04dc98154cf723bbc73d12ee77> <http://purl.org/pav/previousVersion> <hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> .\n<hash://sha256/e54b360a4ca84a4503e4c10a8a8cca062c130be7429c8fe6ea1e0e82fe113e12> <http://purl.org/pav/previousVersion> <hash://sha256/79ec3ee370a0d38311bc352af07a36380cd3aa04dc98154cf723bbc73d12ee77> .\n<hash://sha256/2910f784f84e112f124a56ce54bd06b76e510f90276629d2d144ce29e326d80f> <http://purl.org/pav/previousVersion> <hash://sha256/e54b360a4ca84a4503e4c10a8a8cca062c130be7429c8fe6ea1e0e82fe113e12> .\n<hash://sha256/bcb0bdff0689cfb06f586d057703e41d1c6ba409867232217081dd8cb5053c87> <http://purl.org/pav/previousVersion> <hash://sha256/2910f784f84e112f124a56ce54bd06b76e510f90276629d2d144ce29e326d80f> 
.\n<hash://sha256/a12f8c7fbf4fbfa71536c7e1b2614a35454dac6a7fe9e1cc0b4df41ab2269bef> <http://purl.org/pav/previousVersion> <hash://sha256/bcb0bdff0689cfb06f586d057703e41d1c6ba409867232217081dd8cb5053c87> .\n<hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3> <http://purl.org/pav/previousVersion> <hash://sha256/a12f8c7fbf4fbfa71536c7e1b2614a35454dac6a7fe9e1cc0b4df41ab2269bef> .<\/p>\n\nTo check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" matches the format shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.<\/p>\n\n$ java -jar preston.jar verify\nhash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945 file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945 OK CONTENT_PRESENT_VALID_HASH 21580 hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945\nhash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f OK CONTENT_PRESENT_VALID_HASH 2035 hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f\nhash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53 file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53 OK CONTENT_PRESENT_VALID_HASH 1935 hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53\nhash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687 file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687 OK CONTENT_PRESENT_VALID_HASH 1553 hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687<\/p>\n\n\nNote that a copy of the java program "preston", preston.jar, 
is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".<\/p>\n\nFiles in this data publication:<\/p>\n\n--- start of file descriptions ---<\/p>\n\n-- description of archive and its contents (this file) --\nREADME<\/p>\n\n-- executable java jar containing preston[2] v0.1.15. --\npreston.jar<\/p>\n\n-- preston archives containing DataONE data files, associated provenance logs and a provenance index --\npreston-[00-ff].tar.gz<\/p>\n\n-- individual provenance index files --\n2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a\n2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf\n2f65ae542401d4c2daf1bca70de640211da6749188f67d28ea71acd7d8ba070b\n35eb1e17e2bf3e71212cde35bdb03e8a6545a57483ea3c1633929257b70cf637\n3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7\n66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba\n7466a35e42dea7e2be068060ec0c926f9a8686388ed504ef5c6c990c1ba4e8d0\n81161d9746c2a5823641c436e773fb4508516b055da85f4494b38c545349da39\n8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66\na90eed8d70c54c8e554f2dfde4fceb434eda162d9615d62de96ded2344f88a78\nc33ef5e29100b323412f1f3bc66908c8e01e4f0d1db4ea3685d2fffc47981dd6\nc84dffef20fec958255e759db6445fc469d73695674a33ae6f7e567a088c9fe0\nd362d599d72000c4feb464db5a669b12e15fc3ca1a49b1e7d4d6f7d6d5d15411\nd9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d\nda26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db\ne4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55\neb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94\nf13ab4bca04f894ae8eabb51fa01b4dfbc69f717eabc9896c728e2ba39c4db27\nf493baf276892a199a0b0d078359f64a38fe8ad3f807921f8d41ef73f7343b1f\nff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3<\/p>\n\n--- end of file descriptions ---<\/p>\n\n\nReferences<\/p>\n\n[1] Data Observation Network for Earth (DataONE, 
https://dataone.org) accessed from 2018-11-06 to 2020-05-07 with provenance hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3.\n[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .<\/p>\n\n\nThis work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.<\/p>"]}
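The "CONTENT_PRESENT_VALID_HASH" check described in the README above can be approximated without Preston. The following is a minimal Python sketch, assuming the two-level `data/e5/5c/<hash>` layout that appears in the sample verify output. The helper names `local_path` and `is_valid` are hypothetical and are not part of Preston.

```python
import hashlib
import posixpath

PREFIX = "hash://sha256/"


def local_path(hash_uri: str, root: str = "data") -> str:
    """Map a hash URI to a path, assuming the layout seen in the verify output."""
    if not hash_uri.startswith(PREFIX):
        raise ValueError(f"not a sha256 hash URI: {hash_uri}")
    digest = hash_uri[len(PREFIX):]
    # First two and next two hex characters form the directory levels.
    return posixpath.join(root, digest[0:2], digest[2:4], digest)


def is_valid(path: str, hash_uri: str) -> bool:
    """Recompute the SHA-256 of the file at `path` and compare it to `hash_uri`."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return PREFIX + sha.hexdigest() == hash_uri
```

A file for which `is_valid` returns `False` has content that no longer matches its identifier, which is the condition the verify step is meant to surface.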
A biodiversity dataset graph: GBIF, iDigBio, BioCASe

https://doi.org/10.5281/zenodo.3852671

Poelen, Jorrit H. (January 2020, Zenodo)

{"Abstract":["A biodiversity dataset graph: GBIF, iDigBio, BioCASe<\/p>\n\nThe intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.<\/p>\n\nThis dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2020-05-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org".<\/p>\n\nThis publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 . <\/p>\n\nTo retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:<\/p>\n\n$$ java -jar preston.jar ls --remote https://zenodo.org/record/3852671/files > /dev/null<\/p>\n\nOptionally, you can retrieve all associated data files (>500GB) using:<\/p>\n\n$$ java -jar preston.jar clone --remote https://zenodo.org/record/3852671/files,https://archive.org/download/biodiversity-dataset-archives/data.zip/data/,https://deeplinker.bio<\/p>\n\nPlease note that https://archive.org/download/biodiversity-dataset-archives/data.zip/data/ and https://deeplinker.bio are Preston remotes that provided access to GBIF, iDigBio, BioCASe data files at the time of writing (25 May 2020). These remotes can be replaced with any other Preston remote(s) if needed. Retrieval may take a while depending on network speed and hardware constraints. 
See also https://archive.org/details/biodiversity-dataset-archives .<\/p>\n\nAfter that, verify the index of the archive by reproducing the following provenance log history:<\/p>\n\n$$ java -jar preston.jar history<\/p>\n\n<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .\n<hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> <http://purl.org/pav/previousVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .\n<hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> <http://purl.org/pav/previousVersion> <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> .\n<hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> <http://purl.org/pav/previousVersion> <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> .\n<hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> <http://purl.org/pav/previousVersion> <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> .\n<hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> <http://purl.org/pav/previousVersion> <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> .\n<hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> <http://purl.org/pav/previousVersion> <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> .\n<hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> <http://purl.org/pav/previousVersion> <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> .\n<hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> <http://purl.org/pav/previousVersion> <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> 
.\n<hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> <http://purl.org/pav/previousVersion> <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> .\n<hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> <http://purl.org/pav/previousVersion> <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> .\n<hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> <http://purl.org/pav/previousVersion> <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> .\n<hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> <http://purl.org/pav/previousVersion> <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> .\n<hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> <http://purl.org/pav/previousVersion> <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> .\n<hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> <http://purl.org/pav/previousVersion> <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> .\n<hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> <http://purl.org/pav/previousVersion> <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> .\n<hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> <http://purl.org/pav/previousVersion> <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> .\n<hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> <http://purl.org/pav/previousVersion> <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> .\n<hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> <http://purl.org/pav/previousVersion> 
<hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> .\n<hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> <http://purl.org/pav/previousVersion> <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> .\n<hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> <http://purl.org/pav/previousVersion> <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> .\n<hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> <http://purl.org/pav/previousVersion> <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> .\n<hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> <http://purl.org/pav/previousVersion> <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> .\n<hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> <http://purl.org/pav/previousVersion> <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> .\n<hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> <http://purl.org/pav/previousVersion> <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> .\n<hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> <http://purl.org/pav/previousVersion> <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> .\n<hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> <http://purl.org/pav/previousVersion> <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> .\n<hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> <http://purl.org/pav/previousVersion> <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> .\n<hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> 
<http://purl.org/pav/previousVersion> <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> .\n<hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> <http://purl.org/pav/previousVersion> <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> .\n<hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> <http://purl.org/pav/previousVersion> <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> .\n<hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> <http://purl.org/pav/previousVersion> <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> .\n<hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> <http://purl.org/pav/previousVersion> <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> .\n<hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> <http://purl.org/pav/previousVersion> <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> .\n<hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> <http://purl.org/pav/previousVersion> <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> .\n<hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> <http://purl.org/pav/previousVersion> <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> .\n<hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> <http://purl.org/pav/previousVersion> <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> .\n<hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> <http://purl.org/pav/previousVersion> <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> 
.\n<hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> <http://purl.org/pav/previousVersion> <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> .\n<hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> <http://purl.org/pav/previousVersion> <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> .\n<hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> <http://purl.org/pav/previousVersion> <hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> .\n<hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> <http://purl.org/pav/previousVersion> <hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> .\n<hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> <http://purl.org/pav/previousVersion> <hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> .\n<hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> <http://purl.org/pav/previousVersion> <hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> .\n<hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> <http://purl.org/pav/previousVersion> <hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> .\n<hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> <http://purl.org/pav/previousVersion> <hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> .\n<hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> <http://purl.org/pav/previousVersion> <hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> .\n<hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> <http://purl.org/pav/previousVersion> 
<hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> .\n<hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> <http://purl.org/pav/previousVersion> <hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> .\n<hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b> <http://purl.org/pav/previousVersion> <hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> .<\/p>\n\n\nIf you retrieved data files, you can check the integrity of the extracted archive by confirming that each line produced by the command "preston verify" matches the format shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.<\/p>\n\n$$ java -jar preston.jar verify\nhash://sha256/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362 file:/home/preston/preston-archive/data/3e/ff/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362 OK CONTENT_PRESENT_VALID_HASH 89931\nhash://sha256/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b file:/home/preston/preston-archive/data/18/48/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b OK CONTENT_PRESENT_VALID_HASH 210344\nhash://sha256/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02 file:/home/preston/preston-archive/data/18/46/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02 OK CONTENT_PRESENT_VALID_HASH 210344\nhash://sha256/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38 file:/home/preston/preston-archive/data/55/4f/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38 OK CONTENT_PRESENT_VALID_HASH 202701<\/p>\n\nNote that a copy of the java program "preston", preston.jar, is included in this publication. 
The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".<\/p>\n\nFiles in this data publication:<\/p>\n\n--- start of file descriptions ---<\/p>\n\n-- description of archive and its contents (this file) --\nREADME<\/p>\n\n-- executable java jar containing preston[2] v0.1.15. --\npreston.jar<\/p>\n\n-- individual provenance index files --<\/p>\n\n049b0eb995b484c1e64184f582f51b3c608dcade70c4aefc2d53f903bae45098\n073315c32d7fd19868449bef1b11b15a86981dee53a31f7f5c882f7e3be413c3\n1172c6927e58113db668409d36b6a2cd84cf1a93e85b50d65d0bd008a5d8aaa4\n1707cb11cd9f696f1a86fd06742c1e14fad856747be88791f79f6fc7c979d5a6\n272ff1f12a573c667634d934d06b8bab0dd9cc6558795287ea99fab87620d005\n2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a\n2bbbe11bb1932c6c8fbbc2ed16dde182f53c4cecbe0dd4f779c32f527a61bc62\n37b8b636e939072d0df7246bf077ead4279f9dd33929be322e631104b0641308\n3901b6af522d535fb164823704686e72f73b7798a2a64eaeb817134552c69e2c\n395ed0c95a624f8853116442690965acf69151acd6b33cc4fc710f567828f784\n460c14ed0129c1469c9149ed1030cdc133f110fb32048748323982cb88dd7eda\n477b6c4e9ecf5c8cd1b5502e0245c8622fa4b358f6710f97db39b473ed3d8235\n52b7274f5d795e4987964bb1a327dd6d6e4f65870e6a7aac172481d0ba3013d4\n54786bde04751bc31bf38c9e89c010cfee7de91760e1f5f31218ff11acff8a70\n6135b237a49b37b857801836494f2c36bcb1526bdacf001a9d11727fff6bf1f1\n674937568c0572bc2873f502dca2fe691ba230869f0aba73f5938422654c05cc\n69b4d5ca9643c14501a48a2b1eb24971a6da68da5033c304f7f00b94e16a11d9\n6df3363a236d4f026154ef86b34d9672b111333d0c2be179c43db146864f6ed3\n70066ea7c6a9dd6c2193cdc90b3b1ff7664af235ab245f6c03d1dd497b376570\n7084702f8025c99a6608a3355ccad5ff5e644ad544121f5d524961f7fe29ceb6\n7e9934a1fc580c3f591c295306ab364c2e7a589e91590ab6334514e4b5c28062\n7ebb008412baaac3afcc8af68b796bf4ca98f367cfd61a815eee82cdffeab196\n886edb8d22973bb04fe3b42d12106029a00b9deab3fb77d8787123327b77ae3b\n8a2426eb4b38af30c6ee764463b8684e0dec400e4472a2a53e6eedf246dab178\n8a6d7e2ab026ff56380235fd969
6f5e538e5e426b9374f2ddf3a705e186a7788\n8d44c9e36a505e5c3f125e1702ef7473280bf5bcfa624fe5d3998694b67e0887\n94290680edef0f8ac81d5d4d5b8b680ba5ce821df17c4de62464429552c3360e\n95f88f27ed3448534206406738dfb5c5030fe3d6883c6dda261649357600883f\n9d12cae409e8ea0a546f7945cc629d622400000c3338e4710d9c6084fca9274d\n9fa9ea50db419c75251026708183add8973d9e68a79062f7808b110bef21006e\na24abbe089556f51fe9c2a51febdcaf893b419556312bcc63515713fc4a52922\na3b0477fe46f09b0f51c0f651691665c149bc341f5c19996675d849252e86453\na486474333f05884580dd10c54c95999063c7d1bc22e2cbe3bead604aca0a183\na524b9af3f172793998e1f9c5c0e9c949cc935624a17ed3364d32bc0391c9382\naa0e508aeb96f240b551fe92ff4224325ddcdf66f97eef95ac78aec62e53a169\nab34300942ec02cca7adf2744f6fbc1ab7587060bea09ef92b65b66f89d1ddcd\nb05d4a17d9a02180669d7eb017102dd1a739fb4615759cba94baf944b2aee29c\nb37c79f95c22fc4d657cc89dedd7a870923285da690ad4f5121962492484a142\nbc699639e5515a5fc9da9d442357cc8a9ff310a177e54f1646e002723de49f1d\nbe6d8cd5f1405a5e3e8aa492fb8dab41f6521608834d746e6cbc58d2f550f918\nc06f4413a97a5540fbdd40bdbfb194435c154533df7fe388dfdd378084e19c3d\nc585b8addfb7f7991ad74c0bae158aecefc6be5b11c28b020135e0f13040e187\nc66587e9730a6f68e961240038892df656ea99a1a25f4ff8ce556c07b09a4878\nca289dce66c8b9955c223fe3e906b8f26c12cf53506cebe651b004961f7964af\ncea1aab236de5de8da8954797d846c225bf2ad4f8fe3cd413e60ab029f9e1b3e\nda05cc27a47e755ebe912fafae434df5bd31a5d92658fe1943acc0a2023fab32\nee473aeda889fd12ac2c76aae06314e5f279cce5f1a736d39bfc097657a82060\nfcb2ee4d630a9a1440417b0c46da5bc1578a388d6aedd12189a23283b60dde7d\nfef548489bd7bea43ae1c2b7755d38a87f4a8b038a466bf7e7b4ac64d665fd62\nff32a7cbc99eaf6b67695fd94284a9b1b47a76497ef4d10ffc4dae199cc0d7c3<\/p>\n\n--- individual provenance logs 
--<\/p>\n\n05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd\n09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2\n0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781\n102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9\n1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e\n20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17\n24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43\n261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12\n39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c\n3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1\n3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7\n3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee\n480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6\n58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70\n5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7<\/p>"]}
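The "preston history" output listed in the README above can be read as a linked list of provenance log versions. The following Python sketch orders those logs from oldest to newest. It assumes that each `previousVersion` line means "the subject has the object as its previous version" and that the single `hasVersion` line names the oldest log. The function name `version_chain` is hypothetical and is not part of Preston.

```python
import re

# One triple per line: <subject> <predicate> <object> .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.$")
HAS_VERSION = "http://purl.org/pav/hasVersion"
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def version_chain(history: str) -> list:
    """Return provenance log hashes ordered from oldest to newest (assumed reading)."""
    first = None
    newer_of = {}  # older hash -> the hash that names it as its previous version
    for line in history.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip blank lines or anything that is not a plain triple
        subject, predicate, obj = match.groups()
        if predicate == HAS_VERSION:
            first = obj
        elif predicate == PREVIOUS_VERSION:
            newer_of[obj] = subject
    chain = []
    seen = set()
    current = first
    # Follow the links forward; the `seen` set guards against accidental cycles.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = newer_of.get(current)
    return chain
```

Comparing the returned list against the hashes printed in the README is one way to confirm that a cloned index reproduces the published history.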
Reliable Dataset Identifiers Are Essential Building Blocks For Reproducible Research

Elliott, M.; Poelen, J.H.; Fortes, J.A.B. (January 2020, 4th Annual Digital Data in Biodiversity Research, 1-3 June 2020)

https://doi.org/10.17605/OSF.IO/AT4XE

Despite increased use of digital biodiversity data in research, reliable methods to identify datasets are not widely adopted. While commonly used location-based dataset identifiers such as URLs make it easy to download data today, additional identification schemes are needed to ensure long-term access to datasets. We propose to augment existing location- and DOI-based identification schemes with cryptographic content-based identifiers. These content-based identifiers can be calculated from the datasets themselves using available cryptographic hashing algorithms (e.g., sha256). These algorithms take only the digital content as input to generate a unique identifier without needing a centralized identification administration. The use of content-based identifiers is not new; it re-applies change management techniques used in the popular version control system "git". We show how content-based identifiers can be used to version datasets, to track dataset locations, to monitor their reliability, and to efficiently detect dataset changes. We discuss the results of using our approach on datasets registered in GBIF and iDigBio from September 2018 to May 2020. We also propose how reliable, decentralized dataset indexing and archiving systems can be devised. Lastly, we outline a modification to existing data citation practices to help work towards more reproducible and reusable research workflows.
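To illustrate how a content-based identifier can detect dataset changes independently of location, here is a minimal Python sketch. It is not the authors' implementation. The function names `content_id` and `changed_locations` and the example URLs are hypothetical, and the `hash://sha256/` prefix follows the notation used in the Preston records listed on this page.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content alone; no registry is consulted."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def changed_locations(before: dict, after: dict) -> list:
    """Return locations whose content identifier differs between two snapshots.

    Each snapshot maps a location (e.g. a URL) to the bytes retrieved from it.
    Locations present in only one snapshot are reported as changed.
    """
    changed = []
    for url in sorted(set(before) | set(after)):
        old = content_id(before[url]) if url in before else None
        new = content_id(after[url]) if url in after else None
        if old != new:
            changed.append(url)
    return changed
```

Because the identifier depends only on the bytes, two locations serving identical content yield the same identifier, and any modification of the content yields a different one.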
Full Text Available
