
Title: A biodiversity dataset graph: GBIF, iDigBio, BioCASe
Abstract
The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access…
Creator(s):
Publisher: Zenodo
Publication Year:
NSF-PAR ID: 10313006
Version: 0.0.2
Award ID(s): 1839201
Sponsoring Org: National Science Foundation
More Like This
  1. Abstract
    A biodiversity dataset graph: GBIF, iDigBio, BioCASe
    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, and Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.
    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2020-05-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org".
    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.
    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the Preston [2] command-line tool to "clone" this dataset using:
        $ java -jar preston.jar ls --remote https://zenodo.org/record/3852671/files > /dev/null
    Optionally, you can retrieve all associated data files (>500 GB) using:
        $ java -jar preston.jar clone --remote https://zenodo.org/record/3852671/files,https://archive.org/download/biodiversity-dataset-archives/data.zip/data/,https://deeplinker.bio
    Please note that https://archive.org/download/biodiversity-dataset-archives/data.zip/data/ and https://deeplinker.bio are Preston remotes that provided access to GBIF, iDigBio, BioCASe data files…
  2. Abstract
    A biodiversity dataset graph: UCSB-IZC
    The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at the Cheadle Center for Biodiversity and Ecological Restoration, University of California Santa Barbara.
    This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using:
        $ preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"
    This archive contains 14349 images related to 32533 occurrence/specimen records. See the included sample-image.jpg and its associated metadata, sample-image.json [4].
    The images were counted using:
        $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c \
            | grep -o -P ".*depict" \
            | sort \
            | uniq \
            | wc -l
    The occurrences were counted using:
        $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c \
            | grep -o -P "occurrence/([0-9])+" \
            | sort \
            | uniq \
            | wc -l
    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files have been included individually in this dataset publication. Index files provide a way to link provenance files in time to establish…
  3. Abstract
    A biodiversity dataset graph: DataONE
    The intended use of this archive is to facilitate (meta-)analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data.
    This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-11-06 and 2020-05-07 using "preston update -u https://dataone.org".
    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.
    To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the resulting archive into a "data" folder. Alternatively, you can use the Preston [2] command-line tool to "clone" this dataset using:
        $ java -jar preston.jar clone --remote https://zenodo.org/record/3849494/files
    After that, verify the index…
  4. Abstract
    A biodiversity dataset graph: DataONE
    The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data.
    This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-10-18 and 2019-10-03 using "preston update -u https://dataone.org".
    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.
    To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the resulting archive into a "data" folder. Alternatively, you can use the Preston [2] command-line tool to "clone" this dataset using:
        $ java -jar preston.jar clone --remote https://zenodo.org/record/3483218/files
    After that, verify the index of the…
  5. Abstract
    A biodiversity dataset graph: BHL
    The intended use of this archive is to facilitate (meta-)analysis of the Biodiversity Heritage Library (BHL). The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.
    This dataset provides versioned snapshots of the BHL network as tracked by Preston [2] between 2019-05-19 and 2020-05-09 using "preston update -u https://biodiversitylibrary.org".
    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the BHL content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.
    To retrieve and verify the downloaded BHL biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the resulting archive into a "data" folder. Alternatively, you can use the Preston [2] command-line tool to "clone" this dataset using:
        $ java…
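The abstracts above identify retrieved content by hash URIs of the form hash://sha256/<hex digest>, and several ask the reader to verify what was downloaded. Below is a minimal bash sketch of one such check. It is not a Preston command. It recomputes the SHA-256 digest of a local file and compares it with the digest in the file's hash URI. The helper name verify_hash_uri is hypothetical. The sketch also assumes GNU coreutils sha256sum is installed; on macOS, shasum -a 256 is a common substitute.

```shell
#!/usr/bin/env bash
# Sketch: check that a local file matches a hash://sha256/<hex> URI.
# verify_hash_uri is a hypothetical helper, not part of Preston.
set -euo pipefail

verify_hash_uri() {
  local file="$1" uri="$2"
  local prefix="hash://sha256/"
  # Reject anything that is not a SHA-256 hash URI (exit status 2).
  if [[ "$uri" != "$prefix"* ]]; then
    echo "unsupported hash URI: $uri" >&2
    return 2
  fi
  # Strip the prefix, leaving only the expected hex digest.
  local expected="${uri#"$prefix"}"
  # sha256sum prints "<hex>  <filename>"; keep only the digest field.
  local actual
  actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
  # Exit status 0 on match, 1 on mismatch.
  [[ "$actual" == "$expected" ]]
}

# Usage example: "abc" has a well-known SHA-256 digest.
tmp="$(mktemp)"
printf 'abc' > "$tmp"
if verify_hash_uri "$tmp" "hash://sha256/ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"; then
  echo "match"
else
  echo "mismatch"
fi
rm -f "$tmp"
```

A file fetched from one of the remotes listed above could be checked the same way. A mismatch means its bytes differ from the content named by the hash. Preston's own verification may check more than this single comparison.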
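The UCSB-IZC counting commands above follow a common pattern. grep -o prints every match on its own line, sort | uniq collapses duplicates, and wc -l counts the distinct matches that remain. Below is a minimal bash sketch that wraps this pattern in a reusable function, run on a few made-up input lines rather than real Preston output. The function name count_distinct_matches is hypothetical. The sketch uses grep -E (extended regular expressions) instead of the original grep -P, because not every grep build supports -P. The occurrence pattern is therefore written as occurrence/[0-9]+.

```shell
#!/usr/bin/env bash
# Sketch: count distinct regex matches read from stdin.
# count_distinct_matches is a hypothetical helper; the sample input is made up.
set -euo pipefail

count_distinct_matches() {
  local pattern="$1"
  # grep -o: one output line per match ("|| true" keeps a no-match exit
  # status of 1 from failing the pipeline under pipefail).
  # sort -u: equivalent to "sort | uniq".
  # wc -l: number of distinct matches; tr strips the padding some wc builds add.
  { grep -o -E "$pattern" || true; } | sort -u | wc -l | tr -d ' '
}

# Hypothetical sample: three matches, two of them identical.
sample='record occurrence/101 first
record occurrence/101 second
record occurrence/202 third
unrelated line'

printf '%s\n' "$sample" | count_distinct_matches 'occurrence/[0-9]+'   # prints: 2
```

To approximate the original counts, the output of the preston cat command shown in the UCSB-IZC abstract could be piped into this function with the patterns '.*depict' or 'occurrence/[0-9]+'.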
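The DataONE and BHL abstracts above describe the same recovery step for an archive split into 256 parts: concatenate preston-*.tar.gz into preston.tar.gz, then extract it. Below is a minimal bash sketch of that step. It assumes the parts are consecutive byte ranges of a single gzip-compressed tar file. It also assumes the zero-padded names (preston-00.tar.gz, preston-01.tar.gz, ...) make the shell's sorted glob order match the part order. The function reassemble_and_extract and the tiny demo archive are hypothetical. Only the 256-part default comes from the abstracts. Where the files land inside the destination depends on the paths stored in the archive, which the abstracts do not spell out.

```shell
#!/usr/bin/env bash
# Sketch: rebuild preston.tar.gz from preston-*.tar.gz parts and extract it.
# reassemble_and_extract is a hypothetical helper, not a Preston command.
set -euo pipefail

reassemble_and_extract() {
  local src="$1" dest="$2" expected="${3:-256}"
  local parts=("$src"/preston-*.tar.gz)
  # Without nullglob an unmatched glob stays literal, so test the first entry.
  if [[ ! -e "${parts[0]}" ]]; then
    echo "no parts found in $src" >&2
    return 1
  fi
  # A missing part silently corrupts the archive, so check the count first.
  if (( ${#parts[@]} != expected )); then
    echo "expected $expected parts, found ${#parts[@]}" >&2
    return 1
  fi
  # Glob results are sorted, so zero-padded part names concatenate in order.
  # The output name preston.tar.gz does not itself match preston-*.tar.gz.
  cat "${parts[@]}" > "$src/preston.tar.gz"
  mkdir -p "$dest"
  tar -xzf "$src/preston.tar.gz" -C "$dest"
}

# Self-contained demo: build a tiny archive, split it in two, rebuild it.
work="$(mktemp -d)"
mkdir -p "$work/in/data"
printf 'hello\n' > "$work/in/data/example.txt"
tar -czf "$work/full.tar.gz" -C "$work/in" data

size="$(wc -c < "$work/full.tar.gz")"
half=$(( size / 2 ))
head -c "$half" "$work/full.tar.gz" > "$work/preston-00.tar.gz"
tail -c +"$(( half + 1 ))" "$work/full.tar.gz" > "$work/preston-01.tar.gz"

reassemble_and_extract "$work" "$work/out" 2
cat "$work/out/data/example.txt"   # prints: hello
rm -rf "$work"
```

The part-count check guards against an incomplete parallel download. Concatenating 255 of 256 parts would otherwise fail only later, during extraction, with a less obvious error.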