

Search for: All records

Creators/Authors contains: "Poelen, Jorrit H."


  1. Last modified: July 3, 2024

    Introduction

    This dataset comprises all bee interactions indexed by Global Biotic Interactions (GloBI; Poelen et al. 2014). It is published quarterly by the Big Bee Project (Seltmann et al. 2021) to summarize all available knowledge about bee interactions from natural history collections, community science observations (i.e., iNaturalist), and the literature. Interactions include flower visitation, parasitic interactions (mite, viral), lecty, and many others.

    Data Description

    Please see the [integration process page](https://www.globalbioticinteractions.org/process) to better understand how Global Biotic Interactions combines datasets from various sources. The complete interaction dataset for all species can be accessed via https://www.globalbioticinteractions.org/data. Data is filtered for unique records based on the interaction description and source citation. Archives contain full data records and unique filtered records in tab-delimited format. Dataset column name definitions are available at https://api.globalbioticinteractions.org/interactionFields.

    Duplicate records occur in the database because more than one provider may share the same information. This occurs most frequently in museum specimen data, and duplicates can be identified by evaluating the institutionCode, collectionCode, and catalogNumber fields. The file catalogNumber_counts.tsv groups records by these three fields for this dataset, but does not filter out duplicate records. Additionally, this dataset includes the citation information provided by the data publisher. The provided sourceCitation may not include information about the primary provider (often the natural history collection) from which the specimen data originate, so the catalogNumber should be referenced to understand the original source of the data. Summary statistics about the dataset can be found in the bees-only-review.pdf file.
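The duplicate check described above (grouping records by institutionCode, collectionCode, and catalogNumber) could be sketched roughly as follows in Python. This is only an illustration, not the script used to produce catalogNumber_counts.tsv: the column names are taken from the description and may be spelled differently in the actual archive headers, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Column names as given in the description; the headers in a real
# export may differ, so treat these as assumptions.
KEY_FIELDS = ("institutionCode", "collectionCode", "catalogNumber")


def count_specimen_keys(tsv_text, key_fields=KEY_FIELDS):
    """Count rows per (institutionCode, collectionCode, catalogNumber)."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter()
    for row in reader:
        key = tuple((row.get(field) or "").strip() for field in key_fields)
        if all(key):  # ignore rows missing any of the three identifiers
            counts[key] += 1
    return counts


def likely_duplicates(counts):
    """Keep only the keys that occur more than once."""
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows for illustration only; not taken from the dataset.
SAMPLE_TSV = "\n".join([
    "institutionCode\tcollectionCode\tcatalogNumber\tsourceCitation",
    "INST\tENT\t0001\tProvider A",
    "INST\tENT\t0001\tProvider B",
    "INST\tENT\t0002\tProvider A",
    "\t\t\tLiterature record without specimen identifiers",
])

if __name__ == "__main__":
    for key, n in likely_duplicates(count_specimen_keys(SAMPLE_TSV)).items():
        print("\t".join(key), n, sep="\t")
```

A key that appears more than once only suggests that several providers shared the same specimen; the records themselves still need to be compared before any are dropped.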
    This review of all bee data indexed by Global Biotic Interactions was created using GloBI’s Interaction Data Review Report Framework via repository https://github.com/Big-Bee-Network/select-bee-interactions.sh.

    Metrics

    Date        Total bee records
    07-17-2020    232,906
    01-24-2021    257,738
    11-17-2021    226,160
    06-01-2022    286,818
    11-07-2022    429,308
    01-18-2024    842,819
    07-03-2024  1,109,057

    Date        Andrenidae  Apidae   Colletidae  Halictidae
    07-17-2020   73,463     106,222  20,821       58,880
    01-24-2021   77,824     120,919  21,376       63,945
    11-17-2021   25,535     134,517  10,568       43,070
    06-01-2022   78,016     144,827  20,409       64,054
    11-07-2022   84,172     171,378  30,792       79,155
    01-18-2024  166,473     334,224  63,847      171,931
    07-03-2024  289,400     371,953  83,337      190,562

    Date        Megachilidae  Melittidae  Stenotritidae
    07-17-2020   44,449       2,511        23
    01-24-2021   48,856       2,624        18
    11-17-2021   37,001         995         9
    06-01-2022   54,516       2,994        18
    11-07-2022   61,391       2,396        24
    01-18-2024  100,814       5,088       442
    07-03-2024  162,587       4,964       438

    Included Resources

    count  sourceCitation
    219440  Symbiota Collections of Arthropods Network (SCAN)
    156437  University of Kansas Natural History Museum
    150780  Digital Bee Collections Network, 2014 (and updates). Version: 2015-03-18. National Science Foundation grant DBI#0956388
    134657  USGS Biodiversity Information Serving Our Nation (BISON) IPT
    126820  http://iNaturalist.org is a place where you can record what you see in nature, meet other nature lovers, and learn about the natural world.
    44522  PaDIL Bee records from the Pests and Diseases Image Library, http://www.padil.gov.au.
    38658  University of Michigan Museum of Zoology Insect Division. Full Database Export 2020-11-20 provided by Erika Tucker and Barry Oconner.
    27711  Carril OM, Griswold T, Haefner J, Wilson JS. (2018) Wild bees of Grand Staircase-Escalante National Monument: richness, abundance, and spatio-temporal beta-diversity. PeerJ 6:e5867 https://doi.org/10.7717/peerj.5867
    15506  Seltmann, K., Van Wagner, J., Behm, R., Brown, Z., Tan, E., & Liu, K. (2020). BID: A project to share biotic interaction and ecological trait data about bees (Hymenoptera: Anthophila). UC Santa Barbara: Cheadle Center for Biodiversity and Ecological Restoration. Retrieved from https://escholarship.org/uc/item/1g21k7bf
    14666  Web of Life. http://www.web-of-life.es .
    14577  Pensoft Darwin Core Archives available via Integrated Publication Toolkit
    13447  University of Colorado Museum of Natural History Entomology Collection
    13296  https://mangal.io - the ecological interaction database.
    10705  National Database Plant Pollinators. Center for Plant Conservation at San Diego Zoo Global. Accessed via https://saveplants.org/national-collection/pollinator-search/ on 2020-06-05.
    8529  Ollerton, J., Trunschke, J. ., Havens, K. ., Landaverde-González, P. ., Keller, A. ., Gilpin, A.-M. ., Rodrigo Rech, A. ., Baronio, G. J. ., Phillips, B. J., Mackin, C. ., Stanley, D. A., Treanore, E. ., Baker, E. ., Rotheray, E. L., Erickson, E. ., Fornoff, F. ., Brearley, F. Q. ., Ballantyne, G. ., Iossa, G. ., Stone, G. N., Bartomeus, I. ., Stockan, J. A., Leguizamón, J., Prendergast, K. ., Rowley, L., Giovanetti, M., de Oliveira Bueno, R., Wesselingh, R. A., Mallinger, R., Edmondson, S., Howard, S. R., Leonhardt, S. D., Rojas-Nossa, S. V., Brett, M., Joaqui, T., Antoniazzi, R., Burton, V. J., Feng, H.-H., Tian, Z.-X., Xu, Q., Zhang, C., Shi, C.-L., Huang, S.-Q., Cole, L. J., Bendifallah, L., Ellis, E. E., Hegland, S. J., Straffon Díaz, S., Lander, T. A. ., Mayr, A. V., Dawson, R. ., Eeraerts, M. ., Armbruster, W. S. ., Walton, B. ., Adjlane, N. ., Falk, S. ., Mata, L. ., Goncalves Geiger, A. ., Carvell, C. ., Wallace, C. ., Ratto, F. ., Barberis, M. ., Kahane, F. ., Connop, S. ., Stip, A. ., Sigrist, M. R. ., Vereecken, N. J. ., Klein, A.-M., Baldock, K. ., & Arnold, S. E. J. . (2022). Pollinator-flower interactions in gardens during the COVID-19 pandemic lockdown of 2020. Journal of Pollination Ecology, 31, 87–96. https://doi.org/10.26786/1920-7603(2022)695
    8014  Redhead, J.W.; Coombes, C.F.; Dean, H.J.; Dyer, R.; Oliver, T.H.; Pocock, M.J.O.; Rorke, S.L.; Vanbergen, A.J.; Woodcock, B.A.; Pywell, R.F. (2018). Plant-pollinator interactions database for construction of potential networks. NERC Environmental Information Data Centre. https://doi.org/10.5285/6d8d5cb5-bd54-4da7-903a-15bd4bbd531b
    7630  CaraDonna, P.J. 2020. Temporal variation in plant-pollinator interactions, Rocky Mountain Biological Laboratory, CO, USA, 2013 - 2015 ver 1. Environmental Data Initiative. https://doi.org/10.6073/pasta/27dc02fe1655e3896f20326fed5cb95f (Accessed 2021-04-16).
    6921  Purdue Entomological Research Collection
    6911  Arizona State University Hasbrouck Insect Collection
    6430  LaManna, JA, Burkle, LA, Belote, RT, Myers, JA. Biotic and abiotic drivers of plant–pollinator community assembly across wildfire gradients. J Ecol. 2020; 00: 1– 14. https://doi.org/10.1111/1365-2745.13530 .
    6288  Pensoft Darwin Core Archives with associateTaxa columns
    6269  Eardley C, Coetzer W. 2016. Catalogue of Afrotropical Bees.
    6114  University of Michigan Museum of Zoology, Division of Insects
    5089  Magrach, Ainhoa et al. (2017), Data from: Plant-pollinator networks in semi-natural grasslands are resistant to the loss of pollinators during blooming of mass-flowering crops, Dryad, Dataset, https://doi.org/10.5061/dryad.k0q1n
    3860  Giselle Muschett & Francisco E. Fontúrbel. 2021. A comprehensive catalogue of plant – pollinator interactions for Chile
    3720  Frost Entomological Museum, Pennsylvania State University
    3670  Natural History Collections managed by Arctos (https://arctosdb.org) accessed via https://vertnet.org .
    3620  Sarah E Miller. 6/19/2015. Species associations manually extracted from datasets https://www.nceas.ucsb.edu/interactionweb/resources.html.
    3581  Robert L. Minckley San Bernardino Valley from the year 2000 to 2011.
    3581  University of New Hampshire Collection of Insects and other Arthropods UNHC-UNHC
    3581  University of New Hampshire Donald S. Chandler Entomological Collection
    2242  Sarah E. Miller. 07/06/2017. Information extracted from dataset https://www.idigbio.org/portal/recordsets/db4bb0df-8539-4617-ab5f-eb118aa3126b.
    2223  Bartomeus, Ignasi (2013): Plant-Pollinator Network Data. figshare. Dataset. https://doi.org/10.6084/m9.figshare.154863.v1
    2110  Illinois Natural History Survey Insect Collection
    2074  Florida State Collection of Arthropods
    2035  Ed Baker; Ian J. Kitching; George W. Beccaloni; Amoret Whitaker et al. (2016). Dataset: NHM Interactions Bank. Natural History Museum Data Portal (data.nhm.ac.uk). https://doi.org/10.5519/0060767
    1762  Poelen, Jorrit H. (2023). A biodiversity dataset graph: Biological Associations in TaxonWorks hash://sha256/a4d651aac5220487835e6178511886e98b845b2d98cb7c5447fb2b042e0654d2 hash://md5/849edbe55e31e54ea5cdaba0188c5655 (0.2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8253729
    1681  Harvard University M, Morris P J (2021). Museum of Comparative Zoology, Harvard University. Museum of Comparative Zoology, Harvard University.
    1563  Ballantyne, Gavin; Baldock, Katherine C. R.; Willmer, Pat G. (2015), Data from: Constructing more informative plant-pollinator networks: visitation and pollen deposition networks in a heathland plant community, Dryad, Dataset, https://doi.org/10.5061/dryad.17pp3
    1365  Sarah E Miller. 5/30/2016. Interations from various papers.
    1281  Sarah E Miller. 4/18/2016. Species associations from Wardeh, M. et al. Database of host-pathogen and related species interactions, and their global distribution. Sci. Data 2:150049 doi: 10.1038/sdata.2015.49 (2015)
    1102  University of California Santa Barbara Invertebrate Zoology Collection
    1086  Cohen JM, Sauer EL, Santiago O, Spencer S, Rohr JR. 2020. Divergent impacts of warming weather on wildlife disease risk across climates. Science. doi:10.1126/science.abb1702
    939  Allen Hurlbert. 2017. Avian Diet Database.
    918  Texas A&M University Insect Collection
    906  Del Risco, A.A., Montoya, Á.M., García, V. et al. Data synthesis and dynamic visualization converge into a comprehensive biotic interaction network: a case study of the urban and rural areas of Bogotá D.C.. Urban Ecosyst (2021). https://doi.org/10.1007/s11252-021-01133-3
    872  Cristina Preda and Quentin Groom. 2014. Species associations manually extracted from literature.
    754  United States Geological Survey (USGS) Pollinator Library. https://www.npwrc.usgs.gov/pollinator.
    752  Sarah E Miller. 6/22/2015. Species associations manually extracted from datasets https://www.nceas.ucsb.edu/interactionweb/resources.html.
    750  RCPol: Online Pollen Catalogs Network. 2016. https://rcpol.org.br/
    744  Classen, Alice; Steffan-Dewenter, Ingolf (2020): Plant-pollinator interactions along an elevational gradient on Mt. Kilimanjaro. PANGAEA, https://doi.org/10.1594/PANGAEA.911390
    704  Yale University Peabody Museum Collections Data Portal
    677  The Albert J. Cook Arthropod Research Collection
    541  Udy, Kristy; Reininghaus, Hannah; Scherber, Christoph; Tscharntke, Teja (2020), Data from: Plant-pollinator interactions along an urbanization gradient from cities and villages to farmland landscapes, Dryad, Dataset, https://doi.org/10.5061/dryad.4mw6m906s
    524  Pardee, G.L., Ballare, K.M., Neff, J.L., Do, L.Q., Ojeda, D., Bienenstock, E.J., Brosi, B.J., Grubesic, T.H., Miller, J.A., Tong, D. and Jha, S., 2023. Local and Landscape Factors Influence Plant-Pollinator Networks and Bee Foraging Behavior across an Urban Corridor. Land, 12(2), p.362. https://www.mdpi.com/2073-445X/12/2/362
    511  Sarah E Miller. 6/25/2015. Species associations manually extracted from Robertson, C. 1929. Flowers and insects: lists of visitors to four hundred and fifty-three flowers. Carlinville, IL, USA, C. Robertson.
    511  The International Barcode of Life Consortium (2016). International Barcode of Life project (iBOL). Occurrence dataset https://doi.org/10.15468/inygc6
    454  Seltzer, Carrie; Wysocki, William; Palacios, Melissa; Eickhoff, Anna; Pilla, Hannah; Aungst, Jordan; Mercer, Aaron; Quicho, Jamie; Voss, Neil; Xu, Man; J. Ndangalasi, Henry; C. Lovett, Jon; J. Cordeiro, Norbert (2015): Plant-animal interactions from Africa. figshare. https://dx.doi.org/10.6084/m9.figshare.1526128
    342  Mycology Collections Data Portal (MyCoPortal). 2020. https://mycoportal.org
    292  Global Web Database (http://globalwebdb.com): an online collection of food webs. Accessed via https://www.globalwebdb.com/Service/DownloadArchive on 2017-10-12.
    268  University of Wisconsin Stevens Point, Stephen J. Taft Parasitological Collection
    241  University of Hawaii Insect Museum
    168  Sarah E Miller. 12/13/2016. Species associations manually extracted from Onstad, D.W. EDWIP: Ecological Database of the World's Insect Pathogens. Champaign, Illinois: Illinois Natural History Survey, [23/11/2016]. http://insectweb.inhs.uiuc.edu/Pathogens/EDWIP.
    153  California Academy of Sciences Entomology and Entomology Type Collection
    127  Olito, Colin; Fox, Jeremy W. (2015), Data from: Species traits and abundances predict metrics of plant–pollinator network structure, but not pairwise interactions, Dryad, Dataset, https://doi.org/10.5061/dryad.7st32
    114  Kari Lintulaakso. 2023. MammalBase Diet Database.
    106  Brose, U. (2018). GlobAL daTabasE of traits and food Web Architecture (GATEWAy) version 1.0 [Data set]. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig. https://doi.org/10.25829/IDIV.283-3-756
    104  Groom, Q.J., Maarten De Groot, M. & Marčiulynienė, D. (2020) Species interation data manually extracted from literature for species .
    96  Eneida L. Hatcher, Sergey A. Zhdanov, Yiming Bao, Olga Blinkova, Eric P. Nawrocki, Yuri Ostapchuck, Alejandro A. Schäffer, J. Rodney Brister, Virus Variation Resource – improved response to emergent viral outbreaks, Nucleic Acids Research, Volume 45, Issue D1, January 2017, Pages D482–D490, https://doi.org/10.1093/nar/gkw1065 .
    93  Jakovos Demetriou and Quentin Groom 2014. Species associations of Sceliphron manually extracted from literature.
    92  San Diego Natural History Museum
    80  Price Institute of Parasite Research, School of Biological Sciences, University of Utah
    59  National Museum of Natural History, Smithsonian Institution IPT RSS Feed
    56  Poelen, JH (2016). Plant pathogen-host interactions scraped from Common Names of Plant Diseases published by the American Phytopathological Society at http://www.apsnet.org/publications/commonnames/Pages/default.aspx using Samara, a Planteome (http://planteome.org) plant-trait scraper.
    50  Florez-Montero, G.L., Muylaert, R.L., Nogueira, M.R., Geiselman, C., Santana, S.E., Stevens, R.D., Tschapka, M., Rodrigues, F.A. and Mello, M.A.R. (2022), NeoBat Interactions: A data set of bat–plant interactions in the Neotropics. Ecology. Accepted Author Manuscript e3640. https://doi.org/10.1002/ecy.3640
    50  Ferrer-Paris, José R.; Sánchez-Mercado, Ada Y.; Lozano, Cecilia; Zambrano, Liset; Soto, José; Baettig, Jessica; Leal, María (2014): A compilation of larval host-plant records for six families of butterflies (Lepidoptera: Papilionoidea) from available electronic resources. figshare. http://dx.doi.org/10.6084/m9.figshare.1168861
    39  Pocock, Michael J. O.; Evans, Darren M.; Memmott, Jane (2012), Data from: The robustness and restoration of a network of ecological networks, Dryad, Dataset, https://doi.org/10.5061/dryad.3s36r118
    37  Sarah E Miller. 9/19/2016. Species associations extracted from Graystock, P., Blane, E.J., McFrederick, Q.S., Goulson, D. and Hughes, W.O., 2016. Do managed bees drive parasite spread and emergence in wild bees?. International Journal for Parasitology: Parasites and Wildlife, 5(1), pp.64-75.
    36  Mihara, T., Nishimura, Y., Shimizu, Y., Nishiyama, H., Yoshikawa, G., Uehara, H., Hingamp, P., Goto, S., and Ogata, H.; Linking virus genomes with host taxonomy. Viruses 8, 66 doi:10.3390/v8030066 (2016).
    36  Quentin J. Groom. 2020. Species interactions of species on the List of invasive alien species of Union concern
    33  IPBES. (2016). The assessment report of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services on pollinators, pollination and food production. Table 2.4.3 p88 Zenodo. https://doi.org/10.5281/zenodo.3402857
    30  Brigham Young University Arthropod Museum
    24  Geiselman, Cullen K. & Sarah Younger. 2020. Bat Eco-Interactions Database. www.batbase.org
    24  Geiselman, Cullen K. and Tuli I. Defex. 2015. Bat Eco-Interactions Database. www.batplant.org
    23  Agosti, Donat. 2020. Transcription of Linné, C. von, 1758. Systema naturae per regna tria naturae secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis. Available at: http://dx.doi.org/10.5962/bhl.title.542 .
    21  Species Connect. https://speciesconnect.com
    17  http://invertebrates.si.edu/parasites.htm
    14  Gandhi, K. J. K., & Herms, D. A. (2009). North American arthropods at risk due to widespread Fraxinus mortality caused by the Alien Emerald ash borer. Biological Invasions, 12(6), 1839–1846. doi:10.1007/s10530-009-9594-1.
    12  Food Webs and Species Interactions in the Biodiversity of UK and Ireland (Online). 2017. Data provided by Malcolm Storey. Also available from http://bioinfo.org.uk.
    12  Sarah E Miller. 5/28/2015. Arnaud, Paul Henri. A Host-parasite Catalog of North American Tachinidae (Diptera). Washington, D.C.: U.S. Dept. of Agriculture, Science and Education Administration, 1978.
    10  University of California Santa Barbara Herbarium
    9  Field Museum of Natural History IPT
    8  Brose, U. et al., 2005. Body sizes of consumers and their resources. Ecology, 86(9), pp.2545–2545. Available at: http://dx.doi.org/10.1890/05-0379.
    8  Strong, Justin S., and Shawn J. Leroux. 2014. "Impact of Non-Native Terrestrial Mammals on the Structure of the Terrestrial Mammal Food Web of Newfoundland, Canada." PLOS ONE 9 (8): e106264. https://doi.org/10.1371/journal.pone.0106264
    7  Chen L, Liu B, Wu Z, Jin Q, Yang J, 2017. DRodVir: A resource for exploring the virome diversity in rodents. J Genet Genomics. 44(5):259-264.
    5  Froese, R. and D. Pauly. Editors. 2018. FishBase. World Wide Web electronic publication. www.fishbase.org, version (10/2018).
    5  Pinnegar, J.K. (2014). DAPSTOM - An Integrated Database & Portal for Fish Stomach Records. Version 4.7. Centre for Environment, Fisheries & Aquaculture Science, Lowestoft, UK. February 2014, 39pp.
    4  Aja Sherman, Cullen Geiselman. 2021. Bat Co-Roosting Database
    4  Bernice Pauahi Bishop Museum, J. Linsley Gressitt Center for Research in Entomology
    4  Mollentze, Nardus, & Streicker, Daniel G. (2019). Viral zoonotic risk is homogenous among taxonomic orders of mammalian and avian reservoir hosts (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.3516613
    4  Sarah E Miller. 7/7/2016. Text gathered from Wirta, H.K., Vesterinen, E.J., Hambäck, P.A., Weingartner, E., Rasmussen, C., Reneerkens, J., Schmidt, N.M., Gilg, O. and Roslin, T., 2015. Exposing the structure of an Arctic food web. Ecology and evolution, 5(17), pp.3842-3856.
    4  Sarah E Miller. 9/15/2016. Species associations extracted from http://parasiticplants.siu.edu/index.html.
    4  Sarah E. Miller. 04/14/2015. Extracted from literature Scott, J.A. 1986. The Butterflies of North America. Stanford University Press, Stanford, CA
    4  Scott L. Gardner and Gabor R. Racz (2021). University of Nebraska State Museum - Parasitology. Harold W. Manter Laboratory of Parasitology. University of Nebraska State Museum.
    2  Deans, Andrew (2021). Catalog of Rose Gall, Herb Gall, and Inquiline Gall Wasps (Hymenoptera: Cynipidae) of the United States, Canada, and Mexico
    2  Jorrit H. Poelen. 2017. Species interactions associated with known species interaction datasets.
    2  Museum for Southwestern Biology (MSB) Parasite Collection
    2  Sarah E Miller. 4/20/2015. Species associations manually extracted from various papers and articles from site https://repository.si.edu
    2  Seltmann, Katja C. 2020. Biotic species interactions about ticks manually extracted from literature.
    2  Species Interactions of Australia Database (SIAD): Helping us to understand species interactions in Australia and beyond. http://www.discoverlife.org/siad/ .
    1  Chen L, Liu B, Yang J, Jin Q, 2014. DBatVir: the database of bat-associated viruses. Database (Oxford). 2014:bau021. doi:10.1093/database/bau021
    1  Grundler MC (2020) SquamataBase: a natural history database and R package for comparative biology of snake feeding habits. Biodiversity Data Journal 8: e49943. https://doi.org/10.3897/BDJ.8.e49943
    1  Gunther KA et al. 2014 Dietary breadth of grizzly bears in the Greater Yellowstone Ecosystem. Ursus 25(1):60-72
    1  Sarah E Miller. 7/6/2016. Arctos collection.
    Included files

    bee_data_BID.sh - script for separating bee records by family
    uniq_citations.tsv - list of unique citations indicating bee interactions
    Andrenidae_data_unique.tsv - Andrenidae records
    Apidae_data_unique.tsv - Apidae records
    Colletidae_data_unique.tsv - Colletidae records
    Halictidae_data_unique.tsv - Halictidae records
    Megachilidae_data_unique.tsv - Megachilidae records
    Melittidae_data_unique.tsv - Melittidae records
    Stenotritidae_data_unique.tsv - Stenotritidae records
    bees-only-interactions.tsv.zip - list of all bee interaction data indexed on Global Biotic Interactions from GloBI version 2024-06-07 produced by https://github.com/Big-Bee-Network/select-bee-interactions.sh
    bees-only-review.pdf - Review of all bee data indexed by Global Biotic Interactions using GloBI’s Interaction Data Review Report Framework via repository https://github.com/Big-Bee-Network/select-bee-interactions.sh
    catalogNumber_counts.tsv - counts by catalogNumber in dataset. Duplicate catalog numbers indicate duplicated data shared by multiple data providers.

    References

    GloBI Community. (2024). Global Biotic Interactions: Interpreted Data Products hash://md5/946f7666667d60657dc89d9af8ffb909 hash://sha256/4e83d2daee05a4fa91819d58259ee58ffc5a29ec37aa7e84fd5ffbb2f92aa5b8 (0.7) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.11552565.

    Poelen JH, Simons JD, Mungall CJ (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005

    Seltmann KC, Allen J, Brown BV, Carper A, Engel MS, Franz N, Gilbert E, Grinter C, Gonzalez VH, Horsley P, Lee S, Maier C, Miko I, Morris P, Oboyski P, Pierce NE, Poelen J, Scott VL, Smith M, Talamas EJ, Tsutsui ND, Tucker E (2021) Announcing Big-Bee: An initiative to promote understanding of bees through image and trait digitization. Biodiversity Information Science and Standards 5: e74037. https://doi.org/10.3897/biss.5.74037

    Poelen, JS & Seltmann, KS (2024) Bees Only Please: Selecting Hundreds of Thousands of Possible Bee Interactions Using a Laptop, Open Datasets, and Small (but Mighty) Commandline Tools. https://www.globalbioticinteractions.org/2024/06/07/bees-only-please

    Ascher, J. S. and J. Pickering (2020) Discover Life bee species guide and world checklist (Hymenoptera: Apoidea: Anthophila). http://www.discoverlife.org/mp/20q?guide=Apoidea_species.

    Acknowledgements

    This project is supported by the National Science Foundation. Award numbers: DBI:2102006, DBI:2101929, DBI:2101908, DBI:2101876, DBI:2101875, DBI:2101851, DBI:2101345, DBI:2101913, DBI:2101891 and DBI:2101850
  2. Abstract Commonly used data citation practices rely on unverifiable retrieval methods which are susceptible to content drift, which occurs when the data associated with an identifier have been allowed to change. Based on our earlier work on reliable dataset identifiers, we propose signed citations, i.e., customary data citations extended to also include a standards-based, verifiable, unique, and fixed-length digital content signature. We show that content signatures enable independent verification of the cited content and can improve the persistence of the citation. Because content signatures are location- and storage-medium-agnostic, cited data can be copied to new locations to ensure their persistence across current and future storage media and data networks. As a result, content signatures can be leveraged to help scalably store, locate, access, and independently verify content across new and existing data infrastructures. Content signatures can also be embedded inside content to create robust, distributed knowledge graphs that can be cited using a single signed citation. We describe applications of signed citations to solve real-world data collection, identification, and citation challenges. 
  3. Commonly used data citation practices rely on unverifiable retrieval methods which are susceptible to “content drift”, which occurs when the data associated with an identifier have been allowed to change. Based on our earlier work on reliable dataset identifiers, we propose signed citations, i.e., customary data citations extended to also include a standards-based, verifiable, unique, and fixed-length digital content signature. We show that content signatures enable independent verification of the cited content and can improve the persistence of the citation. Because content signatures are location- and storage-medium-agnostic, cited data can be copied to new locations to ensure their persistence across current and future storage media and data networks. As a result, content signatures can be leveraged to help scalably store, locate, access, and independently verify content across new and existing data infrastructures. Content signatures can also be embedded inside content to create robust, distributed knowledge graphs that can be cited using a single signed citation. We describe real-world applications of signed citations used to cite and compile distributed data collections, cite specific versions of existing data networks, and stabilize associations between URLs and content.
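
The idea of a content signature can be illustrated with a short Python sketch. It assumes the `hash://sha256/...` notation that appears elsewhere on this page and uses only the standard library; the function names, the citation string, and the sample bytes are hypothetical, and this is not the authors' implementation.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive a fixed-length identifier from the content itself."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def signed_citation(citation: str, data: bytes) -> str:
    """Append the content signature to a customary citation string."""
    return f"{citation} {content_id(data)}"


def verify(data: bytes, cited_id: str) -> bool:
    """True only if the retrieved bytes are exactly the cited bytes."""
    return content_id(data) == cited_id


if __name__ == "__main__":
    # Hypothetical content and citation, for illustration only.
    original = b"sourceTaxon\tinteraction\ttargetTaxon\n"
    cited = content_id(original)
    print(signed_citation("Example Author (2024). Example dataset.", original))

    # A copy retrieved from any other location verifies as long as
    # the bytes are unchanged.
    print(verify(bytes(original), cited))

    # Content drift: any change to the bytes makes verification fail.
    print(verify(original + b"extra row\n", cited))
```

Because the identifier depends only on the bytes, the check works the same way regardless of where or how the copy was stored.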

     
  4. A biodiversity dataset graph: GBIF, iDigBio, BioCASe

    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.

    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2020-05-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org".

    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .  

    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar ls --remote https://zenodo.org/record/3852671/files > /dev/null

    Optionally, you can retrieve all associated data (>500GB) files using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3852671/files,https://archive.org/download/biodiversity-dataset-archives/data.zip/data/,https://deeplinker.bio

    Please note that https://archive.org/download/biodiversity-dataset-archives/data.zip/data/ and https://deeplinker.bio are Preston remotes that provided access to GBIF, iDigBio, BioCASe data files at the time of writing (25 May 2020). These remotes can be replaced with any other Preston remote(s) if needed. Retrieval may take a while, depending on network speed and hardware constraints. See also https://archive.org/details/biodiversity-dataset-archives .

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history

    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> <http://purl.org/pav/previousVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> <http://purl.org/pav/previousVersion> <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> .
    <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> <http://purl.org/pav/previousVersion> <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> .
    <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> <http://purl.org/pav/previousVersion> <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> .
    <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> <http://purl.org/pav/previousVersion> <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> .
    <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> <http://purl.org/pav/previousVersion> <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> .
    <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> <http://purl.org/pav/previousVersion> <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> .
    <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> <http://purl.org/pav/previousVersion> <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> .
    <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> <http://purl.org/pav/previousVersion> <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> .
    <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> <http://purl.org/pav/previousVersion> <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> .
    <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> <http://purl.org/pav/previousVersion> <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> .
    <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> <http://purl.org/pav/previousVersion> <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> .
    <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> <http://purl.org/pav/previousVersion> <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> .
    <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> <http://purl.org/pav/previousVersion> <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> .
    <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> <http://purl.org/pav/previousVersion> <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> .
    <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> <http://purl.org/pav/previousVersion> <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> .
    <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> <http://purl.org/pav/previousVersion> <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> .
    <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> <http://purl.org/pav/previousVersion> <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> .
    <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> <http://purl.org/pav/previousVersion> <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> .
    <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> <http://purl.org/pav/previousVersion> <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> .
    <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> <http://purl.org/pav/previousVersion> <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> .
    <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> <http://purl.org/pav/previousVersion> <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> .
    <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> <http://purl.org/pav/previousVersion> <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> .
    <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> <http://purl.org/pav/previousVersion> <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> .
    <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> <http://purl.org/pav/previousVersion> <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> .
    <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> <http://purl.org/pav/previousVersion> <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> .
    <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> <http://purl.org/pav/previousVersion> <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> .
    <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> <http://purl.org/pav/previousVersion> <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> .
    <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> <http://purl.org/pav/previousVersion> <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> .
    <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> <http://purl.org/pav/previousVersion> <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> .
    <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> <http://purl.org/pav/previousVersion> <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> .
    <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> <http://purl.org/pav/previousVersion> <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> .
    <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> <http://purl.org/pav/previousVersion> <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> .
    <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> <http://purl.org/pav/previousVersion> <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> .
    <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> <http://purl.org/pav/previousVersion> <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> .
    <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> <http://purl.org/pav/previousVersion> <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> .
    <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> <http://purl.org/pav/previousVersion> <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> .
    <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> <http://purl.org/pav/previousVersion> <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> .
    <hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> <http://purl.org/pav/previousVersion> <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> .
    <hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> <http://purl.org/pav/previousVersion> <hash://sha256/6fb7271a2da1543036e39bcdb4c415a46b5437569eaaf0ffdef3e907a2f4309f> .
    <hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> <http://purl.org/pav/previousVersion> <hash://sha256/ab62f4a9601f30d23353a479830f9d2dfc7898e15d2cc2d81977e898d885c908> .
    <hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> <http://purl.org/pav/previousVersion> <hash://sha256/ff74959ec6e5e98e7db674afcb915f50725f049b968e9a9f10de169aa0a3dcb5> .
    <hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> <http://purl.org/pav/previousVersion> <hash://sha256/6c4c94cdb224d39e7c655b1a1a6afbba8daf3c9ac64c42ba72dfd346d5d3a547> .
    <hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> <http://purl.org/pav/previousVersion> <hash://sha256/9c17ce013b33c3c9e6bc513cb49a14660fad9bd6f87a4f21568cc871b10ba39b> .
    <hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> <http://purl.org/pav/previousVersion> <hash://sha256/5dcf876c6cb0c5b15197acf1ea6989d41c1a1333c6a7e0437f035aa9d22a3790> .
    <hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> <http://purl.org/pav/previousVersion> <hash://sha256/39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c> .
    <hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> <http://purl.org/pav/previousVersion> <hash://sha256/916255b2b73680595dcb22b30991a757dd223208473fb4fbe90405757bc07953> .
    <hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> <http://purl.org/pav/previousVersion> <hash://sha256/3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1> .
    <hash://sha256/8aacce08462b87a345d271081783bdd999663ef90099212c8831db399fc0831b> <http://purl.org/pav/previousVersion> <hash://sha256/f13b15a20e4fe70b4a111e67ac20ef676404b8456dfc39694f2cb3a4c62a2b2d> .


    If you retrieved data files, you can check the integrity of the extracted archive by confirming that each line produced by the command "preston verify" has the form shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    file:/home/preston/preston-archive/data/3e/ff/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    OK    CONTENT_PRESENT_VALID_HASH    89931
    hash://sha256/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    file:/home/preston/preston-archive/data/18/48/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    file:/home/preston/preston-archive/data/18/46/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    file:/home/preston/preston-archive/data/55/4f/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    OK    CONTENT_PRESENT_VALID_HASH    202701
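
    The check described above can be automated. The following is a minimal Python sketch, not part of preston itself: it assumes the output of "preston verify" was saved to a text file and that fields are separated by whitespace, as in the sample above. The function name failed_verify_lines is hypothetical.

```python
from typing import List

# Status string that the README says every verified line should include.
VALID_STATUS = "CONTENT_PRESENT_VALID_HASH"


def failed_verify_lines(verify_output: str) -> List[str]:
    """Return the non-empty lines that do not report VALID_STATUS.

    Assumes whitespace-separated fields, as in the sample output.
    """
    failures = []
    for line in verify_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if VALID_STATUS not in fields:
            failures.append(line)
    return failures
```

    For example, after running "java -jar preston.jar verify > verify.txt", calling failed_verify_lines(open("verify.txt").read()) should return an empty list if every line includes the expected status.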

    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston[2] v0.1.15. --
    preston.jar

    -- individual provenance index files --

    049b0eb995b484c1e64184f582f51b3c608dcade70c4aefc2d53f903bae45098
    073315c32d7fd19868449bef1b11b15a86981dee53a31f7f5c882f7e3be413c3
    1172c6927e58113db668409d36b6a2cd84cf1a93e85b50d65d0bd008a5d8aaa4
    1707cb11cd9f696f1a86fd06742c1e14fad856747be88791f79f6fc7c979d5a6
    272ff1f12a573c667634d934d06b8bab0dd9cc6558795287ea99fab87620d005
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    2bbbe11bb1932c6c8fbbc2ed16dde182f53c4cecbe0dd4f779c32f527a61bc62
    37b8b636e939072d0df7246bf077ead4279f9dd33929be322e631104b0641308
    3901b6af522d535fb164823704686e72f73b7798a2a64eaeb817134552c69e2c
    395ed0c95a624f8853116442690965acf69151acd6b33cc4fc710f567828f784
    460c14ed0129c1469c9149ed1030cdc133f110fb32048748323982cb88dd7eda
    477b6c4e9ecf5c8cd1b5502e0245c8622fa4b358f6710f97db39b473ed3d8235
    52b7274f5d795e4987964bb1a327dd6d6e4f65870e6a7aac172481d0ba3013d4
    54786bde04751bc31bf38c9e89c010cfee7de91760e1f5f31218ff11acff8a70
    6135b237a49b37b857801836494f2c36bcb1526bdacf001a9d11727fff6bf1f1
    674937568c0572bc2873f502dca2fe691ba230869f0aba73f5938422654c05cc
    69b4d5ca9643c14501a48a2b1eb24971a6da68da5033c304f7f00b94e16a11d9
    6df3363a236d4f026154ef86b34d9672b111333d0c2be179c43db146864f6ed3
    70066ea7c6a9dd6c2193cdc90b3b1ff7664af235ab245f6c03d1dd497b376570
    7084702f8025c99a6608a3355ccad5ff5e644ad544121f5d524961f7fe29ceb6
    7e9934a1fc580c3f591c295306ab364c2e7a589e91590ab6334514e4b5c28062
    7ebb008412baaac3afcc8af68b796bf4ca98f367cfd61a815eee82cdffeab196
    886edb8d22973bb04fe3b42d12106029a00b9deab3fb77d8787123327b77ae3b
    8a2426eb4b38af30c6ee764463b8684e0dec400e4472a2a53e6eedf246dab178
    8a6d7e2ab026ff56380235fd9696f5e538e5e426b9374f2ddf3a705e186a7788
    8d44c9e36a505e5c3f125e1702ef7473280bf5bcfa624fe5d3998694b67e0887
    94290680edef0f8ac81d5d4d5b8b680ba5ce821df17c4de62464429552c3360e
    95f88f27ed3448534206406738dfb5c5030fe3d6883c6dda261649357600883f
    9d12cae409e8ea0a546f7945cc629d622400000c3338e4710d9c6084fca9274d
    9fa9ea50db419c75251026708183add8973d9e68a79062f7808b110bef21006e
    a24abbe089556f51fe9c2a51febdcaf893b419556312bcc63515713fc4a52922
    a3b0477fe46f09b0f51c0f651691665c149bc341f5c19996675d849252e86453
    a486474333f05884580dd10c54c95999063c7d1bc22e2cbe3bead604aca0a183
    a524b9af3f172793998e1f9c5c0e9c949cc935624a17ed3364d32bc0391c9382
    aa0e508aeb96f240b551fe92ff4224325ddcdf66f97eef95ac78aec62e53a169
    ab34300942ec02cca7adf2744f6fbc1ab7587060bea09ef92b65b66f89d1ddcd
    b05d4a17d9a02180669d7eb017102dd1a739fb4615759cba94baf944b2aee29c
    b37c79f95c22fc4d657cc89dedd7a870923285da690ad4f5121962492484a142
    bc699639e5515a5fc9da9d442357cc8a9ff310a177e54f1646e002723de49f1d
    be6d8cd5f1405a5e3e8aa492fb8dab41f6521608834d746e6cbc58d2f550f918
    c06f4413a97a5540fbdd40bdbfb194435c154533df7fe388dfdd378084e19c3d
    c585b8addfb7f7991ad74c0bae158aecefc6be5b11c28b020135e0f13040e187
    c66587e9730a6f68e961240038892df656ea99a1a25f4ff8ce556c07b09a4878
    ca289dce66c8b9955c223fe3e906b8f26c12cf53506cebe651b004961f7964af
    cea1aab236de5de8da8954797d846c225bf2ad4f8fe3cd413e60ab029f9e1b3e
    da05cc27a47e755ebe912fafae434df5bd31a5d92658fe1943acc0a2023fab32
    ee473aeda889fd12ac2c76aae06314e5f279cce5f1a736d39bfc097657a82060
    fcb2ee4d630a9a1440417b0c46da5bc1578a388d6aedd12189a23283b60dde7d
    fef548489bd7bea43ae1c2b7755d38a87f4a8b038a466bf7e7b4ac64d665fd62
    ff32a7cbc99eaf6b67695fd94284a9b1b47a76497ef4d10ffc4dae199cc0d7c3

    -- individual provenance logs --

    05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd
    09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2
    0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781
    102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9
    1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e
    20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17
    24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43
    261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12
    39f83f5805f32f765003c5e9ee8c69adb3889d9f26dd61bf4aa3a829ac744e2c
    3b39831bcc286c1db44787e21b736378f5847a16b7c39bdac3dd2011e9189dc1
    3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7
    3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee
    480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6
    58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70
    5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7

     
  5. A biodiversity dataset graph: DataONE

    The intended use of this archive is to facilitate (meta-)analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data.

    This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-11-06 and 2020-05-07 using "preston update -u https://dataone.org".

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .
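
    Because the archive is split into 256 parts, it can be useful to confirm that none are missing before concatenating them. The following Python sketch lists the expected part names and reports which are absent. It assumes the parts are numbered with two lowercase hexadecimal digits from 00 to ff, in line with the "preston-[00-ff].tar.gz" entry in the file descriptions; the function names are hypothetical.

```python
from typing import Iterable, List


def expected_parts() -> List[str]:
    """Names of the 256 archive parts, assuming two-digit lowercase hex suffixes."""
    return ["preston-{:02x}.tar.gz".format(i) for i in range(256)]


def missing_parts(present_names: Iterable[str]) -> List[str]:
    """Return the expected part names that are not among present_names."""
    present = set(present_names)
    return [name for name in expected_parts() if name not in present]
```

    For example, missing_parts(os.listdir(".")) run in the download folder (after "import os") should return an empty list once all parts have been downloaded.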

    To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3849494/files

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> <http://purl.org/pav/previousVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> <http://purl.org/pav/previousVersion> <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> .
    <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> <http://purl.org/pav/previousVersion> <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> .
    <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> <http://purl.org/pav/previousVersion> <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> .
    <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> <http://purl.org/pav/previousVersion> <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> .
    <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> <http://purl.org/pav/previousVersion> <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> .
    <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> <http://purl.org/pav/previousVersion> <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> .
    <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> <http://purl.org/pav/previousVersion> <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> .
    <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> <http://purl.org/pav/previousVersion> <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> .
    <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> <http://purl.org/pav/previousVersion> <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> .
    <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> <http://purl.org/pav/previousVersion> <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> .
    <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> <http://purl.org/pav/previousVersion> <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> .
    <hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> <http://purl.org/pav/previousVersion> <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> .
    <hash://sha256/79ec3ee370a0d38311bc352af07a36380cd3aa04dc98154cf723bbc73d12ee77> <http://purl.org/pav/previousVersion> <hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> .
    <hash://sha256/e54b360a4ca84a4503e4c10a8a8cca062c130be7429c8fe6ea1e0e82fe113e12> <http://purl.org/pav/previousVersion> <hash://sha256/79ec3ee370a0d38311bc352af07a36380cd3aa04dc98154cf723bbc73d12ee77> .
    <hash://sha256/2910f784f84e112f124a56ce54bd06b76e510f90276629d2d144ce29e326d80f> <http://purl.org/pav/previousVersion> <hash://sha256/e54b360a4ca84a4503e4c10a8a8cca062c130be7429c8fe6ea1e0e82fe113e12> .
    <hash://sha256/bcb0bdff0689cfb06f586d057703e41d1c6ba409867232217081dd8cb5053c87> <http://purl.org/pav/previousVersion> <hash://sha256/2910f784f84e112f124a56ce54bd06b76e510f90276629d2d144ce29e326d80f> .
    <hash://sha256/a12f8c7fbf4fbfa71536c7e1b2614a35454dac6a7fe9e1cc0b4df41ab2269bef> <http://purl.org/pav/previousVersion> <hash://sha256/bcb0bdff0689cfb06f586d057703e41d1c6ba409867232217081dd8cb5053c87> .
    <hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3> <http://purl.org/pav/previousVersion> <hash://sha256/a12f8c7fbf4fbfa71536c7e1b2614a35454dac6a7fe9e1cc0b4df41ab2269bef> .
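
    The history above forms a chain: the first statement names the initial version with "hasVersion", and each following "previousVersion" statement points back to the version introduced on the line before it. The following Python sketch checks that linkage for history output saved as text. It is an illustration only, not part of preston; it assumes one statement per line in the "<subject> <predicate> <object> ." form shown above, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# One statement per line: <subject> <predicate> <object> .
STATEMENT = re.compile(r"^\s*<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.\s*$")
HAS_VERSION = "http://purl.org/pav/hasVersion"
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def parse_history(text: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, predicate, object) from each well-formed line."""
    statements = []
    for line in text.splitlines():
        match = STATEMENT.match(line)
        if match:
            statements.append(match.groups())
    return statements


def chain_is_unbroken(statements: List[Tuple[str, str, str]]) -> bool:
    """True if every previousVersion statement points at the version
    introduced by the statement before it, starting from hasVersion."""
    latest = None
    for subject, predicate, obj in statements:
        if predicate == HAS_VERSION:
            latest = obj
        elif predicate == PREVIOUS_VERSION:
            if obj != latest:
                return False  # gap or reordering in the chain
            latest = subject
    return latest is not None
```

    A result of False indicates a missing or reordered statement relative to the expected history, in which case the local copy may be incomplete.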

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" has the form shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    OK    CONTENT_PRESENT_VALID_HASH    21580    hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945
    hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    OK    CONTENT_PRESENT_VALID_HASH    2035    hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f
    hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    OK    CONTENT_PRESENT_VALID_HASH    1935    hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53
    hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    OK    CONTENT_PRESENT_VALID_HASH    1553    hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687
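
    The sample output above suggests that each file is stored under the "data" folder in subfolders named after the first two and the next two hexadecimal characters of its hash. The following Python sketch derives that path from a hash identifier and recomputes the hash of some content independently. It is an illustration under two assumptions drawn from the sample output and the "hash://sha256/" prefix: that the folder layout holds for all files, and that the identifier is the SHA-256 digest of the raw file bytes. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

HASH_PREFIX = "hash://sha256/"


def content_path(hash_uri: str, data_dir: str = "data") -> Path:
    """Map a hash identifier to data/<chars 1-2>/<chars 3-4>/<full hex digest>."""
    if not hash_uri.startswith(HASH_PREFIX):
        raise ValueError("expected an identifier starting with " + HASH_PREFIX)
    hex_digest = hash_uri[len(HASH_PREFIX):]
    return Path(data_dir) / hex_digest[0:2] / hex_digest[2:4] / hex_digest


def matches_hash(hash_uri: str, content: bytes) -> bool:
    """True if the SHA-256 digest of content equals the one in hash_uri."""
    return HASH_PREFIX + hashlib.sha256(content).hexdigest() == hash_uri
```

    For example, matches_hash(uri, content_path(uri).read_bytes()) spot-checks a single file from within the folder that contains "data", which can be quicker than a full "preston verify" run when only a few files are of interest.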


    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston[2] v0.1.15. --
    preston.jar

    -- preston archives containing DataONE data files, associated provenance logs and a provenance index --
    preston-[00-ff].tar.gz

    -- individual provenance index files --
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf
    2f65ae542401d4c2daf1bca70de640211da6749188f67d28ea71acd7d8ba070b
    35eb1e17e2bf3e71212cde35bdb03e8a6545a57483ea3c1633929257b70cf637
    3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7
    66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba
    7466a35e42dea7e2be068060ec0c926f9a8686388ed504ef5c6c990c1ba4e8d0
    81161d9746c2a5823641c436e773fb4508516b055da85f4494b38c545349da39
    8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66
    a90eed8d70c54c8e554f2dfde4fceb434eda162d9615d62de96ded2344f88a78
    c33ef5e29100b323412f1f3bc66908c8e01e4f0d1db4ea3685d2fffc47981dd6
    c84dffef20fec958255e759db6445fc469d73695674a33ae6f7e567a088c9fe0
    d362d599d72000c4feb464db5a669b12e15fc3ca1a49b1e7d4d6f7d6d5d15411
    d9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d
    da26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db
    e4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55
    eb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94
    f13ab4bca04f894ae8eabb51fa01b4dfbc69f717eabc9896c728e2ba39c4db27
    f493baf276892a199a0b0d078359f64a38fe8ad3f807921f8d41ef73f7343b1f
    ff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3

    --- end of file descriptions ---


    References

    [1] Data Observation Network for Earth (DataONE, https://dataone.org) accessed from 2018-11-06 to 2020-05-07 with provenance hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3.
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .


    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.

     
  6. A biodiversity dataset graph: GBIF, iDigBio, BioCASe

    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections. 

    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2019-10-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org". 

    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .  

    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar ls --remote https://zenodo.org/record/3484205/files > /dev/null

    Optionally, you can retrieve all associated data (~500GB) files using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3484205/files,https://deeplinker.bio

    Please note that https://deeplinker.bio is a Preston remote that provided access to GBIF, iDigBio, BioCASe data files at the time of writing (13 Oct 2019). This remote can be replaced with any other Preston remote(s) if needed. Retrieving the data files may take a while depending on network speed and hardware constraints.

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> <http://purl.org/pav/previousVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> <http://purl.org/pav/previousVersion> <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> .
    <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> <http://purl.org/pav/previousVersion> <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> .
    <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> <http://purl.org/pav/previousVersion> <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> .
    <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> <http://purl.org/pav/previousVersion> <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> .
    <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> <http://purl.org/pav/previousVersion> <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> .
    <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> <http://purl.org/pav/previousVersion> <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> .
    <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> <http://purl.org/pav/previousVersion> <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> .
    <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> <http://purl.org/pav/previousVersion> <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> .
    <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> <http://purl.org/pav/previousVersion> <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> .
    <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> <http://purl.org/pav/previousVersion> <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> .
    <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> <http://purl.org/pav/previousVersion> <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> .
    <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> <http://purl.org/pav/previousVersion> <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> .
    <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> <http://purl.org/pav/previousVersion> <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> .
    <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> <http://purl.org/pav/previousVersion> <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> .
    <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> <http://purl.org/pav/previousVersion> <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> .
    <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> <http://purl.org/pav/previousVersion> <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> .
    <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> <http://purl.org/pav/previousVersion> <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> .
    <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> <http://purl.org/pav/previousVersion> <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> .
    <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> <http://purl.org/pav/previousVersion> <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> .
    <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> <http://purl.org/pav/previousVersion> <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> .
    <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> <http://purl.org/pav/previousVersion> <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> .
    <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> <http://purl.org/pav/previousVersion> <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> .
    <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> <http://purl.org/pav/previousVersion> <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> .
    <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> <http://purl.org/pav/previousVersion> <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> .
    <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> <http://purl.org/pav/previousVersion> <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> .
    <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> <http://purl.org/pav/previousVersion> <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> .
    <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> <http://purl.org/pav/previousVersion> <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> .
    <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> <http://purl.org/pav/previousVersion> <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> .
    <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> <http://purl.org/pav/previousVersion> <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> .
    <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> <http://purl.org/pav/previousVersion> <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> .
    <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> <http://purl.org/pav/previousVersion> <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> .
    <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> <http://purl.org/pav/previousVersion> <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> .
    <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> <http://purl.org/pav/previousVersion> <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> .
    <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> <http://purl.org/pav/previousVersion> <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> .
    <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> <http://purl.org/pav/previousVersion> <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> .
    <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> <http://purl.org/pav/previousVersion> <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> .
    <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> <http://purl.org/pav/previousVersion> <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> .

    If you retrieved data files, you can check the integrity of the extracted archive by confirming that each line produced by the command "preston verify" matches the format shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    file:/home/preston/preston-archive/data/3e/ff/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    OK    CONTENT_PRESENT_VALID_HASH    89931
    hash://sha256/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    file:/home/preston/preston-archive/data/18/48/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    file:/home/preston/preston-archive/data/18/46/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    file:/home/preston/preston-archive/data/55/4f/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    OK    CONTENT_PRESENT_VALID_HASH    202701
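
    With many content files, scanning the verify output by eye is impractical. The following is a minimal sketch that flags any output line lacking the "CONTENT_PRESENT_VALID_HASH" marker. The function name check_verify_output is hypothetical and is not part of preston.

```shell
# Hypothetical helper (not part of preston): reads "preston verify" output
# on stdin and fails if any non-blank line lacks the expected marker.
check_verify_output() {
  awk '
    NF == 0 { next }                                   # ignore blank lines
    !/CONTENT_PRESENT_VALID_HASH/ { bad++; print "unexpected: " $0 }
    END { if (bad) exit 1; print "all lines valid" }   # non-zero exit on any mismatch
  '
}

# Usage (assumes preston.jar is in the current directory):
#   java -jar preston.jar verify | check_verify_output
```

    The helper only inspects the text that preston prints; it does not recompute any hashes itself.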

    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or "preston" for short. 

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README 

    -- executable java jar containing preston[2] v0.1.8. --
    preston.jar

    -- individual provenance index files --
    049b0eb995b484c1e64184f582f51b3c608dcade70c4aefc2d53f903bae45098
    073315c32d7fd19868449bef1b11b15a86981dee53a31f7f5c882f7e3be413c3
    1172c6927e58113db668409d36b6a2cd84cf1a93e85b50d65d0bd008a5d8aaa4
    1707cb11cd9f696f1a86fd06742c1e14fad856747be88791f79f6fc7c979d5a6
    272ff1f12a573c667634d934d06b8bab0dd9cc6558795287ea99fab87620d005
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    37b8b636e939072d0df7246bf077ead4279f9dd33929be322e631104b0641308
    3901b6af522d535fb164823704686e72f73b7798a2a64eaeb817134552c69e2c
    395ed0c95a624f8853116442690965acf69151acd6b33cc4fc710f567828f784
    460c14ed0129c1469c9149ed1030cdc133f110fb32048748323982cb88dd7eda
    477b6c4e9ecf5c8cd1b5502e0245c8622fa4b358f6710f97db39b473ed3d8235
    52b7274f5d795e4987964bb1a327dd6d6e4f65870e6a7aac172481d0ba3013d4
    54786bde04751bc31bf38c9e89c010cfee7de91760e1f5f31218ff11acff8a70
    6135b237a49b37b857801836494f2c36bcb1526bdacf001a9d11727fff6bf1f1
    69b4d5ca9643c14501a48a2b1eb24971a6da68da5033c304f7f00b94e16a11d9
    70066ea7c6a9dd6c2193cdc90b3b1ff7664af235ab245f6c03d1dd497b376570
    7084702f8025c99a6608a3355ccad5ff5e644ad544121f5d524961f7fe29ceb6
    7ebb008412baaac3afcc8af68b796bf4ca98f367cfd61a815eee82cdffeab196
    886edb8d22973bb04fe3b42d12106029a00b9deab3fb77d8787123327b77ae3b
    8a6d7e2ab026ff56380235fd9696f5e538e5e426b9374f2ddf3a705e186a7788
    95f88f27ed3448534206406738dfb5c5030fe3d6883c6dda261649357600883f
    9d12cae409e8ea0a546f7945cc629d622400000c3338e4710d9c6084fca9274d
    9fa9ea50db419c75251026708183add8973d9e68a79062f7808b110bef21006e
    a24abbe089556f51fe9c2a51febdcaf893b419556312bcc63515713fc4a52922
    a3b0477fe46f09b0f51c0f651691665c149bc341f5c19996675d849252e86453
    a486474333f05884580dd10c54c95999063c7d1bc22e2cbe3bead604aca0a183
    a524b9af3f172793998e1f9c5c0e9c949cc935624a17ed3364d32bc0391c9382
    aa0e508aeb96f240b551fe92ff4224325ddcdf66f97eef95ac78aec62e53a169
    ab34300942ec02cca7adf2744f6fbc1ab7587060bea09ef92b65b66f89d1ddcd
    b05d4a17d9a02180669d7eb017102dd1a739fb4615759cba94baf944b2aee29c
    b37c79f95c22fc4d657cc89dedd7a870923285da690ad4f5121962492484a142
    be6d8cd5f1405a5e3e8aa492fb8dab41f6521608834d746e6cbc58d2f550f918
    c06f4413a97a5540fbdd40bdbfb194435c154533df7fe388dfdd378084e19c3d
    c585b8addfb7f7991ad74c0bae158aecefc6be5b11c28b020135e0f13040e187
    c66587e9730a6f68e961240038892df656ea99a1a25f4ff8ce556c07b09a4878
    cea1aab236de5de8da8954797d846c225bf2ad4f8fe3cd413e60ab029f9e1b3e
    da05cc27a47e755ebe912fafae434df5bd31a5d92658fe1943acc0a2023fab32
    fcb2ee4d630a9a1440417b0c46da5bc1578a388d6aedd12189a23283b60dde7d
    ff32a7cbc99eaf6b67695fd94284a9b1b47a76497ef4d10ffc4dae199cc0d7c3

    -- individual provenance logs --
    05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd
    09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2
    0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781
    102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9
    1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e
    20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17
    24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43
    261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12
    3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7
    3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee
    480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6
    58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70
    5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7
    5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa
    6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7
    650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79
    668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103
    7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29
    7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93
    8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4
    9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e
    a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208
    a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6
    af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae
    af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63
    b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4
    b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d
    b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd
    ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe
    c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6
    c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55
    d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1
    d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3
    dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda
    e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df
    e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb
    eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6
    f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36
    fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc

    --- end of file descriptions ---

    References 

    [1] Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe, https://gbif.org, https://idigbio.org, http://biocase.org) accessed from 2018-09-03 to 2019-10-02 with provenance hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7 .
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 . 

    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.

     
  7. A biodiversity dataset graph: DataONE

    The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data. 

    This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-10-18 and 2019-10-03 using "preston update -u https://dataone.org". 

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.  

    To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3483218/files
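
    For the manual route (concatenate, then extract), the two steps can be combined as sketched below. The sketch assumes GNU tar (or a tar that supports --ignore-zeros) and that the paths inside the parts are relative to the "data" folder; inspect one part with "tar -tzf" first, because the internal layout is not stated in this description. The function name extract_parts is hypothetical.

```shell
# Hypothetical helper: concatenate the preston-*.tar.gz parts found in
# directory $1 and extract them into directory $2 (e.g., "data").
extract_parts() {
  parts_dir=$1
  dest_dir=$2
  mkdir -p "$dest_dir"
  # Concatenated gzip members decompress as a single stream. Each part is a
  # complete tar archive ending in zero blocks, so --ignore-zeros tells tar
  # to keep reading past each end-of-archive marker.
  cat "$parts_dir"/preston-*.tar.gz | tar -xz --ignore-zeros -f - -C "$dest_dir"
}

# Usage (assumes the parts were downloaded into the current directory):
#   extract_parts . data
```

    Without --ignore-zeros, tar would typically stop after the first part of the concatenated stream.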

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> <http://purl.org/pav/previousVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> <http://purl.org/pav/previousVersion> <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> .
    <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> <http://purl.org/pav/previousVersion> <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> .
    <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> <http://purl.org/pav/previousVersion> <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> .
    <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> <http://purl.org/pav/previousVersion> <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> .
    <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> <http://purl.org/pav/previousVersion> <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> .
    <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> <http://purl.org/pav/previousVersion> <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> .
    <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> <http://purl.org/pav/previousVersion> <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> .
    <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> <http://purl.org/pav/previousVersion> <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> .
    <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> <http://purl.org/pav/previousVersion> <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> .
    <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> <http://purl.org/pav/previousVersion> <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> .
    <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> <http://purl.org/pav/previousVersion> <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> .
    <hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> <http://purl.org/pav/previousVersion> <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> .
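
    In the history shown above, each previousVersion statement points back at the subject of the statement before it, so the log forms a single chain. As an illustration only, the sketch below checks that linkage on the text printed by "preston history". The function name check_history_chain is hypothetical and is not part of preston.

```shell
# Hypothetical helper (not part of preston): reads "preston history" output
# on stdin and checks that every statement's object equals the subject of
# the statement before it (or the object of the initial hasVersion line).
check_history_chain() {
  awk '
    NF < 3 { next }                          # skip blank or incomplete lines
    !seen { prev = $3; seen = 1; next }      # first statement: hasVersion
    $3 != prev { printf "broken link at line %d\n", NR; bad = 1 }
    { prev = $1 }                            # this subject is the next expected object
    END { if (bad) exit 1; print "chain ok" }
  '
}

# Usage (assumes preston.jar is in the current directory):
#   java -jar preston.jar history | check_history_chain
```

    This confirms only that the printed history is internally linked; comparing the hashes against the listing in this description remains a separate step.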

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" matches the format shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    OK    CONTENT_PRESENT_VALID_HASH    21580
    hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    OK    CONTENT_PRESENT_VALID_HASH    2035
    hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    OK    CONTENT_PRESENT_VALID_HASH    1935
    hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    OK    CONTENT_PRESENT_VALID_HASH    1553

    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or "preston" for short. 

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README 

    -- executable java jar containing preston[2] v0.1.8. --
    preston.jar

    -- preston archives containing DataONE data files, associated provenance logs and a provenance index --
    preston-[00-ff].tar.gz 

    -- individual provenance index files --
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf
    2f65ae542401d4c2daf1bca70de640211da6749188f67d28ea71acd7d8ba070b
    3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7
    66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba
    8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66
    c84dffef20fec958255e759db6445fc469d73695674a33ae6f7e567a088c9fe0
    d9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d
    da26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db
    e4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55
    eb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94
    f13ab4bca04f894ae8eabb51fa01b4dfbc69f717eabc9896c728e2ba39c4db27
    f493baf276892a199a0b0d078359f64a38fe8ad3f807921f8d41ef73f7343b1f
    ff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3

    --- end of file descriptions ---


    References 

    [1] Data Observation Network for Earth (DataONE, https://dataone.org) accessed from 2018-10-18 to 2019-10-03 with provenance hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650 .
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 . 


    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
     

     
  8. A biodiversity dataset graph: DataONE

    The intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE, [1]). DataONE is a distributed infrastructure that provides information about earth observation data. 

    This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 17 October 2018 and 7 July 2019.  

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included and have been individually included in this dataset publication. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when and where the DataONE meta-data files were retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543.  
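
    Because the parts are named with a two-digit hexadecimal suffix (00 through ff), a download can be checked for completeness before extraction. The following is a minimal sketch; the function name missing_parts is hypothetical.

```shell
# Hypothetical helper: print the name of every expected part
# (preston-00.tar.gz ... preston-ff.tar.gz) that is absent from directory $1.
missing_parts() {
  dir=${1:-.}
  i=0
  while [ "$i" -lt 256 ]; do
    # %02x renders the counter as two lowercase hexadecimal digits.
    name=$(printf 'preston-%02x.tar.gz' "$i")
    [ -f "$dir/$name" ] || echo "$name"
    i=$((i + 1))
  done
}

# Usage: list the parts still to be downloaded into the current directory.
#   missing_parts .
```

    An empty result means all 256 parts are present; it says nothing about whether each part downloaded intact.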

    To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3277312/files

    After that, verify the index of the archive by reproducing the following result:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> <http://purl.org/pav/previousVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> <http://purl.org/pav/previousVersion> <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> .
    <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> <http://purl.org/pav/previousVersion> <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> .
    <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> <http://purl.org/pav/previousVersion> <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> .
    <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> <http://purl.org/pav/previousVersion> <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> .
    <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> <http://purl.org/pav/previousVersion> <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> .
    <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> <http://purl.org/pav/previousVersion> <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> .
    <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> <http://purl.org/pav/previousVersion> <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> .
    <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> <http://purl.org/pav/previousVersion> <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> .

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" matches the format shown below and includes "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    OK    CONTENT_PRESENT_VALID_HASH    21580
    hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    OK    CONTENT_PRESENT_VALID_HASH    2035
    hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    OK    CONTENT_PRESENT_VALID_HASH    1935
    hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    OK    CONTENT_PRESENT_VALID_HASH    1553
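
    The file paths in the output above suggest that each content file is stored under data/, in two nested folders named after the first four hexadecimal characters of its SHA-256 hash. As an illustration of that layout (not a replacement for "preston verify"), the sketch below derives the expected path from a hash identifier and recomputes the hash of the stored file. It assumes sha256sum from GNU coreutils; both function names are hypothetical.

```shell
# Hypothetical helper: map a hash://sha256/... identifier to the relative
# path layout seen in the verify output: data/<chars 1-2>/<chars 3-4>/<hash>.
hash_to_path() {
  hex=${1#hash://sha256/}
  a=$(printf '%s' "$hex" | cut -c1-2)
  b=$(printf '%s' "$hex" | cut -c3-4)
  printf 'data/%s/%s/%s\n' "$a" "$b" "$hex"
}

# Hypothetical helper: succeed only if the file stored for identifier $1
# under base directory $2 (default ".") hashes to the value in its name.
check_stored_content() {
  id=$1
  base=${2:-.}
  expected=${id#hash://sha256/}
  actual=$(sha256sum "$base/$(hash_to_path "$id")" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Usage (run from the folder that contains "data"):
#   check_stored_content hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945
```

    Content-addressed storage of this kind lets any single file be checked independently of the rest of the archive.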

    Note that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or "preston" for short. 

    Files in this data publication:

    README - this file

    preston.jar - executable java jar containing preston[2] v0.1.1.

    preston-[00-ff].tar.gz - preston archives containing DataONE meta-data files, their provenance and a provenance index.

    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a - preston index file
    2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf - preston index file
    3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7 - preston index file
    66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba - preston index file
    8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66 - preston index file
    d9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d - preston index file
    da26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db - preston index file
    e4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55 - preston index file
    eb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94 - preston index file
    ff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3 - preston index file

    [1] DataONE, https://www.dataone.org
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 . DataONE was crawled via Preston with "preston update -u https://dataone.org".

    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.

     