Flaky tests are tests that can non-deterministically pass or fail for the same code version. These tests undermine regression testing efficiency, because developers cannot easily identify whether a test fails due to their recent changes or due to flakiness. Ideally, one would detect flaky tests right when flakiness is introduced, so that developers can then immediately remove the flakiness. Some software organizations, e.g., Mozilla and Netflix, run tools, called detectors, to detect flaky tests as soon as possible. However, detecting flaky tests is costly due to their inherent non-determinism, so even state-of-the-art detectors are often impractical to use on all tests for each project change. To combat the high cost of applying detectors, these organizations typically run a detector solely on newly added or directly modified tests, i.e., not on unmodified tests or when other changes occur (including changes to the test suite, the code under test, and library dependencies). However, it is unclear how many flaky tests can be detected or missed by applying detectors in only these limited circumstances. To better understand this problem, we conduct a large-scale longitudinal study of flaky tests to determine when flaky tests become flaky and what changes cause them to become flaky. We apply two state-of-the-art detectors to 55 Java projects, identifying a total of 245 flaky tests that can be compiled and run in the code version where each test was added. We find that 75% of flaky tests (184 out of 245) are flaky when added, indicating substantial potential value for developers to run detectors specifically on newly added tests. However, running detectors solely on newly added tests would still miss detecting 25% of flaky tests. The percentage of flaky tests that can be detected does increase to 85% when detectors are run on newly added or directly modified tests. The remaining 15% of flaky tests become flaky due to other changes and can be detected only when detectors are always applied to all tests. Our study is the first to empirically evaluate when tests become flaky and to recommend guidelines for applying detectors in the future.
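To make the coverage figures above concrete, here is a minimal sketch, not the study's tooling. It assumes each flaky test has already been labeled with the kind of change that made it flaky. The labels `added`, `modified`, and `other_change`, the policy names, and the toy data are hypothetical. Only the 184-of-245 figure comes from the text.

```python
from collections import Counter

# Hypothetical policy names mapped to the change kinds each policy can catch.
POLICIES = {
    "new_tests_only": {"added"},
    "new_or_modified_tests": {"added", "modified"},
    "all_tests_every_change": {"added", "modified", "other_change"},
}


def detection_rate(origins, policy):
    """Fraction of flaky tests a policy could detect.

    origins: one label per flaky test saying which kind of change
             made it flaky (hypothetical labels, see POLICIES).
    """
    counts = Counter(origins)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no flaky tests supplied")
    covered = POLICIES[policy]
    detected = sum(n for origin, n in counts.items() if origin in covered)
    return detected / total


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    toy = ["added"] * 6 + ["modified"] + ["other_change"]
    for name in POLICIES:
        print(f"{name}: {detection_rate(toy, name):.1%}")

    # The one ratio stated in the abstract: 184 of 245 flaky when added.
    print(f"flaky when added: {184 / 245:.1%}")  # prints "75.1%"
```

The design choice worth noting is that the policies are nested sets: each broader policy detects everything the narrower one does, which mirrors how the reported percentages grow from 75% to 85% to 100%.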
Flaky Test Dataset to Accompany "FlakeFlagger: Predicting Flakiness Without Rerunning Tests"
When developers make changes to their code, they typically run regression tests to detect if their recent changes (re)introduce any bugs. However, many tests are flaky, and their outcomes can change non-deterministically, failing without apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests. The traditional approach to identify flaky tests is to rerun them multiple times: if a test is observed both passing and failing on the same code, it is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test, and then predicts tests that are likely to be flaky based on similar behavioral features. We found that FlakeFlagger correctly labeled at least as many tests as flaky as a state-of-the-art flaky test classifier, but that FlakeFlagger reported far fewer false positives (an increase in precision from just 11% to 60%). This lower false positive rate translates directly to saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. By investigating the information gain of each feature, we conclude that test execution time, overall test coverage, coverage of recently changed lines and usage of third party libraries are effective predictors of test flakiness. We did not find any keywords or tokens in the source code of tests that were effective in predicting test flakiness, and did not find the presence of test smells to be effective in predicting test flakiness. This archive contains the dataset that we collected of flaky tests, along with the features that we collected from each test. 
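The information-gain analysis mentioned above can be illustrated with a small sketch. It is not FlakeFlagger's implementation: it assumes features have already been discretized (for example, "slow" versus "fast" execution time), and the function names and toy data are hypothetical. It computes the standard entropy-based information gain of one feature with respect to a flaky/not-flaky label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have the same length")
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    # Toy data, made up for illustration: True means the test is flaky.
    is_flaky = [True, True, False, False]
    slow_test = ["slow", "slow", "fast", "fast"]   # separates the classes
    has_assert = ["yes", "no", "yes", "no"]        # carries no signal
    print(information_gain(slow_test, is_flaky))
    print(information_gain(has_assert, is_flaky))
```

A feature whose values split flaky and non-flaky tests cleanly has high gain, while one distributed identically across both classes has a gain of zero.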
Contents:
- Project_Info.csv: List of projects and their revisions studied.
- build-logs-<project-slug>.tgz: An archive of all of the Maven build logs from each of the 10,000 runs of that project's test suite.
- failing-test-reports-<project-slug>.tgz: An archive of all of the Surefire XML reports for each failing test of each build of each project.
- test_results.csv: Summary of the number of passing and failing runs for each test in each project. "Run ID" is a key into the <project-slug>.tgz archive also in this artifact, which refers to the run that we observed the test fail on.
- test_features.csv: Summary of the features that each test had, as per our feature detectors described in the paper.
- flakeflagger-code.zip: All scripts used to generate and process these results. These scripts are also located at https://github.com/AlshammariA/FlakeFlagger
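The pass/fail counts summarized per test lend themselves to the rerun rule described earlier: a test seen both passing and failing on the same code is flaky. The sketch below is a rough illustration that works on plain integer counts. It does not read test_results.csv, because the column names are not given here, and all function names are hypothetical. It also shows, assuming independent runs with a fixed failure probability, why even 10,000 reruns can miss a rarely failing test.

```python
def classify_by_reruns(passes, failures):
    """Label a test from its observed rerun outcomes on one code version."""
    if passes < 0 or failures < 0:
        raise ValueError("counts must be non-negative")
    if passes > 0 and failures > 0:
        return "flaky"            # both outcomes observed on the same code
    if failures > 0:
        return "always_failing"
    if passes > 0:
        return "always_passing"   # may still be flaky, just never observed failing
    return "never_run"


def miss_probability(failure_probability, reruns):
    """Chance that a flaky test never fails in `reruns` independent runs.

    Assumes each run fails independently with the same probability,
    which is a simplification of real flakiness.
    """
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be in [0, 1]")
    return (1.0 - failure_probability) ** reruns


if __name__ == "__main__":
    print(classify_by_reruns(passes=9998, failures=2))
    print(classify_by_reruns(passes=10000, failures=0))
    # A test failing once in 10,000 runs on average is still missed
    # by 10,000 reruns more than a third of the time.
    print(round(miss_probability(1e-4, 10_000), 3))
```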
- Award ID(s): 2100037
- PAR ID: 10471488
- Publisher / Repository: Zenodo
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Flaky tests are a source of frustration and uncertainty for developers. In an educational environment, flaky tests can create doubts related to software behavior and student grades, especially when the grades depend on tests passing. NC State University's junior-level software engineering course models industrial practice through team-based development and testing of new features on a large electronic health record (EHR) system, iTrust2. Students are expected to maintain and supplement an extensive suite of UI tests using Selenium WebDriver. Team builds are run on the course's continuous integration (CI) infrastructure. Students report, and we confirm, that tests that pass on one build will inexplicably fail on the next, impacting productivity and confidence in code quality and the CI system. The goal of this work is to find and fix the sources of flaky tests in iTrust2. We analyze configurations of Selenium using different underlying web browsers and timeout strategies (waits) for both test stability and runtime performance. We also consider underlying hardware and operating systems. Our results show that HtmlUnit with Thread waits provides the lowest number of test failures and best runtime on poor-performing hardware. When given more resources (e.g., more memory and a faster CPU), Google Chrome with Angular waits is less flaky and faster than HtmlUnit, especially if the browser instance is not restarted between tests. The outcomes of this research are a more stable and substantially faster teaching application and a recommendation on how to configure Selenium for applications similar to iTrust2 that run in a CI environment.
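The abstract above compares timeout strategies without defining them. As a generic illustration only, the sketch below contrasts a fixed pause with a polling wait that returns as soon as a condition holds. It assumes a "Thread wait" resembles a fixed sleep. It does not use Selenium, and the names `fixed_wait` and `wait_until` are hypothetical.

```python
import time


def fixed_wait(seconds):
    """Pause for a fixed time regardless of whether the page is ready."""
    time.sleep(seconds)


def wait_until(condition, timeout=5.0, poll_interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition held in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Simulated UI element that becomes available after 0.2 seconds.
    ready_at = time.monotonic() + 0.2
    started = time.monotonic()
    ok = wait_until(lambda: time.monotonic() >= ready_at, timeout=2.0)
    print(ok, f"waited about {time.monotonic() - started:.2f}s")
```

A fixed pause is simple and predictable but always costs its full duration. A polling wait adapts to the machine's speed at the price of depending on a correct readiness condition. Which one is more stable in practice depends on the browser and hardware, as the study's results indicate.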
-
A biodiversity dataset graph: BHL

The intended use of this archive is to facilitate (meta-)analysis of the Biodiversity Heritage Library (BHL). The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.

This dataset provides versioned snapshots of the BHL network as tracked by Preston [2] between 2019-05-19 and 2020-05-09 using "preston update -u https://biodiversitylibrary.org".

The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the BHL content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .

To retrieve and verify the downloaded BHL biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:

$ java -jar preston.jar clone --remote https://zenodo.org/record/3849560/files

After that, verify the index of the archive by reproducing the following provenance log history:

$ java -jar preston.jar history
<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .
<hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> <http://purl.org/pav/previousVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .
<hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> <http://purl.org/pav/previousVersion> <hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> .
<hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> <http://purl.org/pav/previousVersion> <hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> .
<hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> <http://purl.org/pav/previousVersion> <hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> .
<hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> <http://purl.org/pav/previousVersion> <hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> .
<hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> <http://purl.org/pav/previousVersion> <hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> .
<hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> <http://purl.org/pav/previousVersion> <hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> .
<hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> <http://purl.org/pav/previousVersion> <hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> .
<hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> <http://purl.org/pav/previousVersion> <hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> .
<hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> <http://purl.org/pav/previousVersion> <hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> .
<hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1> <http://purl.org/pav/previousVersion> <hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> .

To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" looks like the lines shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

$ java -jar preston.jar verify
hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca file:/home/preston/preston-bhl/data/e0/c1/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca OK CONTENT_PRESENT_VALID_HASH 49458087 hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca
hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99 file:/home/preston/preston-bhl/data/1a/57/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99 OK CONTENT_PRESENT_VALID_HASH 25745 hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99
hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c file:/home/preston/preston-bhl/data/85/ef/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c OK CONTENT_PRESENT_VALID_HASH 519892 hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c
hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743 file:/home/preston/preston-bhl/data/25/1e/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743 OK CONTENT_PRESENT_VALID_HASH 787414 hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743

Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

Files in this data publication:

--- start of file descriptions ---

-- description of archive and its contents (this file) --
README

-- executable java jar containing preston[2] v0.1.15. --
preston.jar

-- preston archives containing BHL data files, associated provenance logs and a provenance index --
preston-[00-ff].tar.gz

-- individual provenance index files --
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
2b1104cb7749e818c9afca78391b2d0099bbb0a32f2b348860a335cd2f8f6800
4081bc59dff58d63f6a86c623cb770f01e9a355a42495b205bcb538cd526190f
47a2816f8b5600b24487093adcddfea12434cc4f270f3ab09d9215fbdd546cd2
6f99a1388823fca745c9e22ac21e2da909a219aa1ace55170fa9248c0276903c
7ae46d7cd9b5a0f5889ba38bac53c82e591b0bdf8b605f5e48c0dce8fb7b717f
82903464889fea7c53f53daedf4e41fa31092f82619edeb3415eb2b473f74af3
9e8c86243df39dd4fe82a3f814710eccf73aa9291d050415408e346fa2b09e70
a8308fbf4530e287927c471d881ce0fc852f16543d46e1ee26f1caba48815f3a
bcec6df2ea7f74e9a6e2830d0072e6b2fbe65323d9ddb022dd6e1349c23996e2
cfe47c25ec0210ac73c06b407beb20d9c58355cb15bae427fdc7541870ca2e4e
f73fc9e70bce8f21f0c96b8ef0903749d8f223f71343ab5a8910968f99c9b8b6

--- end of file descriptions ---

References

[1] Biodiversity Heritage Library (BHL, https://biodiversitylibrary.org) accessed from 2019-05-19 to 2020-05-09 with provenance hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .

This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
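The integrity check described above amounts to confirming that each stored file's SHA-256 digest equals its content identifier. The sketch below illustrates that idea and is not a substitute for "preston verify". It assumes the directory layout visible in the sample output (data/<first two hex digits>/<next two hex digits>/<full hash>), and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def content_path(data_dir, hex_hash):
    """Location of a blob under the assumed content-addressed layout."""
    return data_dir / hex_hash[0:2] / hex_hash[2:4] / hex_hash


def store(data_dir, content):
    """Write bytes under their own hash (used here only to build a demo)."""
    target = content_path(data_dir, hashlib.sha256(content).hexdigest())
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target


def check_data_dir(data_dir):
    """Map each stored file name to whether its digest matches that name."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {p.name: sha256_of(p) == p.name for p in files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        blob = store(data_dir, b"example provenance log")
        print(check_data_dir(data_dir))   # the single entry maps to True
        blob.write_bytes(b"tampered")     # simulate corruption
        print(check_data_dir(data_dir))   # the same entry now maps to False
```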
-
A biodiversity dataset graph: GBIF, iDigBio, BioCASe

The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections.

This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2019-10-02 using "preston update -u https://gbif.org,https://idigbio.org,http://biocase.org".

This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .

To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:

$ java -jar preston.jar ls --remote https://zenodo.org/record/3484205/files > /dev/null

Optionally, you can retrieve all associated data (~500GB) files using:

$ java -jar preston.jar clone --remote https://zenodo.org/record/3484205/files,https://deeplinker.bio

Please note https://deeplinker.bio is a Preston remote that provided access to GBIF, iDigBio, BioCASe data files at time of writing (13 Oct 2019). This remote can be replaced with any other Preston remote(s) if needed. This may take a while depending on network speed and hardware constraints.

After that, verify the index of the archive by reproducing the following provenance log history:

$ java -jar preston.jar history
<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
<hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> <http://purl.org/pav/previousVersion> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
<hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> <http://purl.org/pav/previousVersion> <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> .
<hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> <http://purl.org/pav/previousVersion> <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> .
<hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> <http://purl.org/pav/previousVersion> <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> .
<hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> <http://purl.org/pav/previousVersion> <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> .
<hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> <http://purl.org/pav/previousVersion> <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> .
<hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> <http://purl.org/pav/previousVersion> <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> .
<hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> <http://purl.org/pav/previousVersion> <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> .
<hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> <http://purl.org/pav/previousVersion> <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> .
<hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> <http://purl.org/pav/previousVersion> <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> .
<hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> <http://purl.org/pav/previousVersion> <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> .
<hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> <http://purl.org/pav/previousVersion> <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> .
<hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> <http://purl.org/pav/previousVersion> <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> .
<hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> <http://purl.org/pav/previousVersion> <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> .
<hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> <http://purl.org/pav/previousVersion> <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> .
<hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> <http://purl.org/pav/previousVersion> <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> .
<hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> <http://purl.org/pav/previousVersion> <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> .
<hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> <http://purl.org/pav/previousVersion> <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> .
<hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> <http://purl.org/pav/previousVersion> <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> .
<hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> <http://purl.org/pav/previousVersion> <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> .
<hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> <http://purl.org/pav/previousVersion> <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> .
<hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> <http://purl.org/pav/previousVersion> <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> .
<hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> <http://purl.org/pav/previousVersion> <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> .
<hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> <http://purl.org/pav/previousVersion> <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> .
<hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> <http://purl.org/pav/previousVersion> <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> .
<hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> <http://purl.org/pav/previousVersion> <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> .
<hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> <http://purl.org/pav/previousVersion> <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> .
<hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> <http://purl.org/pav/previousVersion> <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> .
<hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> <http://purl.org/pav/previousVersion> <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> .
<hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> <http://purl.org/pav/previousVersion> <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> .
<hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> <http://purl.org/pav/previousVersion> <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> .
<hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> <http://purl.org/pav/previousVersion> <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> .
<hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> <http://purl.org/pav/previousVersion> <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> .
<hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> <http://purl.org/pav/previousVersion> <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> .
<hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> <http://purl.org/pav/previousVersion> <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> .
<hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> <http://purl.org/pav/previousVersion> <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> .
<hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> <http://purl.org/pav/previousVersion> <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> .
<hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> <http://purl.org/pav/previousVersion> <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> .

If you retrieved data files, you can check the integrity of the extracted archive by confirming that each line produced by the command "preston verify" looks like the lines shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

$ java -jar preston.jar verify
hash://sha256/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362 file:/home/preston/preston-archive/data/3e/ff/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362 OK CONTENT_PRESENT_VALID_HASH 89931
hash://sha256/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b file:/home/preston/preston-archive/data/18/48/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b OK CONTENT_PRESENT_VALID_HASH 210344
hash://sha256/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02 file:/home/preston/preston-archive/data/18/46/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02 OK CONTENT_PRESENT_VALID_HASH 210344
hash://sha256/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38 file:/home/preston/preston-archive/data/55/4f/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38 OK CONTENT_PRESENT_VALID_HASH 202701

Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

Files in this data publication:

--- start of file descriptions ---

-- description of archive and its contents (this file) --
README

-- executable java jar containing preston[2] v0.1.8. --
preston.jar

-- individual provenance index files --
049b0eb995b484c1e64184f582f51b3c608dcade70c4aefc2d53f903bae45098
073315c32d7fd19868449bef1b11b15a86981dee53a31f7f5c882f7e3be413c3
1172c6927e58113db668409d36b6a2cd84cf1a93e85b50d65d0bd008a5d8aaa4
1707cb11cd9f696f1a86fd06742c1e14fad856747be88791f79f6fc7c979d5a6
272ff1f12a573c667634d934d06b8bab0dd9cc6558795287ea99fab87620d005
2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
37b8b636e939072d0df7246bf077ead4279f9dd33929be322e631104b0641308
3901b6af522d535fb164823704686e72f73b7798a2a64eaeb817134552c69e2c
395ed0c95a624f8853116442690965acf69151acd6b33cc4fc710f567828f784
460c14ed0129c1469c9149ed1030cdc133f110fb32048748323982cb88dd7eda
477b6c4e9ecf5c8cd1b5502e0245c8622fa4b358f6710f97db39b473ed3d8235
52b7274f5d795e4987964bb1a327dd6d6e4f65870e6a7aac172481d0ba3013d4
54786bde04751bc31bf38c9e89c010cfee7de91760e1f5f31218ff11acff8a70
6135b237a49b37b857801836494f2c36bcb1526bdacf001a9d11727fff6bf1f1
69b4d5ca9643c14501a48a2b1eb24971a6da68da5033c304f7f00b94e16a11d9
70066ea7c6a9dd6c2193cdc90b3b1ff7664af235ab245f6c03d1dd497b376570
7084702f8025c99a6608a3355ccad5ff5e644ad544121f5d524961f7fe29ceb6
7ebb008412baaac3afcc8af68b796bf4ca98f367cfd61a815eee82cdffeab196
886edb8d22973bb04fe3b42d12106029a00b9deab3fb77d8787123327b77ae3b
8a6d7e2ab026ff56380235fd9696f5e538e5e426b9374f2ddf3a705e186a7788
95f88f27ed3448534206406738dfb5c5030fe3d6883c6dda261649357600883f
9d12cae409e8ea0a546f7945cc629d622400000c3338e4710d9c6084fca9274d
9fa9ea50db419c75251026708183add8973d9e68a79062f7808b110bef21006e
a24abbe089556f51fe9c2a51febdcaf893b419556312bcc63515713fc4a52922
a3b0477fe46f09b0f51c0f651691665c149bc341f5c19996675d849252e86453
a486474333f05884580dd10c54c95999063c7d1bc22e2cbe3bead604aca0a183
a524b9af3f172793998e1f9c5c0e9c949cc935624a17ed3364d32bc0391c9382
aa0e508aeb96f240b551fe92ff4224325ddcdf66f97eef95ac78aec62e53a169
ab34300942ec02cca7adf2744f6fbc1ab7587060bea09ef92b65b66f89d1ddcd
b05d4a17d9a02180669d7eb017102dd1a739fb4615759cba94baf944b2aee29c
b37c79f95c22fc4d657cc89dedd7a870923285da690ad4f5121962492484a142
be6d8cd5f1405a5e3e8aa492fb8dab41f6521608834d746e6cbc58d2f550f918
c06f4413a97a5540fbdd40bdbfb194435c154533df7fe388dfdd378084e19c3d
c585b8addfb7f7991ad74c0bae158aecefc6be5b11c28b020135e0f13040e187
c66587e9730a6f68e961240038892df656ea99a1a25f4ff8ce556c07b09a4878
cea1aab236de5de8da8954797d846c225bf2ad4f8fe3cd413e60ab029f9e1b3e
da05cc27a47e755ebe912fafae434df5bd31a5d92658fe1943acc0a2023fab32
fcb2ee4d630a9a1440417b0c46da5bc1578a388d6aedd12189a23283b60dde7d
ff32a7cbc99eaf6b67695fd94284a9b1b47a76497ef4d10ffc4dae199cc0d7c3

-- individual provenance logs --
05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd
09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2
0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781
102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9
1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e
20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17
24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43
261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12
3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7
3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee
480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6
58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70
5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7
5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa
6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7
650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79
668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103
7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29
7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93
8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4
9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e
a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208
a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6
af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae
af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63
b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4
b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d
b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd
ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe
c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6
c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55
d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1
d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3
dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda
e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df
e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb
eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6
f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36
fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc

--- end of file descriptions ---

References

[1] Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe, https://gbif.org,https://idigbio.org,http://biocase.org) accessed from 2018-09-03 to 2019-10-02 with provenance hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7.
[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .

This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.
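The provenance log history above is a chain of statements in which each newer version points back to its predecessor. As a rough sketch of how such output could be checked for an unbroken chain, the code below parses lines of that shape and returns the versions from oldest to newest. It assumes the first `hasVersion` statement names the oldest version and that `previousVersion` links a newer hash to an older one. The function name and the shortened placeholder hashes are hypothetical, and this is not part of Preston.

```python
import re

# Matches lines of the form: <subject> <predicate> <object> .
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s+\.$")
HAS_VERSION = "http://purl.org/pav/hasVersion"
PREVIOUS_VERSION = "http://purl.org/pav/previousVersion"


def version_chain(history_text):
    """Return version identifiers ordered from oldest to newest."""
    first = None
    successor = {}  # older version -> the version that replaced it
    for line in history_text.splitlines():
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # skip prompts, blank lines, and anything unexpected
        subject, predicate, obj = match.groups()
        if predicate == HAS_VERSION:
            first = obj
        elif predicate == PREVIOUS_VERSION:
            successor[obj] = subject
    chain, seen, current = [], set(), first
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)  # guards against accidental cycles
        current = successor.get(current)
    return chain


if __name__ == "__main__":
    # Placeholder identifiers, shortened for readability.
    sample = "\n".join([
        "$ java -jar preston.jar history",
        "<root> <http://purl.org/pav/hasVersion> <hash://sha256/aaa> .",
        "<hash://sha256/bbb> <http://purl.org/pav/previousVersion> <hash://sha256/aaa> .",
        "<hash://sha256/ccc> <http://purl.org/pav/previousVersion> <hash://sha256/bbb> .",
    ])
    for version in version_chain(sample):
        print(version)
```

If the returned chain is shorter than the number of statements, a link is missing or out of order, which is the situation the manual comparison against the published history is meant to catch.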
-
{"Abstract":["A biodiversity dataset graph: DataONE<\/p>\n\nThe intended use of this archive is to facilitate meta-analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data. <\/p>\n\nThis dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-10-18 and 2019-10-03 using "preston update -u https://dataone.org". <\/p>\n\nThe archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to links provenance files in time to establish a versioning mechanism. Provenance files describe how, when and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543). <\/p>\n\nTo retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. 
Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:<\/p>\n\n$ java -jar preston.jar clone --remote https://zenodo.org/record/3483218/files<\/p>\n\nAfter that, verify the index of the archive by reproducing the following provenance log history:<\/p>\n\n$ java -jar preston.jar history\n<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .\n<hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> <http://purl.org/pav/previousVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .\n<hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> <http://purl.org/pav/previousVersion> <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> .\n<hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> <http://purl.org/pav/previousVersion> <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> .\n<hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> <http://purl.org/pav/previousVersion> <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> .\n<hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> <http://purl.org/pav/previousVersion> <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> .\n<hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> <http://purl.org/pav/previousVersion> <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> .\n<hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> <http://purl.org/pav/previousVersion> <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> .\n<hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> 
<http://purl.org/pav/previousVersion> <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> .\n<hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> <http://purl.org/pav/previousVersion> <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> .\n<hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> <http://purl.org/pav/previousVersion> <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> .\n<hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> <http://purl.org/pav/previousVersion> <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> .\n<hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> <http://purl.org/pav/previousVersion> <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> .\n<hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> <http://purl.org/pav/previousVersion> <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> .<\/p>\n\nTo check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" resembles the lines shown below and includes "CONTENT_PRESENT_VALID_HASH". 
Depending on hardware capacity, this may take a while.<\/p>\n\n$ java -jar preston.jar verify\nhash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945 file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945 OK CONTENT_PRESENT_VALID_HASH 21580\nhash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f OK CONTENT_PRESENT_VALID_HASH 2035\nhash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53 file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53 OK CONTENT_PRESENT_VALID_HASH 1935\nhash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687 file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687 OK CONTENT_PRESENT_VALID_HASH 1553<\/p>\n\nNote that a copy of the Java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or in short "preston". <\/p>\n\nFiles in this data publication:<\/p>\n\n--- start of file descriptions ---<\/p>\n\n-- description of archive and its contents (this file) --\nREADME <\/p>\n\n-- executable java jar containing preston[2] v0.1.8. 
--\npreston.jar<\/p>\n\n-- preston archives containing DataONE data files, associated provenance logs and a provenance index --\npreston-[00-ff].tar.gz <\/p>\n\n-- individual provenance index files --\n2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a\n2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf\n2f65ae542401d4c2daf1bca70de640211da6749188f67d28ea71acd7d8ba070b\n3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7\n66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba\n8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66\nc84dffef20fec958255e759db6445fc469d73695674a33ae6f7e567a088c9fe0\nd9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d\nda26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db\ne4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55\neb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94\nf13ab4bca04f894ae8eabb51fa01b4dfbc69f717eabc9896c728e2ba39c4db27\nf493baf276892a199a0b0d078359f64a38fe8ad3f807921f8d41ef73f7343b1f\nff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3<\/p>\n\n--- end of file descriptions ---<\/p>\n\n\nReferences <\/p>\n\n[1] Data Observation Network for Earth (DataONE, https://dataone.org) accessed from 2018-10-18 to 2019-10-03 with provenance hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650 .\n[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 . <\/p>\n\n\nThis work is funded in part by grant NSF OAC 1839201 from the National Science Foundation\n <\/p>"]}
