Title: Mitigating RF Jamming Attacks at the Physical Layer with Machine Learning Dataset

Data files were used in support of the research paper titled "Mitigating RF Jamming Attacks at the Physical Layer with Machine Learning," which has been submitted to the IET Communications journal.

---------------------------------------------------------------------------------------------

All data was collected using the SDR implementation available at https://github.com/mainland/dragonradio/tree/iet-paper. In particular, for antenna state selection, the files developed for this paper are located in 'dragonradio/scripts/':

  • 'ModeSelect.py': class used to define the antenna state selection algorithm
  • 'standalone-radio.py': SDR implementation for normal radio operation with reconfigurable antenna
  • 'standalone-radio-tuning.py': SDR implementation for hyperparameter tuning
  • 'standalone-radio-onmi.py': SDR implementation for omnidirectional mode only

---------------------------------------------------------------------------------------------

Authors: Marko Jacovic, Xaime Rivas Rey, Geoffrey Mainland, Kapil R. Dandekar
Contact: krd26@drexel.edu

---------------------------------------------------------------------------------------------

Top-level directories and content will be described below. Detailed descriptions of experiments performed are provided in the paper.

---------------------------------------------------------------------------------------------

classifier_training: files used for training classifiers that are integrated into SDR platform

  • 'logs-8-18' directory contains OTA SDR collected log files for each jammer type and under normal operation (including congested and weaklink states)
  • 'classTrain.py' is the main parser script for training the classifiers (a minimal training sketch follows this list)
  • 'trainedClassifiers' contains the output classifiers generated by 'classTrain.py'
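
The exact feature extraction and model selection are defined in 'classTrain.py'; the sketch below only illustrates the general train-and-save pattern. The file names ('features.npy', 'labels.npy'), feature layout, and random-forest choice are assumptions, not the paper's configuration.

    # Hypothetical sketch: fit a multi-class jammer classifier on log-derived
    # features and persist it for later use. Paths, feature layout, and model
    # choice are assumptions; see 'classTrain.py' for the actual method.
    import joblib
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X = np.load('features.npy')   # one row of physical-layer features per window (assumed)
    y = np.load('labels.npy')     # labels such as 'normal', 'constant', 'periodic', 'reactive'

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    print('held-out accuracy:', clf.score(X_test, y_test))

    joblib.dump(clf, 'trainedClassifiers/jammer_clf.joblib')  # hypothetical output name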

post_processing_classifier: contains logs of online classifier outputs and processing script

  • 'class' directory contains .csv logs of each RTE and OTA experiment for each jamming and operation scenario
  • 'classProcess.py' parses the log files and provides a classification report and confusion matrix for the multi-class and binary classifiers for each observed scenario; the outputs are found in 'results->classifier_performance' (a minimal sketch of this reporting step follows this list)
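
As a rough illustration of the reporting step only, the sketch below builds a classification report and confusion matrix from a .csv log of predicted versus true labels using scikit-learn; the file name and column names are assumptions, not the actual log format.

    # Hypothetical sketch of the reporting performed by 'classProcess.py'.
    import pandas as pd
    from sklearn.metrics import classification_report, confusion_matrix

    log = pd.read_csv('class/example_experiment.csv')             # hypothetical file
    y_true, y_pred = log['true_label'], log['predicted_label']    # assumed column names

    print(classification_report(y_true, y_pred))
    print(confusion_matrix(y_true, y_pred))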

post_processing_mgen: contains MGEN receiver logs and parser

  • 'configs' contains JSON files to be used with parser for each experiment
  • 'mgenLogs' contains MGEN receiver logs for each OTA and RTE experiment described. Within each experiment, logs are separated into 'mit' for the mitigation technique used, 'nj' for no jammer, and 'noMit' for no mitigation technique used. File names take the form *_cj_* for the constant jammer, *_pj_* for the periodic jammer, *_rj_* for the reactive jammer, and *_nj_* for no jammer (see the filename-mapping sketch after this list). Performance figures are found in 'results->mitigation_performance'
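
The sketch below shows one way to recover the jammer type encoded by this naming convention; the example filename is hypothetical.

    # Map the *_cj_*, *_pj_*, *_rj_*, *_nj_* filename markers to jammer types.
    import re

    JAMMER_CODES = {'cj': 'constant', 'pj': 'periodic', 'rj': 'reactive', 'nj': 'none'}

    def jammer_type(filename):
        match = re.search(r'_(cj|pj|rj|nj)_', filename)
        return JAMMER_CODES[match.group(1)] if match else 'unknown'

    print(jammer_type('ota_mit_cj_run1.drc'))  # hypothetical filename -> 'constant'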

ray_tracing_emulation: contains files related to Drexel area, Art Museum, and UAV Drexel area validation RTE studies.

  • The directory contains a detailed 'readme.txt' that explains its contents.
  • Please note: the processing files and data logs present in the 'validation' folder were developed by Wolfe et al. and should be cited as such, unless explicitly stated otherwise.
    • S. Wolfe, S. Begashaw, Y. Liu and K. R. Dandekar, "Adaptive Link Optimization for 802.11 UAV Uplink Using a Reconfigurable Antenna," MILCOM 2018 - 2018 IEEE Military Communications Conference (MILCOM), 2018, pp. 1-6, doi: 10.1109/MILCOM.2018.8599696.

results: contains results obtained from study

  • 'classifier_performance' contains .txt files summarizing binary and multi-class performance of the online SDR system. Files are obtained using 'post_processing_classifier'.
  • 'mitigation_performance' contains figures generated by 'post_processing_mgen'.
  • 'validation' contains RTE and OTA performance comparison obtained by 'ray_tracing_emulation->validation->matlab->outdoor_hover_plots.m'

tuning_parameter_study: contains the OTA log files for antenna state selection hyperparameter study

  • 'dataCollect' contains a folder for each jammer considered in the study; inside each folder, each CSV file corresponds to a different configuration of the learning parameters of the reconfigurable antenna. The configuration selected was the one that performed best across all of these experiments and is described in the paper.
  • 'data_summary.txt' contains the summaries from all the CSV files for convenience (a small aggregation sketch follows this list).
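
As a convenience only, the sketch below shows one way to walk 'dataCollect' and summarize each per-configuration CSV; the metric column name ('throughput') is an assumption, not the actual schema.

    # Hypothetical sketch: aggregate per-configuration CSVs, similar in spirit
    # to the precomputed 'data_summary.txt'.
    import glob
    import os
    import pandas as pd

    for path in sorted(glob.glob('dataCollect/*/*.csv')):
        jammer = os.path.basename(os.path.dirname(path))
        df = pd.read_csv(path)
        print(jammer, os.path.basename(path), df['throughput'].mean())  # 'throughput' is assumed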
 
Award ID(s): 1730140
NSF-PAR ID: 10355685
Publisher / Repository: Zenodo
Sponsoring Org: National Science Foundation

More Like this
  1. Data files were used in support of the research paper titled "Experimentation Framework for Wireless Communication Systems under Jamming Scenarios," which has been submitted to the IET Cyber-Physical Systems: Theory & Applications journal.

    Authors: Marko Jacovic, Michael J. Liston, Vasil Pano, Geoffrey Mainland, Kapil R. Dandekar
    Contact: krd26@drexel.edu

    ---------------------------------------------------------------------------------------------

    Top-level directories correspond to the case studies discussed in the paper. Each includes the sub-directories: logs, parsers, rayTracingEmulation, results. 

    --------------------------------

    logs:    - data logs collected from devices under test
        - 'defenseInfrastucture' contains console output from a WARP 802.11 reference design network. Filenames follow '*x*dB_*y*.txt', in which *x* is the reactive jamming power level and *y* is the jamming duration in samples (100k samples = 1 ms); see the parsing sketch after this list. 'noJammer.txt' does not include the jammer and is a baseline case. 'outMedian.txt' contains the median statistics for log files collected prior to the inclusion of that calculation in the processing script.
        - 'uavCommunication' contains MGEN logs at each receiver for cases using omni-directional and RALA antennas with a 10 dB constant jammer and without the jammer. The omni-directional folder contains multiple repeated experiments to provide reliable results during each calculation window. The RALA directories use s*N* folders, in which *N* represents each antenna state.
        - 'vehicularTechnologies' contains MGEN logs at the car receiver for different scenarios. 'rxNj_5rep.drc' does not include a jammer, 'rx33J_5rep.drc' introduces the periodic jammer, in 'rx33jSched_5rep.drc' the device under test uses time scheduling around the periodic jammer, and in 'rx33JSchedRandom_5rep.drc' the same modified time schedule is used with a random jammer.
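
    The sketch below decodes the '*x*dB_*y*.txt' convention and converts the duration to milliseconds; the example filename is hypothetical.

        # Parse '<power>dB_<samples>.txt' log names (100k samples = 1 ms).
        import re

        def parse_log_name(filename):
            match = re.match(r'(?P<power>-?\d+)dB_(?P<samples>\d+)\.txt$', filename)
            if match is None:
                return None  # e.g. 'noJammer.txt' or 'outMedian.txt'
            power_db = int(match.group('power'))
            duration_ms = int(match.group('samples')) / 100_000
            return power_db, duration_ms

        print(parse_log_name('10dB_200000.txt'))  # hypothetical name -> (10, 2.0)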

    --------------------------------

    parsers:    - scripts used to collect or process the log files used in the study
            - 'defenseInfrastructure' contains the 'xputFiveNodes.py' script which is used to control and log the throughput of a 5-node WARP 802.11 reference design network. Log files are manually inspected to generate results (end of log file provides a summary). 
            - 'uavCommunication' contains a 'readMe.txt' file which describes the parsing of the MGEN logs using TRPR. TRPR must be installed to run the scripts and directory locations must be updated. 
            - 'vehicularTechnologies' contains the 'mgenParser.py' script and supporting 'bfb.json' configuration file which also require TRPR to be installed and directories to be updated. 

    --------------------------------

    rayTracingEmulation:    - 'wirelessInsiteImages': images of model used in Wireless Insite
                - 'channelSummary.pdf': summary of channel statistics from ray-tracing study
                - 'rawScenario': scenario files produced by the code base directly from the ray-tracing output, based on the configuration defined by the '*WI.json' file
                - 'processedScenario': pre-processed scenario files to be used by the DYSE channel emulator, based on the configuration defined by the '*DYSE.json' file; applies the fixed attenuation measured externally by a spectrum analyzer and additional transmit power per node if desired
                - DYSE scenario file format (a parsing sketch follows this list): time stamp (milliseconds), receiver ID, transmitter ID, main path gain (dB), main path phase (radians), main path delay (microseconds), Doppler shift (Hz), multipath 1 gain (dB), multipath 1 phase (radians), multipath 1 delay relative to main path delay (microseconds), multipath 2 gain (dB), multipath 2 phase (radians), multipath 2 delay relative to main path delay (microseconds)
                - 'nodeMapping.txt': mapping of Wireless Insite transceivers to the required DYSE channel emulator physical connections
                - 'uavCommunication' directory additionally includes 'antennaPattern', which contains the RALA pattern data for the omni-directional mode ('omni.csv') and the directional state ('90.csv')
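
    A minimal sketch of reading one DYSE scenario record into named fields, in the column order listed above; the comma delimiter is an assumption about the file layout.

        # Hypothetical DYSE record parser following the documented field order.
        from collections import namedtuple

        DyseRecord = namedtuple('DyseRecord', [
            'time_ms', 'rx_id', 'tx_id',
            'main_gain_db', 'main_phase_rad', 'main_delay_us', 'doppler_hz',
            'mp1_gain_db', 'mp1_phase_rad', 'mp1_delay_us',
            'mp2_gain_db', 'mp2_phase_rad', 'mp2_delay_us',
        ])

        def parse_dyse_line(line):
            f = line.strip().split(',')  # delimiter assumed
            return DyseRecord(float(f[0]), int(f[1]), int(f[2]), *map(float, f[3:]))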

    --------------------------------

    results:    - contains performance results used in paper based on parsing of aforementioned log files
     

     
  2. Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the binder.sqlite file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the events table, which has the following schema.

    CREATE TABLE events (
        version INTEGER,
        timestamp TEXT,
        provider TEXT,
        spec TEXT,
        origin TEXT,
        ref TEXT,
        guessed_ref TEXT
    );
    CREATE INDEX idx_timestamp ON events(timestamp);
    • version indicates the version of the record as assigned by Binder. The origin field became available with version 3, and the ref field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.
    • timestamp is the ISO timestamp of the launch
    • provider gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub, other providers may differ.
    • spec gives the particular branch/release/commit being built. It consists of <github-id>/<repo>/<branch>.
    • origin indicates which backend was used. Each has its own storage, compute, etc. so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.
    • ref specifies the git commit that was actually used, rather than the named branch referenced by spec. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.
    • For records where ref is not available, we attempted to clone the named reference given by spec rather than the specific commit (see below). The guessed_ref field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-null ref) will exclude these guessed commits. May be null.
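
    Since the schema above is fixed, a few lines of Python are enough to start exploring the launch records; the sketch below simply counts launches per provider.

        # Count Binder launches per provider from the events table.
        import sqlite3

        con = sqlite3.connect('binder.sqlite')
        query = "SELECT provider, COUNT(*) FROM events GROUP BY provider ORDER BY 2 DESC"
        for provider, n in con.execute(query):
            print(provider, n)
        con.close()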

    The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files in the repos which were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted/made private at some point, and were thus skipped. This is indicated by the absence of any row for the given commit (or absence of both ref and guessed_ref in the events table). The schema is as follows.

    CREATE TABLE spec_files (
        ref TEXT NOT NULL PRIMARY KEY,
        ls TEXT,
        runtime BLOB,
        apt BLOB,
        conda BLOB,
        pip BLOB,
        pipfile BLOB,
        julia BLOB,
        r BLOB,
        nix BLOB,
        docker BLOB,
        setup BLOB,
        postbuild BLOB,
        start BLOB
    );

    Here ref corresponds to ref and/or guessed_ref from the events table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.

    • runtime: runtime.txt
    • apt: apt.txt
    • conda: environment.yml
    • pip: requirements.txt
    • pipfile: Pipfile.lock or Pipfile
    • julia: Project.toml or REQUIRE
    • r: install.R
    • nix: default.nix
    • docker: Dockerfile
    • setup: setup.py
    • postbuild: postBuild
    • start: start

    The ls field gives a metadata listing of the repo contents (excluding the .git directory). This field is JSON encoded with the following structure based on JSON types:

    • Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
    • String: symlink. The string value gives the link target.
    • Number: regular file. The number value gives the file size in bytes.
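
    Because the encoding above maps JSON objects, strings, and numbers onto directories, symlinks, and file sizes, a short recursive walk is enough to work with the listings; the sketch below sums the regular-file sizes of one repo.

        # Walk the JSON-encoded 'ls' listing and total regular-file sizes.
        import json
        import sqlite3

        def total_size(node):
            if isinstance(node, dict):          # directory: recurse into entries
                return sum(total_size(child) for child in node.values())
            if isinstance(node, (int, float)):  # regular file: value is size in bytes
                return node
            return 0                            # string: symlink, not counted

        con = sqlite3.connect('binder.sqlite')
        ref, ls = con.execute(
            "SELECT ref, ls FROM spec_files WHERE ls IS NOT NULL LIMIT 1").fetchone()
        print(ref, total_size(json.loads(ls)), 'bytes')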
    CREATE TABLE clean_specs (
        ref TEXT NOT NULL PRIMARY KEY,
        conda_channels TEXT,
        conda_packages TEXT,
        pip_packages TEXT,
        apt_packages TEXT
    );

    The clean_specs table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using the conda.models.match_spec.MatchSpec class included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.
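
    For example, the Pip entries can be re-parsed with the same Requirement class mentioned above; the sketch below reads a few rows of clean_specs and prints the normalized package names and version constraints.

        # Parse JSON-encoded pip package lists from clean_specs.
        import json
        import sqlite3
        from pkg_resources import Requirement

        con = sqlite3.connect('binder.sqlite')
        rows = con.execute(
            "SELECT ref, pip_packages FROM clean_specs "
            "WHERE pip_packages IS NOT NULL LIMIT 5")
        for ref, pip_json in rows:
            for spec in json.loads(pip_json):
                req = Requirement.parse(spec)
                print(ref, req.project_name, req.specs)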

    The missing table gives the repos that were not accessible, and event_logs records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users.

     
  3. MCMC chains for the GWB analyses performed in the paper "The NANOGrav 15 yr Data Set: Search for Signals from New Physics". 

    The data is provided in pickle format. Each file contains a NumPy array with the MCMC chain (with burn-in already removed), and a dictionary with the model parameters' names as keys and their priors as values. You can load them as

    import pickle

    with open('path/to/file.pkl', 'rb') as pick:
        temp = pickle.load(pick)
    params = temp[0]
    chain = temp[1]

    The naming convention for the files is the following:

    • igw: inflationary Gravitational Waves (GWs)
    • sigw: scalar-induced GWs
      • sigw_box: assumes a box-like feature in the primordial power spectrum.
      • sigw_delta: assumes a delta-like feature in the primordial power spectrum.
      • sigw_gauss: assumes a Gaussian peak feature in the primordial power spectrum.
    • pt: cosmological phase transitions
      • pt_bubble: assumes that the dominant contribution to the GW productions comes from bubble collisions.
      • pt_sound: assumes that the dominant contribution to the GW productions comes from sound waves.
    • stable: stable cosmic strings
      • stable-c: stable strings emitting GWs only in the form of GW bursts from cusps on closed loops.
      • stable-k: stable strings emitting GWs only in the form of GW bursts from kinks on closed loops.
      • stable-m: stable strings emitting monochromatic GW at the fundamental frequency.
      • stable-n: stable strings described by numerical simulations including GWs from cusps and kinks.
    • meta: metastable cosmic strings
      • meta-l: metastable strings with GW emission from loops only.
      • meta-ls: metastable strings with GW emission from loops and segments.
    • super: cosmic superstrings.
    • dw: domain walls
      • dw-sm: domain walls decaying into Standard Model particles.
      • dw-dr: domain walls decaying into dark radiation.

    For each model, we provide four files: one for the run where the new-physics signal is assumed to be the only GWB source, and one for the run where the new-physics signal is superimposed on the signal from Supermassive Black Hole Binaries (SMBHB); for these files, "_bhb" is appended to the model name. Then, for both of these scenarios, the "compare" folder provides the files for the hypermodel runs that were used to derive the Bayes factors.

    In addition to chains for the stochastic models, we also provide data for the two deterministic models considered in the paper (ULDM and DM substructures). For the ULDM model, the naming convention of the files is the following (all the ULDM signals are superimposed on the SMBHB signal; see the discussion in the paper for more details):

    • uldm_e: ULDM Earth signal.
    • uldm_p: ULDM pulsar signal
      • uldm_p_cor: correlated limit
      • uldm_p_unc: uncorrelated limit
    • uldm_c: ULDM combined Earth + pulsar signal direct coupling 
      • uldm_c_cor: correlated limit
      • uldm_c_unc: uncorrelated limit
    • uldm_vecB: vector ULDM coupled to the baryon number
      • uldm_vecB_cor: correlated limit
      • uldm_vecB_unc: uncorrelated limit 
    • uldm_vecBL: vector ULDM coupled to B-L
      • uldm_vecBL_cor: correlated limit
      • uldm_vecBL_unc: uncorrelated limit
    • uldm_c_grav: ULDM combined Earth + pulsar signal for gravitational-only coupling
      • uldm_c_grav_cor: correlated limit
        • uldm_c_cor_grav_low: low mass region  
        • uldm_c_cor_grav_mon: monopole region
        • uldm_c_cor_grav_low: high mass region
      • uldm_c_unc: uncorrelated limit
        • uldm_c_unc_grav_low: low mass region  
        • uldm_c_unc_grav_mon: monopole region
        • uldm_c_unc_grav_low: high mass region

    For the substructure (static) model, we provide the chain for the marginalized distribution (as for the ULDM signal, the substructure signal is always superimposed on the SMBHB signal).

     
  4. Speech processing is highly incremental. It is widely accepted that human listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how a predictive context is integrated with the bottom-up sensory input: Classic psycholinguistic paradigms suggest a two-stage process, in which acoustic input initially leads to local, context-independent representations, which are then quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single coherent, unified interpretation of the input, which fully integrates available information across representational hierarchies, and thus uses contextual constraints to modulate even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of local and unified predictive models. Results provide evidence that listeners employ both types of models in parallel. Two local context models uniquely predict some part of early neural responses, one based on sublexical phoneme sequences, and one based on the phonemes in the current word alone; at the same time, even early responses to phonemes also reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in nonidentical parts of the superior temporal lobes bilaterally, with the right hemisphere showing a relative preference for more local models. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previous disparate findings. Parallel models might make the perceptual system more robust, facilitate processing of unexpected inputs, and serve a function in language acquisition.

    MEG Data

    MEG data is in FIFF format and can be opened with MNE-Python. Data has been directly converted from the acquisition device native format without any preprocessing. Events contained in the data indicate the stimuli in numerical order. Subjects R2650 and R2652 heard stimulus 11b instead of 11.

    Predictor Variables

    The original audio files are copyrighted and cannot be shared, but the make_audio folder contains make_clips.py which can be used to extract the exact clips from the commercially available audiobook (ISBN 978-1480555280). The predictors directory contains all the predictors used in the original study as pickled eelbrain objects. They can be loaded in Python with the eelbrain.load.unpickle function. The TextGrids directory contains the TextGrids aligned to the audio files.

    Source Localization

    The localization.zip file contains files needed for source localization. Structural brain models used in the published analysis are reconstructed by scaling the FreeSurfer fsaverage brain (distributed with FreeSurfer) based on each subject's `MRI scaling parameters.cfg` file. This can be done using the `mne.scale_mri` function. Each subject's MEG folder contains a `subject-trans.fif` file which contains the coregistration between MEG sensor space and (scaled) MRI space, which is used to compute the forward solution.
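
    A minimal loading sketch, assuming hypothetical file names within the dataset layout described above (the actual per-subject paths will differ):

        # Open a raw MEG recording with MNE-Python and a pickled predictor with eelbrain.
        import mne
        import eelbrain

        raw = mne.io.read_raw_fif('R2650/R2650_raw.fif', preload=False)  # hypothetical path
        print(raw.info)

        predictor = eelbrain.load.unpickle('predictors/example_predictor.pickle')  # hypothetical path
        print(predictor)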
  5. A biodiversity dataset graph: DataONE

    The intended use of this archive is to facilitate (meta-)analysis of the Data Observation Network for Earth (DataONE). DataONE is a distributed infrastructure that provides information about earth observation data.

    This dataset provides versioned snapshots of the DataONE network as tracked by Preston [2] between 2018-11-06 and 2020-05-07 using "preston update -u https://dataone.org".

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs, and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what, and where the DataONE content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .

    To retrieve and verify the downloaded DataONE biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote https://zenodo.org/record/3849494/files

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> <http://purl.org/pav/previousVersion> <hash://sha256/8c67e0741d1c90db54740e08d2e39d91dfd73566ea69c1f2da0d9ab9780a9a9f> .
    <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> <http://purl.org/pav/previousVersion> <hash://sha256/3ed3acaca7ac57f546d0b8877c1927ab5e08c23eccaa8219600c59c77a72c685> .
    <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> <http://purl.org/pav/previousVersion> <hash://sha256/857753997a7595a1b372b05641b58a25d9408b7ff08d557ce1fe8b73e4bd383f> .
    <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> <http://purl.org/pav/previousVersion> <hash://sha256/7ee0376f4c3f7aeeda36927a5211395e5da8201e810e8c7e638a0fe23d001e88> .
    <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> <http://purl.org/pav/previousVersion> <hash://sha256/68b4974d8ab7c4c7a7a4305065839b60ba460aaa862590b34c67877738feba90> .
    <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> <http://purl.org/pav/previousVersion> <hash://sha256/060a76d56255bf9482c951748c91291fddeeb20f180632132be1344e081b2372> .
    <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> <http://purl.org/pav/previousVersion> <hash://sha256/29357bdfab4548025f8a5743301f5c3c9146fa436c39e3c9e019fb9409ac9c42> .
    <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> <http://purl.org/pav/previousVersion> <hash://sha256/3669cd95100d1d533eb8953ff4ec5092cbd8addb8879b3e6262191148a8a3ebb> .
    <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> <http://purl.org/pav/previousVersion> <hash://sha256/8dc1663299359d271cb1b4c14ad521d0f1be67743689dd18016543dc1e097efb> .
    <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> <http://purl.org/pav/previousVersion> <hash://sha256/dc4903e8afee651db1d9bf509f20503bf9c8e89679c4bcffb46d5b97440cb6de> .
    <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> <http://purl.org/pav/previousVersion> <hash://sha256/f3bed9db3092c744604df5f50248a2ec36e564fe78a65f45c4190283bd61c807> .
    <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> <http://purl.org/pav/previousVersion> <hash://sha256/e3c7b3b14b2b792e3e2e560a1b2bef059ac93f777dee616b836317bc9cbfcbf7> .
    <hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> <http://purl.org/pav/previousVersion> <hash://sha256/631a4531e7bb052816d28454bbeec3428d5e7bfd1f148c4f21ce63a6cf86c650> .
    <hash://sha256/79ec3ee370a0d38311bc352af07a36380cd3aa04dc98154cf723bbc73d12ee77> <http://purl.org/pav/previousVersion> <hash://sha256/87de0898919d2212977a586965e930ae45bdd1366073591c808c208a635e2814> .
    <hash://sha256/e54b360a4ca84a4503e4c10a8a8cca062c130be7429c8fe6ea1e0e82fe113e12> <http://purl.org/pav/previousVersion> <hash://sha256/79ec3ee370a0d38311bc352af07a36380cd3aa04dc98154cf723bbc73d12ee77> .
    <hash://sha256/2910f784f84e112f124a56ce54bd06b76e510f90276629d2d144ce29e326d80f> <http://purl.org/pav/previousVersion> <hash://sha256/e54b360a4ca84a4503e4c10a8a8cca062c130be7429c8fe6ea1e0e82fe113e12> .
    <hash://sha256/bcb0bdff0689cfb06f586d057703e41d1c6ba409867232217081dd8cb5053c87> <http://purl.org/pav/previousVersion> <hash://sha256/2910f784f84e112f124a56ce54bd06b76e510f90276629d2d144ce29e326d80f> .
    <hash://sha256/a12f8c7fbf4fbfa71536c7e1b2614a35454dac6a7fe9e1cc0b4df41ab2269bef> <http://purl.org/pav/previousVersion> <hash://sha256/bcb0bdff0689cfb06f586d057703e41d1c6ba409867232217081dd8cb5053c87> .
    <hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3> <http://purl.org/pav/previousVersion> <hash://sha256/a12f8c7fbf4fbfa71536c7e1b2614a35454dac6a7fe9e1cc0b4df41ab2269bef> .

    To check the integrity of the extracted archive, confirm that each line produced by the command "preston verify" matches the form shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    file:/home/preston/preston-dataone/data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945    OK    CONTENT_PRESENT_VALID_HASH    21580    hash://sha256/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945
    hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    file:/home/preston/preston-dataone/data/d0/dd/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f    OK    CONTENT_PRESENT_VALID_HASH    2035    hash://sha256/d0ddcc2111b6134a570bcc7d89375920ef4d754130cecc0727c79d2b05a9f81f
    hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    file:/home/preston/preston-dataone/data/47/2d/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53    OK    CONTENT_PRESENT_VALID_HASH    1935    hash://sha256/472de9d1c9fd7e044aac409abfbfff9f12c6b69359df995d431009580ffb0f53
    hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    file:/home/preston/preston-dataone/data/b2/98/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687    OK    CONTENT_PRESENT_VALID_HASH    1553    hash://sha256/b29879462cd43862129c5cf9b149c41ecd33ffef284a4dbea4ac1c0f90108687
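
    The same check can also be reproduced for an individual extracted file, since each data file is stored under a path derived from its sha256 hash (data/<first 2 characters>/<next 2 characters>/<full hash>); the sketch below re-hashes one such file and compares the digest with its name.

        # Independently verify one extracted data file against the hash in its path.
        import hashlib
        import pathlib

        path = pathlib.Path('data/e5/5c/e55c1034d985740926564e94decd6dc7a70f779a33e7deb931553739cda16945')
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        print('OK' if digest == path.name else 'MISMATCH', path.name)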


    Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on a Java 8+ virtual machine using "java -jar preston.jar", or "preston" for short.

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --
    README

    -- executable java jar containing preston[2] v0.1.15. --
    preston.jar

    -- preston archives containing DataONE data files, associated provenance logs and a provenance index --
    preston-[00-ff].tar.gz

    -- individual provenance index files --
    2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a
    2aecaf289def0e23a27058bf7715f226ef9189905f0be13228174825633125cf
    2f65ae542401d4c2daf1bca70de640211da6749188f67d28ea71acd7d8ba070b
    35eb1e17e2bf3e71212cde35bdb03e8a6545a57483ea3c1633929257b70cf637
    3d38b70198e448674be6a63d14b9817f3a956f48bba7418fa7baa086a56c05b7
    66ad3e5e904740f1e835ac6718dda4279e0c24b204ea0d1113cda1352a5072ba
    7466a35e42dea7e2be068060ec0c926f9a8686388ed504ef5c6c990c1ba4e8d0
    81161d9746c2a5823641c436e773fb4508516b055da85f4494b38c545349da39
    8bf062872ce958545d361e9d53a552ffb025ac29ab875caad1157c0995d34f66
    a90eed8d70c54c8e554f2dfde4fceb434eda162d9615d62de96ded2344f88a78
    c33ef5e29100b323412f1f3bc66908c8e01e4f0d1db4ea3685d2fffc47981dd6
    c84dffef20fec958255e759db6445fc469d73695674a33ae6f7e567a088c9fe0
    d362d599d72000c4feb464db5a669b12e15fc3ca1a49b1e7d4d6f7d6d5d15411
    d9378616636be3686bbabd5bf29d50f0ef0e5ceb5ddd7dfce47f7e755b596b7d
    da26fa6e7371385ed3f61af9a766221c833060d59dfd4869bbd7110f95f288db
    e4103a75627857de3ee2e317429108611c244fc448c01d1d7bf652115c3b8a55
    eb368fedb8f100210dd968edcf80f4d13cab3dd64135a6ab744102cf15e68c94
    f13ab4bca04f894ae8eabb51fa01b4dfbc69f717eabc9896c728e2ba39c4db27
    f493baf276892a199a0b0d078359f64a38fe8ad3f807921f8d41ef73f7343b1f
    ff92b6c06ae5286bd2f1db679e0fcc4da294acb9bc01b2e9522378d99218c2e3

    --- end of file descriptions ---


    References

    [1] Data Observation Network for Earth (DataONE, https://dataone.org) accessed from 2018-11-06 to 2020-05-07 with provenance hash://sha256/2b5c445f0b7b918c14a50de36e29a32854ed55f00d8639e09f58f049b85e50e3.
    [2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .


    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.

     