skip to main content

Title: Influence of particle viscosity on mass transfer and heterogeneous ozonolysis kinetics in aqueous–sucrose–maleic acid aerosol

The ozonolysis kinetics of viscous aerosol particles containing maleic acid are studied. Kinetic fits are constrained by measured particle viscosities.

more » « less
Award ID(s):
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Royal Society of Chemistry (RSC)
Date Published:
Journal Name:
Physical Chemistry Chemical Physics
Page Range / eLocation ID:
15560 to 15573
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The data provided here accompany the publication "Drought Characterization with GPS: Insights into Groundwater and Reservoir Storage in California" [Young et al., (2024)] which is currently under review with Water Resources Research. (as of 28 May 2024)

    Please refer to the manuscript and its supplemental materials for full details. (A link will be appended following publication)

    File formatting information is listed below, followed by a sub-section of the text describing the Geodetic Drought Index Calculation.

    The longitude, latitude, and label for grid points are provided in the file "loading_grid_lon_lat".

    Time series for each Geodetic Drought Index (GDI) time scale are provided within "".

    The included time scales are for 00- (daily), 1-, 3-, 6-, 12- 18- 24-, 36-, and 48-month GDI solutions.

    Files are formatted following...

    Title: "grid point label L****"_"time scale"_month

    File Format: ["decimal date" "GDI value"]

    Gridded, epoch-by-epoch, solutions for each time scale are provided within "".

    Files are formatted following...

    Title: GDI_"decimal date"_"time scale"_month

    File Format: ["longitude" "latitude" "GDI value" "grid point label L****"]


    We develop the GDI following Vicente-Serrano et al. (2010) and Tang et al. (2023), such that the GDI mimics the derivation of the SPEI, and utilize the log-logistic distribution (further details below). While we apply hydrologic load estimates derived from GPS displacements as the input for this GDI (Figure 1a-d), we note that alternate geodetic drought indices could be derived using other types of geodetic observations, such as InSAR, gravity, strain, or a combination thereof. Therefore, the GDI is a generalizable drought index framework.

    A key benefit of the SPEI is that it is a multi-scale index, allowing the identification of droughts which occur across different time scales. For example, flash droughts (Otkin et al., 2018), which may develop over the period of a few weeks, and persistent droughts (>18 months), may not be observed or fully quantified in a uni-scale drought index framework. However, by adopting a multi-scale approach these signals can be better identified (Vicente-Serrano et al., 2010). Similarly, in the case of this GPS-based GDI, hydrologic drought signals are expected to develop at time scales that are both characteristic to the drought, as well as the source of the load variation (i.e., groundwater versus surface water and their respective drainage basin/aquifer characteristics). Thus, to test a range of time scales, the TWS time series are summarized with a retrospective rolling average window of D (daily with no averaging), 1, 3, 6, 12, 18, 24, 36, and 48-months width (where one month equals 30.44 days).

    From these time-scale averaged time series, representative compilation window load distributions are identified for each epoch. The compilation window distributions include all dates that range ±15 days from the epoch in question per year. This allows a characterization of the estimated loads for each day relative to all past/future loads near that day, in order to bolster the sample size and provide more robust parametric estimates [similar to Ford et al., (2016)]; this is a key difference between our GDI derivation and that presented by Tang et al. (2023). Figure 1d illustrates the representative distribution for 01 December of each year at the grid cell co-located with GPS station P349 for the daily TWS solution. Here all epochs between between 16 November and 16 December of each year (red dots), are compiled to form the distribution presented in Figure 1e.

    This approach allows inter-annual variability in the phase and amplitude of the signal to be retained (which is largely driven by variation in the hydrologic cycle), while removing the primary annual and semi-annual signals. Solutions converge for compilation windows >±5 days, and show a minor increase in scatter of the GDI time series for windows of ±3-4 days (below which instability becomes more prevalent). To ensure robust characterization of drought characteristics, we opt for an extended ±15-day compilation window. While Tang et al. (2023) found the log-logistic distribution to be unstable and opted for a normal distribution, we find that, by using the extended compiled distribution, the solutions are stable with negligible differences compared to the use of a normal distribution. Thus, to remain aligned with the SPEI solution, we retain the three-parameter log-logistic distribution to characterize the anomalies. Probability weighted moments for the log-logistic distribution are calculated following Singh et al., (1993) and Vicente-Serrano et al., (2010). The individual moments are calculated following Equation 3.

    These are then used to calculate the L-moments for shape (), scale (), and location () of the three-parameter log-logistic distribution (Equations 4 – 6).

    The probability density function (PDF) and the cumulative distribution function (CDF) are then calculated following Equations 7 and 8, respectively.

    The inverse Gaussian function is used to transform the CDF from estimates of the parametric sample quantiles to standard normal index values that represent the magnitude of the standardized anomaly. Here, positive/negative values represent greater/lower than normal hydrologic storage. Thus, an index value of -1 indicates that the estimated load is approximately one standard deviation dryer than the expected average load on that epoch.

    *Equations can be found in the main text.

    more » « less
  2. This data set for the manuscript entitled "Design of Peptides that Fold and Self-Assemble on Graphite" includes all files needed to run and analyze the simulations described in the this manuscript in the molecular dynamics software NAMD, as well as the output of the simulations. The files are organized into directories corresponding to the figures of the main text and supporting information. They include molecular model structure files (NAMD psf or Amber prmtop format), force field parameter files (in CHARMM format), initial atomic coordinates (pdb format), NAMD configuration files, Colvars configuration files, NAMD log files, and NAMD output including restart files (in binary NAMD format) and trajectories in dcd format (downsampled to 10 ns per frame). Analysis is controlled by shell scripts (Bash-compatible) that call VMD Tcl scripts or python scripts. These scripts and their output are also included.

    Version: 2.0

    Changes versus version 1.0 are the addition of the free energy of folding, adsorption, and pairing calculations (Sim_Figure-7) and shifting of the figure numbers to accommodate this addition.

    Conventions Used in These Files

    Structure Files
    - graph_*.psf or sol_*.psf (original NAMD (XPLOR?) format psf file including atom details (type, charge, mass), as well as definitions of bonds, angles, dihedrals, and impropers for each dipeptide.)

    - graph_*.pdb or sol_*.pdb (initial coordinates before equilibration)
    - repart_*.psf (same as the above psf files, but the masses of non-water hydrogen atoms have been repartitioned by VMD script repartitionMass.tcl)
    - freeTop_*.pdb (same as the above pdb files, but the carbons of the lower graphene layer have been placed at a single z value and marked for restraints in NAMD)
    - amber_*.prmtop (combined topology and parameter files for Amber force field simulations)
    - repart_amber_*.prmtop (same as the above prmtop files, but the masses of non-water hydrogen atoms have been repartitioned by ParmEd)

    Force Field Parameters
    CHARMM format parameter files:
    - par_all36m_prot.prm (CHARMM36m FF for proteins)
    - par_all36_cgenff_no_nbfix.prm (CGenFF v4.4 for graphene) The NBFIX parameters are commented out since they are only needed for aromatic halogens and we use only the CG2R61 type for graphene.
    - toppar_water_ions_prot_cgenff.str (CHARMM water and ions with NBFIX parameters needed for protein and CGenFF included and others commented out)

    Template NAMD Configuration Files
    These contain the most commonly used simulation parameters. They are called by the other NAMD configuration files (which are in the namd/ subdirectory):
    - template_min.namd (minimization)
    - template_eq.namd (NPT equilibration with lower graphene fixed)
    - template_abf.namd (for adaptive biasing force)

    - namd/min_*.0.namd

    - namd/eq_*.0.namd

    Adaptive biasing force calculations
    - namd/eabfZRest7_graph_chp1404.0.namd
    - namd/eabfZRest7_graph_chp1404.1.namd (continuation of eabfZRest7_graph_chp1404.0.namd)

    Log Files
    For each NAMD configuration file given in the last two sections, there is a log file with the same prefix, which gives the text output of NAMD. For instance, the output of namd/eabfZRest7_graph_chp1404.0.namd is eabfZRest7_graph_chp1404.0.log.

    Simulation Output
    The simulation output files (which match the names of the NAMD configuration files) are in the output/ directory. Files with the extensions .coor, .vel, and .xsc are coordinates in NAMD binary format, velocities in NAMD binary format, and extended system information (including cell size) in text format. Files with the extension .dcd give the trajectory of the atomic coorinates over time (and also include system cell information). Due to storage limitations, large DCD files have been omitted or replaced with new DCD files having the prefix stride50_ including only every 50 frames. The time between frames in these files is 50 * 50000 steps/frame * 4 fs/step = 10 ns. The system cell trajectory is also included for the NPT runs are output/eq_*.xst.

    Files with the .sh extension can be found throughout. These usually provide the highest level control for submission of simulations and analysis. Look to these as a guide to what is happening. If there are scripts with step1_*.sh and step2_*.sh, they are intended to be run in order, with step1_*.sh first.


    The directory contents are as follows. The directories Sim_Figure-1 and Sim_Figure-8 include README.txt files that describe the files and naming conventions used throughout this data set.

    Sim_Figure-1: Simulations of N-acetylated C-amidated amino acids (Ac-X-NHMe) at the graphite–water interface.

    Sim_Figure-2: Simulations of different peptide designs (including acyclic, disulfide cyclized, and N-to-C cyclized) at the graphite–water interface.

    Sim_Figure-3: MM-GBSA calculations of different peptide sequences for a folded conformation and 5 misfolded/unfolded conformations.

    Sim_Figure-4: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-5: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 295 K.

    Sim_Figure-5_replica: Temperature replica exchange molecular dynamics simulations for the peptide cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) with 20 replicas for temperatures from 295 to 454 K.

    Sim_Figure-6: Simulation of the peptide molecule cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) in free solution (no graphite).

    Sim_Figure-7: Free energy calculations for folding, adsorption, and pairing for the peptide CHP1404 (sequence: cyc(GTGSGTG-GPGG-GCGTGTG-SGPG)). For folding, we calculate the PMF as function of RMSD by replica-exchange umbrella sampling (in the subdirectory Folding_CHP1404_Graphene/). We make the same calculation in solution, which required 3 seperate replica-exchange umbrella sampling calculations (in the subdirectory Folding_CHP1404_Solution/). Both PMF of RMSD calculations for the scrambled peptide are in Folding_scram1404/. For adsorption, calculation of the PMF for the orientational restraints and the calculation of the PMF along z (the distance between the graphene sheet and the center of mass of the peptide) are in Adsorption_CHP1404/ and Adsorption_scram1404/. The actual calculation of the free energy is done by a shell script ("") in the 1_free_energy/ subsubdirectory. Processing of the PMFs must be done first in the 0_pmf/ subsubdirectory. Finally, files for free energy calculations of pair formation for CHP1404 are found in the Pair/ subdirectory.

    Sim_Figure-8: Simulation of four peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) where the peptides are far above the graphene–water interface in the initial configuration.

    Sim_Figure-9: Two replicates of a simulation of nine peptide molecules with the sequence cyc(GTGSGTG-GPGG-GCGTGTG-SGPG) at the graphite–water interface at 370 K.

    Sim_Figure-9_scrambled: Two replicates of a simulation of nine peptide molecules with the control sequence cyc(GGTPTTGGGGGGSGGPSGTGGC) at the graphite–water interface at 370 K.

    Sim_Figure-10: Adaptive biasing for calculation of the free energy of the folded peptide as a function of the angle between its long axis and the zigzag directions of the underlying graphene sheet.


    This material is based upon work supported by the US National Science Foundation under grant no. DMR-1945589. A majority of the computing for this project was performed on the Beocat Research Cluster at Kansas State University, which is funded in part by NSF grants CHE-1726332, CNS-1006860, EPS-1006860, and EPS-0919443. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, through allocation BIO200030. 
    more » « less
  3. A biodiversity dataset graph: UCSB-IZC

    The intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at Cheadle Center of Biodiversity and Ecological Restoration, University of California Santa Barbara.

    This dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track ""].

    This archive contains 14349 images related to 32533 occurrence/specimen records. See included sample-image.jpg and their associated meta-data sample-image.json [4].

    The images were counted using:

    $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
     | grep -o -P ".*depict"\
     | sort\
     | uniq\
     | wc -l

    And the occurrences were counted using:

    $ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\
     | grep -o -P "occurrence/([0-9])+"\
     | sort\
     | uniq\
     | wc -l

    The archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included and have been individually included in this dataset publication. Index files provide a way to links provenance files in time to establish a versioning mechanism.

    To retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar clone --remote,,

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .
    <hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c> <> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .

    To check the integrity of the extracted archive, confirm that each line produce by the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c    file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c    OK    CONTENT_PRESENT_VALID_HASH    66438    hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c
    hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844    file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844    OK    CONTENT_PRESENT_VALID_HASH    4093    hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844
    hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef    file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef    OK    CONTENT_PRESENT_VALID_HASH    5746    hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef
    hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b    file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b    OK    CONTENT_PRESENT_VALID_HASH    6147    hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b

    Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --

    -- executable java jar containing preston [2,3] v0.3.1. --

    -- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --

    -- individual provenance index files --

    -- example image and meta-data --
    sample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)
    sample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)

    --- end of file descriptions ---


    [1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset accessed via on 2021-11-04 as indexed by the Global Biodiversity Informatics Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.
    [2], .
    [3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics.
    [4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset accessed via on 2021-10-08. . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c

    This work is funded in part by grant NSF OAC 1839201 and NSF DBI 2102006 from the National Science Foundation. 
    more » « less
  4. PLEASE CONTACT AUTHORS IF YOU CONTRIBUTE AND WOULD LIKE TO BE LISTED AS A CO-AUTHOR. (this message will be removed some time weeks/months after the first publication)

    Terrestrial Parasite Tracker indexed biotic interactions and review summary.

    The Terrestrial Parasite Tracker (TPT) project began in 2019 and is funded by the National Science foundation to mobilize data from vector and ectoparasite collections to data aggregators (e.g., iDigBio, GBIF) to help build a comprehensive picture of arthropod host-association evolution, distributions, and the ecological interactions of disease vectors which will assist scientists, educators, land managers, and policy makers. Arthropod parasites often are important to human and wildlife health and safety as vectors of pathogens, and it is critical to digitize these specimens so that they, and their biotic interaction data, will be available to help understand and predict the spread of human and wildlife disease.

    This data publication contains versioned TPT associated datasets and related data products that were tracked, reviewed and indexed by Global Biotic Interactions (GloBI) and associated tools. GloBI provides open access to finding species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open source software.

    If you have questions or comments about this publication, please open an issue at or contact the authors by email.

    The creation of this archive was made possible by the National Science Foundation award "Collaborative Research: Digitization TCN: Digitizing collections to trace parasite-host associations and predict the spread of vector-borne disease," Award numbers DBI:1901932 and DBI:1901926

    Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics.

    GloBI Data Review Report

    Datasets under review:
     - University of Michigan Museum of Zoology Insect Division. Full Database Export 2020-11-20 provided by Erika Tucker and Barry Oconner. accessed via on 2022-06-24T14:02:48.801Z
     - Academy of Natural Sciences Entomology Collection for the Parasite Tracker Project accessed via on 2022-06-24T14:04:22.091Z
     - Bernice Pauahi Bishop Museum, J. Linsley Gressitt Center for Research in Entomology accessed via on 2022-06-24T14:04:37.692Z
     - Texas A&M University, Biodiversity Teaching and Research Collections accessed via on 2022-06-24T14:06:40.154Z
     - Brigham Young University Arthropod Museum accessed via on 2022-06-24T14:06:51.420Z
     - California Academy of Sciences Entomology accessed via on 2022-06-24T14:07:16.371Z
     - Clemson University Arthropod Collection accessed via on 2022-06-24T14:07:40.925Z
     - Denver Museum of Nature and Science (DMNS) Parasite specimens (DMNS:Para) accessed via on 2022-06-24T14:08:00.730Z
     - Field Museum of Natural History IPT accessed via on 2022-06-24T14:18:51.995Z
     - Illinois Natural History Survey Insect Collection accessed via on 2022-06-24T14:19:37.563Z
     - UMSP / University of Minnesota / University of Minnesota Insect Collection accessed via on 2022-06-24T14:20:27.232Z
     - Milwaukee Public Museum Biological Collections Data Portal accessed via on 2022-06-24T14:20:46.185Z
     - Museum for Southern Biology (MSB) Parasite Collection accessed via on 2022-06-24T15:16:07.223Z
     - The Albert J. Cook Arthropod Research Collection accessed via on 2022-06-24T16:09:40.702Z
     - Ohio State University Acarology Laboratory accessed via on 2022-06-24T16:10:00.281Z
     - Frost Entomological Museum, Pennsylvania State University accessed via on 2022-06-24T16:10:07.741Z
     - Purdue Entomological Research Collection accessed via on 2022-06-24T16:10:26.654Z
     - Texas A&M University Insect Collection accessed via on 2022-06-24T16:10:58.496Z
     - University of California Santa Barbara Invertebrate Zoology Collection accessed via on 2022-06-24T16:12:29.854Z
     - University of Hawaii Insect Museum accessed via on 2022-06-24T16:12:41.408Z
     - University of New Hampshire Collection of Insects and other Arthropods UNHC-UNHC accessed via on 2022-06-24T16:12:59.500Z
     - Scott L. Gardner and Gabor R. Racz (2021). University of Nebraska State Museum - Parasitology. Harold W. Manter Laboratory of Parasitology. University of Nebraska State Museum. accessed via on 2022-06-24T16:13:06.914Z
     - Data were obtained from specimens belonging to the United States National Museum of Natural History (USNM), Smithsonian Institution, Washington DC and digitized by the Walter Reed Biosystematics Unit (WRBU). accessed via on 2022-06-24T16:13:38.013Z
     - US National Museum of Natural History Ixodes Records accessed via on 2022-06-24T16:13:45.666Z
     - Price Institute of Parasite Research, School of Biological Sciences, University of Utah accessed via on 2022-06-24T16:13:54.724Z
     - University of Wisconsin Stevens Point, Stephen J. Taft Parasitological Collection accessed via on 2022-06-24T16:14:04.745Z
     - Giraldo-Calderón, G. I., Emrich, S. J., MacCallum, R. M., Maslen, G., Dialynas, E., Topalis, P., … Lawson, D. (2015). VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic acids research, 43(Database issue), D707–D713. doi:10.1093/nar/gku1117. accessed via on 2022-06-24T16:14:11.965Z
     - WIRC / University of Wisconsin Madison WIS-IH / Wisconsin Insect Research Collection accessed via on 2022-06-24T16:14:29.743Z
     - Yale University Peabody Museum Collections Data Portal accessed via on 2022-06-24T16:23:29.289Z

    Generated on:

    GloBI's Elton 0.12.4 

    Note that all files ending with .tsv are files formatted 
    as UTF8 encoded tab-separated values files.

    Included in this review archive are:

      This file.

      Summary across all reviewed collections of total number of distinct review comments.

      Summary by reviewed collection of total number of distinct review comments.

      Summary of number of indexed interaction records by institutionCode and collectionCode.

      All review comments by collection.

      All indexed interactions for all reviewed collections.

      All indexed interactions for all reviewed collections selecting only sourceInstitutionCode, sourceCollectionCode, sourceCatalogNumber, sourceTaxonName, interactionTypeName and targetTaxonName.

      Details on the datasets under review.

      Program used to update datasets and generate the review reports and associated indexed interactions.
      Source datasets used by elton.jar in process of executing the script.
      Program used to generate the report

      Log file generated as part of running the script

    more » « less
  5. A biodiversity dataset graph: GBIF, iDigBio, BioCASe

    The intended use of this archive is to facilitate meta-analysis of the Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe). GBIF, iDigBio and BioCASe help provide access to biological data collections. 

    This dataset provides versioned provenance logs of snapshots of the GBIF, iDigBio, BioCASe network as tracked by Preston [2] between 2018-09-03 and 2019-10-02 using "preston update -u,,". 

    This publication contains two types of files: index files and provenance logs. Associated data files are hosted elsewhere for pragmatic reasons. Index files provide a way to link provenance files in time to establish a versioning mechanism. Provenance logs describe how, when, what and where the GBIF, iDigBio, BioCASe content was retrieved. For more information, please visit or .  

    To retrieve and verify the downloaded GBIF, iDigBio, BioCASe biodiversity dataset graph, use the preston[2] command-line tool to "clone" this dataset using:

    $ java -jar preston.jar ls --remote > /dev/null

    Optionally, you can retrieve all associated data (~500GB) files using:

    $ java -jar preston.jar clone --remote,

    Please note is a Preston remote that provided access to GBIF, iDigBio, BioCASe data files at time of writing (13 Oct 2019). This remote can replaced with any other Preston remote(s) if needed. This may take a while depending on network speed and hardware constraints.

    After that, verify the index of the archive by reproducing the following provenance log history:

    $ java -jar preston.jar history
    <0659a54f-b713-4f86-a917-5be166a14110> <> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> <> <hash://sha256/c253a5311a20c2fc082bf9bac87a1ec5eb6e4e51ff936e7be20c29c8e77dee55> .
    <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> <> <hash://sha256/b83cf099449dae3f633af618b19d05013953e7a1d7d97bc5ac01afd7bd9abe5d> .
    <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> <> <hash://sha256/7efdea9263e57605d2d2d8b79ccd26a55743123d0c974140c72c8c1cfc679b93> .
    <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> <> <hash://sha256/05a877bdb8617144fe166a13bf51828d4ad1bc11631c360b9e648a9f7df2bbcd> .
    <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> <> <hash://sha256/b5a30bbd8d51e9faf08d4ddebbc5bda9bab1b12545172f1524ac5ebdb0038bd4> .
    <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> <> <hash://sha256/1d3817d9cb9fc7de7a3b7a4181daba8de1e52b348280154e8a163c7dd7ee1a7e> .
    <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> <> <hash://sha256/24b3f981c88c747f44ad3372095767cd15dcf81bd6cd2e54328a90a21409df43> .
    <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> <> <hash://sha256/ba02b235fd445904eae45b50bc637a195f25e9ca1637bcf26b2dc7f8698aa1fe> .
    <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> <> <hash://sha256/102cbfb1e800ef795ba1e1c51a34bff9b463b34c9443435069ddc76970c1e9c9> .
    <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> <> <hash://sha256/fd27b0552c8a6800a8b3b1b822a2063a3215c1d9887badad09a62746b80846bc> .
    <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> <> <hash://sha256/20d36a6f879ba1dd797d4288a4f2e32719d3c674156194c2765a3ec6b43f5e17> .
    <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> <> <hash://sha256/7801a034fe3c7920e032d2338a690b700ca41a90a92d878fc3a67111cad16d29> .
    <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> <> <hash://sha256/c1b50502b1ca87046eeb7fe4863d0cf9319b6645ff2142db69f21b4cc23332b6> .
    <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> <> <hash://sha256/dc293e26154b89273791b9674d81110029f987c686b386184d0b66a5b95f9cda> .
    <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> <> <hash://sha256/f3ed6aa1bd15ee43d05e138b935040aaa745f6ca8c7e8f2dfbb0a3ae0df66f36> .
    <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> <> <hash://sha256/650a28fff3e03dadba70dc05a34c580c04203380187953fa4a2fb778353fee79> .
    <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> <> <hash://sha256/e4e5736e8bfec6c686eedde4c6dfa62845930d04e12dfa6f8a7d70abc3d087df> .
    <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> <> <hash://sha256/e69d186ff3be11830c2da67d1bfeb896ec6398fc9d555fa26eaae1baa54450fb> .
    <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> <> <hash://sha256/3e7f19a8a78b51437240f49c499e6e7f89b8d58d4e3ceb9480d4356721645cee> .
    <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> <> <hash://sha256/5c469224fa0b6159bf33a59ddaa0246634e81bddd1728e7bf3540745055eccfa> .
    <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> <> <hash://sha256/eb2c716ec85158a0785216de1b09965173fc368d12f213c1bf747bbc2e49c6a6> .
    <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> <> <hash://sha256/3dd674b7ad16391629948981a9cb6f6f86937d016861c3e59cd6e6bf3589f3b7> .
    <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> <> <hash://sha256/480868b59e95f3ce2324a7308dba65795e857d34cfbdcea7440a6f2620c6fbf6> .
    <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> <> <hash://sha256/58daa9a51e5dc0911163aa1b98d68c801106734cd29eab9980814057351aeb70> .
    <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> <> <hash://sha256/a0a18b0e32f933112084b846863438038f66f63eeeb22fa9d8d734e8a25bb208> .
    <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> <> <hash://sha256/a7a5e7c6a4b21bdf67f48d6bea85f438b8133f674027b04625dfadec3ff985f6> .
    <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> <> <hash://sha256/0e6b49850d96b4b58ea3759ecea45d273a48f074c4edaaec5e008791d7718781> .
    <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> <> <hash://sha256/8c0752dc6425b9c716837c9713ce284158b4cff70a1e66be2beb0677018831f4> .
    <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> <> <hash://sha256/d99fa37caa268f8061980001146ed2a566e814d0740bb1974b76847512be95d3> .
    <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> <> <hash://sha256/af0bb2c89571a30815d4488e72dede84a2ffc102bb87961f06884509fd5d1dae> .
    <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> <> <hash://sha256/261177a96185166f1c301beacf7350abff03d1b5710be6bfd8c4aff9caffef12> .
    <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> <> <hash://sha256/5a39b7bbe9d1bc46ed2eb7bd76c490b5c85a09369a7cf7dc18fa04532679e9a7> .
    <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> <> <hash://sha256/af8f9ed321d9c403617f54a96e3217adc918970fbbfe8b8715359669f4890b63> .
    <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> <> <hash://sha256/9a41d2583f0b8169ffdd44fb2d3a5e057eba4a10e5d9193d0c6e9dcf07c3119e> .
    <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> <> <hash://sha256/b9864a749112cad2fe19e62bf5d8bad580a7036d363d16d81d5c16be325fa0fd> .
    <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> <> <hash://sha256/09574d9c1330c2b1bec9b7bf3a55ab9273bedbfed78affd70a058a1a25e052d2> .
    <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> <> <hash://sha256/668d5d6e9c9e7ddb410073ff75eb7f2935c60cc62944ba1fd96ca60feec4a103> .
    <hash://sha256/d79fb9207329a2813b60713cf0968fda10721d576dcb7a36038faf18027eebc1> <> <hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7> .

    If you retrieved data files, you can check the integrity of the extracted archive by confirming that each line produce by the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.

    $ java -jar preston.jar verify
    hash://sha256/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    file:/home/preston/preston-archive/data/3e/ff/3eff98d4b66368fd8d1f8fa1af6a057774d8a407a4771490beeb9e7add76f362    OK    CONTENT_PRESENT_VALID_HASH    89931
    hash://sha256/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    file:/home/preston/preston-archive/data/18/48/184886cc6ae4490a49a70b6fd9a3e1dfafce433fc8e3d022c89e0b75ea3cda0b    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    file:/home/preston/preston-archive/data/18/46/1846abf2b9623697cf9b2212e019bc1f6dc4a20da51b3b5629bfb964dc808c02    OK    CONTENT_PRESENT_VALID_HASH    210344
    hash://sha256/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    file:/home/preston/preston-archive/data/55/4f/554fdab07f2372bf363a1d7ef30fcf4c32e1da98b95a6342780c5eb35e0e7b38    OK    CONTENT_PRESENT_VALID_HASH    202701

    Note that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston". 

    Files in this data publication:

    --- start of file descriptions ---

    -- description of archive and its contents (this file) --

    -- executable java jar containing preston[2] v0.1.8. --

    -- individual provenance index files --

    --- individual provenance logs --

    --- end of file descriptions ---


    [1] Global Biodiversity Information Facility, Integrated Digitized Biocollections, Biological Collection Access Service (GBIF, iDigBio, BioCASe,,, accessed from 2018-09-03 to 2019-10-02 with provenance hash://sha256/6387c9ebed9507a0fbba2d161e83c2da73e0d6fa6dd51fb19ac4a4ca75b839c7.
    [2], . 

    This work is funded in part by grant NSF OAC 1839201 from the National Science Foundation

    more » « less