

Title: Data accompanying "Drought Characterization with GPS: Insights into Groundwater and Reservoir Storage in California" [Young et al., (2024)]
The data provided here accompany the publication "Drought Characterization with GPS: Insights into Groundwater and Reservoir Storage in California" [Young et al., (2024)], which is currently under review with Water Resources Research (as of 28 May 2024). Please refer to the manuscript and its supplemental materials for full details. (A link will be appended following publication.)

File formatting information is listed below, followed by a sub-section of the text describing the Geodetic Drought Index Calculation.

The longitude, latitude, and label for grid points are provided in the file "loading_grid_lon_lat".

Time series for each Geodetic Drought Index (GDI) time scale are provided within "GDI_time_series.zip". The included time scales are for 00- (daily), 1-, 3-, 6-, 12-, 18-, 24-, 36-, and 48-month GDI solutions. Files are formatted following...
Title: "grid point label L****"_"time scale"_month
File Format: ["decimal date" "GDI value"]

Gridded, epoch-by-epoch solutions for each time scale are provided within "GDI_grids.zip". Files are formatted following...
Title: GDI_"decimal date"_"time scale"_month
File Format: ["longitude" "latitude" "GDI value" "grid point label L****"]

2.2 GEODETIC DROUGHT INDEX CALCULATION

We develop the GDI following Vicente-Serrano et al. (2010) and Tang et al. (2023), such that the GDI mimics the derivation of the Standardized Precipitation Evapotranspiration Index (SPEI) and utilizes the log-logistic distribution (further details below). While we apply hydrologic load estimates derived from GPS displacements as the input for this GDI (Figure 1a-d), we note that alternate geodetic drought indices could be derived using other types of geodetic observations, such as InSAR, gravity, strain, or a combination thereof. Therefore, the GDI is a generalizable drought index framework. A key benefit of the SPEI is that it is a multi-scale index, allowing the identification of droughts that occur across different time scales.
For example, flash droughts (Otkin et al., 2018), which may develop over a period of a few weeks, and persistent droughts (>18 months) may not be observed or fully quantified in a single-scale drought index framework. However, by adopting a multi-scale approach these signals can be better identified (Vicente-Serrano et al., 2010). Similarly, in the case of this GPS-based GDI, hydrologic drought signals are expected to develop at time scales that are characteristic of both the drought and the source of the load variation (i.e., groundwater versus surface water and their respective drainage basin/aquifer characteristics). Thus, to test a range of time scales, the terrestrial water storage (TWS) time series are summarized with a retrospective rolling average window of D (daily with no averaging), 1, 3, 6, 12, 18, 24, 36, and 48 months in width (where one month equals 30.44 days). From these time-scale averaged time series, representative compilation window load distributions are identified for each epoch. The compilation window distributions include all dates that fall within ±15 days of the epoch in question in each year. This allows a characterization of the estimated loads for each day relative to all past/future loads near that day, in order to bolster the sample size and provide more robust parametric estimates [similar to Ford et al. (2016)]; this is a key difference between our GDI derivation and that presented by Tang et al. (2023). Figure 1d illustrates the representative distribution for 01 December of each year at the grid cell co-located with GPS station P349 for the daily TWS solution. Here, all epochs between 16 November and 16 December of each year (red dots) are compiled to form the distribution presented in Figure 1e. This approach allows inter-annual variability in the phase and amplitude of the signal to be retained (which is largely driven by variation in the hydrologic cycle), while removing the primary annual and semi-annual signals.
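The averaging and compilation steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, the series is assumed to be daily with no gaps, and a 29 February target is mapped to 28 February in non-leap years (an assumption the text does not address).

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # month length stated in the text


def months_to_days(months):
    """Window width in days; 0 months means daily (no averaging)."""
    return max(1, round(months * DAYS_PER_MONTH))


def retrospective_mean(values, window_days):
    """Trailing mean ending at each epoch; None until a full window exists."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window_days:
            running -= values[i - window_days]
        out.append(running / window_days if i >= window_days - 1 else None)
    return out


def _anchor(year, month, day):
    """Target calendar day in a given year (assumed fallback for 29 Feb)."""
    try:
        return date(year, month, day)
    except ValueError:
        return date(year, month, day - 1)


def compilation_window(dates, values, target, half_width_days=15):
    """Collect values within +/- half_width_days of the target day, all years."""
    selected = []
    for d, v in zip(dates, values):
        if v is None:
            continue
        # Check neighbouring years so windows can straddle 1 January.
        for year in (d.year - 1, d.year, d.year + 1):
            anchor = _anchor(year, target.month, target.day)
            if abs((d - anchor).days) <= half_width_days:
                selected.append(v)
                break
    return selected
```

For a 01 December target, `compilation_window` keeps the epochs from 16 November through 16 December of every year, matching the example given in the text.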
Solutions converge for compilation windows >±5 days, and show a minor increase in scatter of the GDI time series for windows of ±3-4 days (below which instability becomes more prevalent). To ensure robust drought characterization, we opt for an extended ±15-day compilation window. While Tang et al. (2023) found the log-logistic distribution to be unstable and opted for a normal distribution, we find that, by using the extended compiled distribution, the solutions are stable with negligible differences compared to the use of a normal distribution. Thus, to remain aligned with the SPEI solution, we retain the three-parameter log-logistic distribution to characterize the anomalies. Probability weighted moments for the log-logistic distribution are calculated following Singh et al. (1993) and Vicente-Serrano et al. (2010). The individual moments are calculated following Equation 3. These are then used to calculate the L-moments for the shape, scale, and location parameters of the three-parameter log-logistic distribution (Equations 4-6). The probability density function (PDF) and the cumulative distribution function (CDF) are then calculated following Equations 7 and 8, respectively. The inverse of the standard normal (Gaussian) CDF is used to transform the CDF from estimates of the parametric sample quantiles to standard normal index values that represent the magnitude of the standardized anomaly. Here, positive/negative values represent greater/lower than normal hydrologic storage. Thus, an index value of -1 indicates that the estimated load is approximately one standard deviation drier than the expected average load on that epoch. *Equations can be found in the main text.
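Because the equations themselves are only given in the main text, the sketch below uses the SPEI formulation published by Vicente-Serrano et al. (2010), which the description says the GDI follows. It is an assumption that the manuscript's Equations 3-8 take exactly this form. The function names, the plotting position (i - 0.35)/N, and the clipping constant `eps` are illustrative choices, not taken from the dataset record.

```python
import math
from statistics import NormalDist


def fit_log_logistic(values):
    """Estimate (shape, scale, location) from probability weighted moments."""
    x = sorted(values)
    n = len(x)
    w = [0.0, 0.0, 0.0]
    for i, xi in enumerate(x, start=1):
        f_i = (i - 0.35) / n  # plotting position of the i-th sorted value
        for s in range(3):
            w[s] += (1.0 - f_i) ** s * xi / n
    w0, w1, w2 = w
    shape = (2.0 * w1 - w0) / (6.0 * w1 - w0 - 6.0 * w2)
    g = math.gamma(1.0 + 1.0 / shape) * math.gamma(1.0 - 1.0 / shape)
    scale = (w0 - 2.0 * w1) * shape / g
    location = w0 - scale * g
    return shape, scale, location


def standardized_index(value, shape, scale, location, eps=1e-6):
    """Map a load value to a standard normal index via the fitted CDF."""
    if value <= location:
        cdf = eps  # below the fitted lower bound: clip instead of failing
    else:
        cdf = 1.0 / (1.0 + (scale / (value - location)) ** shape)
        cdf = min(max(cdf, eps), 1.0 - eps)
    return NormalDist().inv_cdf(cdf)
```

In this formulation the fit is only well defined when the estimated shape parameter exceeds 1, which requires a positively skewed sample. Symmetric or negatively skewed windows would need separate handling, and the sketch does not attempt it.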
Award ID(s):
2021637 1900646
PAR ID:
10515149
Publisher / Repository:
figshare
Subject(s) / Keyword(s):
Geodesy Surface water hydrology Groundwater hydrology
Format(s):
Medium: X Size: 2770810580 Bytes
Size(s):
2770810580 Bytes
Sponsoring Org:
National Science Foundation
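As a practical note on the file layouts listed in the dataset description, the titles and rows could be parsed along the following lines. This is a hedged sketch: the record does not state the column delimiter or any file extension, so whitespace-separated columns and extension-free titles are assumed, and the function names and example titles in the usage note are hypothetical.

```python
def parse_series_title(title):
    """Split '<grid label>_<time scale>_month' into (label, months)."""
    label, scale, suffix = title.split("_")
    if suffix != "month":
        raise ValueError(f"unexpected title: {title!r}")
    return label, int(scale)


def parse_grid_title(title):
    """Split 'GDI_<decimal date>_<time scale>_month' into (date, months)."""
    prefix, decimal_date, scale, suffix = title.split("_")
    if prefix != "GDI" or suffix != "month":
        raise ValueError(f"unexpected title: {title!r}")
    return float(decimal_date), int(scale)


def parse_series_rows(text):
    """Read ['decimal date' 'GDI value'] rows, assuming whitespace columns."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        rows.append((float(parts[0]), float(parts[1])))
    return rows


def parse_grid_rows(text):
    """Read ['longitude' 'latitude' 'GDI value' 'label'] rows (assumed layout)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        rows.append((float(parts[0]), float(parts[1]), float(parts[2]), parts[3]))
    return rows
```

For example, `parse_series_title("L0042_03_month")` would yield the label `L0042` and a 3-month time scale, while a `00` time scale maps to 0, the daily solution.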
More Like this
  1. Abstract Drought intensity is commonly characterized using meteorologically‐based metrics that do not provide insight into water deficits within deeper hydrologic systems. In contrast, global positioning system (GPS) displacements are sensitive to both local and regional hydrologic‐storage fluctuations. While a few studies have leveraged this sensitivity to produce geodetic drought indices, hydrologic drought characterization using GPS is not commonly accounted for in drought assessment and management. To motivate this application, we produce a new geodetic drought index (GDI) and quantify its ability to characterize hydrologic drought conditions in key surface and sub‐surface hydrologic reservoirs/pools across California. In northern California, the GDI exhibits a strong regional association with surface‐reservoir storage at the 1‐month time scale (correlation coefficient: 0.83) and groundwater levels at the 3‐month time scale (correlation coefficient: 0.87), along with moderate associations with stream discharge at the daily (instantaneous) time scale (correlation coefficient: 0.50). Groundwater in southern California is best characterized with a 12‐month GDI (correlation coefficient: 0.77), and surface‐reservoir storage is optimized with the 3‐month GDI (correlation coefficient: 0.72). Two sigma uncertainties are ±0.03. Differences between northern and southern California reveal that the GDI is sensitive to unique aquifer and drainage basin characteristics. In addition to capturing long‐term hydrologic trends, rapid changes in the GDI initiate during clusters of large atmospheric river events that closely mirror fluctuations in traditional hydrologic and meteorological observations. We show that GPS‐based hydrologic drought indices provide a significant opportunity to improve drought assessment, in California and beyond, by improving our understanding of the hydrologic cycle. 
  2. Abstract This classified_bed data product represents the radar bed classification shown in Young et al., 2016 (https://doi.org/10.1098/rsta.2014.0297). Values of 0 represent specularity content below 20%; values of 3.3 represent specularity content above 20% and energy 1 microsecond below the bed 15 dB lower than the bed echo; and values of 6.7 represent specularity content above 20% and energy 1 microsecond below the bed within 15 dB of the bed echo. Grids for specularity content and post bed echo are also available. Data are available as COARDS-compliant netCDF-4/HDF5 grids (.grd) and GeoTiffs (.tiff), both in EPSG 3031 (Antarctic Polar Stereographic) projection.

     Data were gridded using GMT6.1 (https://docs.generic-mapping-tools.org/6.1/gmt.html) and the nnbathy (https://github.com/sakov/nn-c) natural neighbor interpolator. Cell size was 1 km, Gaussian filter distance was 5 km, and mask radius was 2 km.

     Browse images, with Bedmap3 (Pritchard et al., 2025) surface elevation contours and MEASURES phase derived surface velocities (Mouginot et al., 2019), are available for each dataset.

     An interpretation of the values in the classified_bed product is that low values are rough bed, intermediate values are isotropic wet bed, and high values are anisotropic wet bed.

     Version 1 includes data from the 2016 paper, including AGASEA over Thwaites Glacier (Holt et al., 2006), ATRS over West Antarctica (Peters et al., 2005), GIMBLE over Marie Byrd Land (Young et al., 2013) and parts of ICECAP over Wilkes Subglacial Basin, Dome C, Highland B and Totten Glacier (Young et al., 2011; Young et al., 2016). We expect updates to the coverage as part of work funded by the Arête Glaciers Initiative.

     References
     Holt, J. W., Blankenship, D. D., Morse, D. L., Young, D. A., Peters, M. E., Kempf, S. D., Richter, T. G., Vaughan, D. G., and Corr, H., New boundary conditions for the West Antarctic ice sheet: subglacial topography of the Thwaites and Smith Glacier catchments, 2006, Geophysical Research Letters, 33 (L09502), https://doi.org/10.1029/2005GL025561
     Mouginot, J., Rignot, E., and Scheuchl, B., Continent-wide, interferometric SAR phase, mapping of Antarctic ice velocity, 2019, Geophysical Research Letters, 46(16), pp. 9710-9718, https://doi.org/10.1029/2019GL083826
     Peters, M. E., Blankenship, D. D., and Morse, D. L., Analysis techniques for coherent airborne radar sounding: Application to West Antarctic ice streams, 2005, Journal of Geophysical Research, 110 (B06303), https://doi.org/10.1029/2004JB003222
     Pritchard, H. D., and others, Bedmap3 updated ice bed, surface and thickness gridded datasets for Antarctica, 2025, Scientific Data, 12(1), pp. 414, https://doi.org/10.1038/s41597-025-04672-y
     Young, D. A., D. D. Blankenship, J. S. Greenbaum, E. Quartini, G. L. Muldoon, F. Habbal, L. E. Lindzey, C. A. Greene, E. M. Powell, G. C. Ng, T. G. Richter, G. Echeverry, and S. Kempf, 2024, Geophysical Investigations of Marie Byrd Land Lithospheric Evolution (GIMBLE) Airborne VHF Radar Transects: 2012/2013 and 2014/2015, https://doi.org/10.18738/T8/BMXUHX, Texas Data Repository
     Young, D. A., Wright, A. P., Roberts, J. L., Warner, R. C., Young, N. W., Greenbaum, J. S., Schroeder, D. M., Holt, J. W., Sugden, D. E., Blankenship, D. D., van Ommen, T. D., and Siegert, M. J., A dynamic early East Antarctic Ice Sheet suggested by ice covered fjord landscapes, 2011, Nature, 474, pp. 72-75, https://doi.org/10.1038/nature10114
     Young, D. A., Schroeder, D. M., Blankenship, D. D., Kempf, S. D., and Quartini, E., The distribution of basal water between Antarctic subglacial lakes from radar sounding, 2016, Philosophical Transactions of the Royal Society A, 374 (20140297), pp. 1-21, https://doi.org/10.1098/rsta.2014.0297

     Change Log
     Changes from V1: changes to gridding parameters to more closely match the figures from Young 2016; updated metadata gridding description
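The three-level classification described in this abstract can be written as a small decision rule. The sketch below is illustrative only: the function and argument names are hypothetical, and the handling of values exactly at the 20% and 15 dB thresholds is an assumption, since the abstract does not say which side they fall on.

```python
def classify_bed(specularity_fraction, post_bed_drop_db):
    """Return the classified_bed value (0, 3.3, or 6.7) for one grid cell.

    specularity_fraction: specularity content as a fraction (0.20 == 20%).
    post_bed_drop_db: how far the energy 1 microsecond below the bed sits
        beneath the bed echo, in dB (larger means a weaker post-bed echo).
    """
    if specularity_fraction < 0.20:
        return 0.0   # low specularity content
    if post_bed_drop_db >= 15.0:
        return 3.3   # high specularity, post-bed energy 15 dB lower
    return 6.7       # high specularity, post-bed energy within 15 dB
```

Under the interpretation offered in the abstract, the three return values would correspond to rough bed, isotropic wet bed, and anisotropic wet bed, respectively.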
  3. Introduction
     The National Science Foundation's Center for Oldest Ice Exploration (NSF COLDEX, https://www.coldex.org) is a Science and Technology Center working to extend the record of atmospheric gases, temperature, and ice sheet history to greater than 1 million years. As part of this effort, NSF COLDEX has been searching for a site for a continuous ice core extending through the mid-Pleistocene transition. Two seasons of airborne survey were conducted from South Pole Station across the southern flank of Dome A.

     2023-2024 Field Season
     In the 2023-2024 field season (CXA2), using a BT-67 Basler, NSF COLDEX conducted 17 flights from South Pole Station toward the southern flank of Dome A. Three test flights were conducted from McMurdo Station. Instrumentation included the 60 MHz MARFA ice penetrating radar (https://doi.org/10.18738/T8/J38CO5) from the University of Texas Institute for Geophysics; a UHF ice penetrating radar (https://doi.org/10.1109/IGARSS53475.2024.10640448) from the Center for Remote Sensing and Integrated Systems; a GT-2 gravimeter; an LD-90 laser altimeter; and a G-823 magnetometer.

     Basal specularity content
     These basal specularity content data were derived by comparing 1D and 2D focused MARFA data (Peters et al., 2007, http://doi.org/10.1109/TGRS.2007.897416). By comparing bed echo strengths for different focusing apertures, and accounting for the ranges and angles involved, we can derive the "specularity content" of the bed echo, a proxy for small scale bed roughness and a good indicator for subglacial water pressure in regions of distributed subglacial water (Schroeder et al., 2014, IEEE GRSL, https://doi.org/10.1109/LGRS.2014.2337878; Dow et al., 2019, EPSL, https://doi.org/10.1016/j.epsl.2019.115961) and smooth deforming bed material (Schroeder et al., 2014, GRL, http://doi.org/10.1002/2014GL061645; Young et al., 2016, PTRS, https://doi.org/10.1098/rsta.2014.0297). Specularity data are inherently noisy, so these products have been smoothed with a 1 km filter.
  4. {"Abstract":["A biodiversity dataset graph: BHL<\/p>\n\nThe intended use of this archive is to facilitate (meta-)analysis of the Biodiversity Heritage Library (BHL). The Biodiversity Heritage Library improves research methodology by collaboratively making biodiversity literature openly available to the world as part of a global biodiversity community.<\/p>\n\nThis dataset provides versioned snapshots of the BHL network as tracked by Preston [2] between 2019-05-19 and 2020-05-09 using "preston update -u https://biodiversitylibrary.org".<\/p>\n\nThe archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance logs and data files. In addition, index files have been individually included in this dataset publication to facilitate remote access. Index files provide a way to links provenance files in time to establish a versioning mechanism. Provenance files describe how, when, what and where the BHL content was retrieved. For more information, please visit https://preston.guoda.bio or https://doi.org/10.5281/zenodo.1410543 .  <\/p>\n\nTo retrieve and verify the downloaded BHL biodiversity dataset graph, first concatenate all the downloaded preston-*.tar.gz files (e.g., cat preston-*.tar.gz > preston.tar.gz). Then, extract the archives into a "data" folder. 
Alternatively, you can use the preston[2] command-line tool to "clone" this dataset using:<\/p>\n\n$$ java -jar preston.jar clone --remote https://zenodo.org/record/3849560/files<\/p>\n\nAfter that, verify the index of the archive by reproducing the following provenance log history:<\/p>\n\n$$ java -jar preston.jar history\n<0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .\n<hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> <http://purl.org/pav/previousVersion> <hash://sha256/89926f33157c0ef057b6de73f6c8be0060353887b47db251bfd28222f2fd801a> .\n<hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> <http://purl.org/pav/previousVersion> <hash://sha256/41b19aa9456fc709de1d09d7a59c87253bc1f86b68289024b7320cef78b3e3a4> .\n<hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> <http://purl.org/pav/previousVersion> <hash://sha256/7582d5ba23e0d498ca4f55c29408c477d0d92b4fdcea139e8666f4d78c78a525> .\n<hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> <http://purl.org/pav/previousVersion> <hash://sha256/a70774061ccded1a45389b9e6063eb3abab3d42813aa812391f98594e7e26687> .\n<hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> <http://purl.org/pav/previousVersion> <hash://sha256/007e065ba4b99867751d688754aa3d33fa96e6e03133a2097e8a368d613cd93a> .\n<hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> <http://purl.org/pav/previousVersion> <hash://sha256/4fb4b4d8f1ae2961311fb0080e817adb2faa746e7eae15249a3772fbe2d662a1> .\n<hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> <http://purl.org/pav/previousVersion> <hash://sha256/67cc329e74fd669945f503917fbb942784915ab7810ddc41105a82ebe6af5482> .\n<hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> 
<http://purl.org/pav/previousVersion> <hash://sha256/e46cd4b0d7fdb51ea789fa3c5f7b73591aca62d2d8f913346d71aa6cf0745c9f> .\n<hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> <http://purl.org/pav/previousVersion> <hash://sha256/9215d543418a80510e78d35a0cfd7939cc59f0143d81893ac455034b5e96150a> .\n<hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> <http://purl.org/pav/previousVersion> <hash://sha256/1448656cc9f339b4911243d7c12f3ba5366b54fff3513640306682c50f13223d> .\n<hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1> <http://purl.org/pav/previousVersion> <hash://sha256/7ee6b16b7a5e9b364776427d740332d8552adf5041d48018eeb3c0e13ccebf27> .<\/p>\n\nTo check the integrity of the extracted archive, confirm that each line produce by the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". Depending on hardware capacity, this may take a while.<\/p>\n\n$ java -jar preston.jar verify\nhash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca    file:/home/preston/preston-bhl/data/e0/c1/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca    OK    CONTENT_PRESENT_VALID_HASH    49458087    hash://sha256/e0c131ebf6ad2dce71ab9a10aa116dcedb219ae4539f9e5bf0e57b84f51f22ca\nhash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99    file:/home/preston/preston-bhl/data/1a/57/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99    OK    CONTENT_PRESENT_VALID_HASH    25745    hash://sha256/1a57e55a780b86cff38697cf1b857751ab7b389973d35113564fe5a9a58d6a99\nhash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c    file:/home/preston/preston-bhl/data/85/ef/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c    OK    CONTENT_PRESENT_VALID_HASH    519892    
hash://sha256/85efeb84c1b9f5f45c7a106dd1b5de43a31b3248a211675441ff584a7154b61c\nhash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743    file:/home/preston/preston-bhl/data/25/1e/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743    OK    CONTENT_PRESENT_VALID_HASH    787414    hash://sha256/251e5032afce4f1e44bfdc5a8f0316ca1b317e8af41bdbf88163ab5bd2b52743<\/p>\n\nNote that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".<\/p>\n\nFiles in this data publication:<\/p>\n\n--- start of file descriptions ---<\/p>\n\n-- description of archive and its contents (this file) --\nREADME<\/p>\n\n-- executable java jar containing preston[2] v0.1.15. --\npreston.jar<\/p>\n\n-- preston archives containing BHL data files, associated provenance logs and a provenance index --\npreston-[00-ff].tar.gz<\/p>\n\n-- individual provenance index files --\n2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a\n2b1104cb7749e818c9afca78391b2d0099bbb0a32f2b348860a335cd2f8f6800\n4081bc59dff58d63f6a86c623cb770f01e9a355a42495b205bcb538cd526190f\n47a2816f8b5600b24487093adcddfea12434cc4f270f3ab09d9215fbdd546cd2\n6f99a1388823fca745c9e22ac21e2da909a219aa1ace55170fa9248c0276903c\n7ae46d7cd9b5a0f5889ba38bac53c82e591b0bdf8b605f5e48c0dce8fb7b717f\n82903464889fea7c53f53daedf4e41fa31092f82619edeb3415eb2b473f74af3\n9e8c86243df39dd4fe82a3f814710eccf73aa9291d050415408e346fa2b09e70\na8308fbf4530e287927c471d881ce0fc852f16543d46e1ee26f1caba48815f3a\nbcec6df2ea7f74e9a6e2830d0072e6b2fbe65323d9ddb022dd6e1349c23996e2\ncfe47c25ec0210ac73c06b407beb20d9c58355cb15bae427fdc7541870ca2e4e\nf73fc9e70bce8f21f0c96b8ef0903749d8f223f71343ab5a8910968f99c9b8b6<\/p>\n\n--- end of file descriptions ---<\/p>\n\n\nReferences<\/p>\n\n[1] Biodiversity Heritage Library (BHL, https://biodiversitylibrary.org) accessed from 2019-05-19 to 2020-05-09 with 
provenance hash://sha256/34ccd7cf7f4a1ea35ac6ae26a458bb603b2f6ee8ad36e1a58aa0261105d630b1.\n[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .<\/p>\n\n\nThis work is funded in part by grant NSF OAC 1839201 from the National Science Foundation.<\/p>"]} 
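The `preston verify` output shown in this record suggests that each data file is stored under `data/<first two hex digits>/<next two>/<full sha256>`. Assuming the file name is the SHA-256 digest of the file's content, which the record implies but does not state outright, an independent spot check can be sketched in Python. It does not call Preston itself, and all function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def expected_path(data_dir, sha256_hex):
    """Path implied by the verify output: data/e0/c1/e0c1... and so on."""
    return Path(data_dir) / sha256_hex[:2] / sha256_hex[2:4] / sha256_hex


def file_matches_hash(path):
    """True when the file's SHA-256 digest equals its own file name."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == Path(path).name


def verify_line_ok(line):
    """True when a 'preston verify' output line reports a valid hash."""
    return "CONTENT_PRESENT_VALID_HASH" in line.split()


def demo():
    """Round-trip check on a throwaway file stored in the assumed layout."""
    payload = b"example payload"
    name = hashlib.sha256(payload).hexdigest()
    with tempfile.TemporaryDirectory() as tmp:
        target = expected_path(tmp, name)
        target.parent.mkdir(parents=True)
        target.write_bytes(payload)
        return file_matches_hash(target)
```

Reading in 1 MiB chunks keeps memory use flat for large archive members, such as the roughly 49 MB file listed first in the verify output.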
  5. {"Abstract":["A biodiversity dataset graph: UCSB-IZC<\/p>\n\nThe intended use of this archive is to facilitate (meta-)analysis of the UC Santa Barbara Invertebrate Zoology Collection (UCSB-IZC). UCSB-IZC is a natural history collection of invertebrate zoology at Cheadle Center of Biodiversity and Ecological Restoration, University of California Santa Barbara.<\/p>\n\nThis dataset provides versioned snapshots of the UCSB-IZC network as tracked by Preston [2,3] between 2021-10-08 and 2021-11-04 using [preston track "https://api.gbif.org/v1/occurrence/search/?datasetKey=d6097f75-f99e-4c2a-b8a5-b0fc213ecbd0"].<\/p>\n\nThis archive contains 14349 images related to 32533 occurrence/specimen records. See included sample-image.jpg and their associated meta-data sample-image.json [4].<\/p>\n\nThe images were counted using:<\/p>\n\n$$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\\\n | grep -o -P ".*depict"\\\n | sort\\\n | uniq\\\n | wc -l<\/p>\n\nAnd the occurrences were counted using:<\/p>\n\n$$ preston cat hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c\\\n | grep -o -P "occurrence/([0-9])+"\\\n | sort\\\n | uniq\\\n | wc -l<\/p>\n\nThe archive consists of 256 individual parts (e.g., preston-00.tar.gz, preston-01.tar.gz, ...) to allow for parallel file downloads. The archive contains three types of files: index files, provenance files and data files. Only two index and provenance files are included and have been individually included in this dataset publication. Index files provide a way to links provenance files in time to establish a versioning mechanism.<\/p>\n\nTo retrieve and verify the downloaded UCSB-IZC biodiversity dataset graph, first download preston-*.tar.gz. Then, extract the archives into a "data" folder. 
Alternatively, you can use the Preston [2,3] command-line tool to "clone" this dataset using:<\/p>\n\n$$ java -jar preston.jar clone --remote https://archive.org/download/preston-ucsb-izc/data.zip/,https://zenodo.org/record/5557670/files,https://zenodo.org/record/5557670/files/5660088<\/p>\n\nAfter that, verify the index of the archive by reproducing the following provenance log history:<\/p>\n\n$$ java -jar preston.jar history\n<urn:uuid:0659a54f-b713-4f86-a917-5be166a14110> <http://purl.org/pav/hasVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .\n<hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c> <http://purl.org/pav/previousVersion> <hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36> .<\/p>\n\nTo check the integrity of the extracted archive, confirm that each line produce by the command "preston verify" produces lines as shown below, with each line including "CONTENT_PRESENT_VALID_HASH". 
Depending on hardware capacity, this may take a while.<\/p>\n\n$ java -jar preston.jar verify\nhash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c    file:/home/jhpoelen/ucsb-izc/data/ce/1d/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c    OK    CONTENT_PRESENT_VALID_HASH    66438    hash://sha256/ce1dc2468dfb1706a6f972f11b5489dc635bdcf9c9fd62a942af14898c488b2c\nhash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844    file:/home/jhpoelen/ucsb-izc/data/f6/8d/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844    OK    CONTENT_PRESENT_VALID_HASH    4093    hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844\nhash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef    file:/home/jhpoelen/ucsb-izc/data/3e/70/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef    OK    CONTENT_PRESENT_VALID_HASH    5746    hash://sha256/3e70b7adc1a342e5551b598d732c20b96a0102bb1e7f42cfc2ae8a2c4227edef\nhash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b    file:/home/jhpoelen/ucsb-izc/data/99/58/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b    OK    CONTENT_PRESENT_VALID_HASH    6147    hash://sha256/995806159ae2fdffdc35eef2a7eccf362cb663522c308aa6aa52e2faca8bb25b<\/p>\n\nNote that a copy of the java program "preston", preston.jar, is included in this publication. The program runs on java 8+ virtual machine using "java -jar preston.jar", or in short "preston".<\/p>\n\nFiles in this data publication:<\/p>\n\n--- start of file descriptions ---<\/p>\n\n-- description of archive and its contents (this file) --\nREADME<\/p>\n\n-- executable java jar containing preston [2,3] v0.3.1. 
--\npreston.jar<\/p>\n\n-- preston archive containing UCSB-IZC (meta-)data/image files, associated provenance logs and a provenance index --\npreston-[00-ff].tar.gz<\/p>\n\n-- individual provenance index files --\n2a5de79372318317a382ea9a2cef069780b852b01210ef59e06b640a3539cb5a<\/p>\n\n-- example image and meta-data --\nsample-image.jpg (with hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c)\nsample-image.json (with hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844)<\/p>\n\n--- end of file descriptions ---<\/p>\n\n\nReferences<\/p>\n\n[1] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-11-04 as indexed by the Global Biodiversity Informatics Facility (GBIF) with provenance hash://sha256/d5eb492d3e0304afadcc85f968de1e23042479ad670a5819cee00f2c2c277f36 hash://sha256/80c0f5fc598be1446d23c95141e87880c9e53773cb2e0b5b54cb57a8ea00b20c.\n[2] https://preston.guoda.bio, https://doi.org/10.5281/zenodo.1410543 .\n[3] MJ Elliott, JH Poelen, JAB Fortes (2020). Toward Reliable Biodiversity Dataset References. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2020.101132\n[4] Cheadle Center for Biodiversity and Ecological Restoration (2021). University of California Santa Barbara Invertebrate Zoology Collection. Occurrence dataset https://doi.org/10.15468/w6hvhv accessed via GBIF.org on 2021-10-08. https://www.gbif.org/occurrence/3323647301 . hash://sha256/f68d489a9275cb9d1249767244b594c09ab23fd00b82374cb5877cabaa4d0844 hash://sha256/916ba5dc6ad37a3c16634e1a0e3d2a09969f2527bb207220e3dbdbcf4d6b810c<\/p>"],"Other":["This work is funded in part by grant NSF OAC 1839201 and NSF DBI 2102006 from the National Science Foundation."]} 
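The occurrence count quoted in this record comes from a `grep -o | sort | uniq | wc -l` pipeline. For readers working outside a shell, the same counting logic can be sketched in Python. This is only an equivalent of the counting step, applied to text that has already been produced by `preston cat`; it does not invoke Preston, and the function name is hypothetical.

```python
import re


def count_unique_occurrences(text):
    """Count distinct 'occurrence/<digits>' matches, like grep -o | sort | uniq | wc -l."""
    return len(set(re.findall(r"occurrence/[0-9]+", text)))
```

Collecting matches in a set removes duplicates directly, so no sorting pass is needed.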