Title: caltechdata_api – Addition of test mode and initial metadata validation v1.8.0
This release adds a -test flag to the CLI and adds new metadata validation checks.
Award ID(s):
2322420
PAR ID:
10617401
Author(s) / Creator(s):
; ;
Publisher / Repository:
CaltechDATA
Date Published:
Subject(s) / Keyword(s):
GitHub IGA InvenioRDM metadata Python software
Format(s):
Medium: X
Right(s):
BSD 3 Clause
Sponsoring Org:
National Science Foundation
More like this
  1. Changes the get_author_records function definition to make the token optional, and switches to using the full date. Adds author affiliation automation for CaltechAUTHORS. Adds a new function for updating DOIs in CaltechAUTHORS. Adds a new function to get series from CaltechAUTHORS. Re-enables DOI validation on the DataCite harvester. Adds backend functions for automated crosslinking for CaltechDATA and CaltechAUTHORS.
  2. This release adds CaltechAUTHORS support to get_metadata, as well as the edit.py example. It also includes better token behavior and updated release workflows. 
  3. The direct observation of enhanced dislocation mobility in iron by in situ electron microscopy offers key insights and adds to the ongoing debate on the mechanisms of hydrogen embrittlement. 
  4. This release adds a groups option to get_publisher, which returns the groups associated with a record. It also adds a get_doi function, which returns the DOI of a record.
  5. Modern data analytics applications prefer column-storage formats because encoding and compression improve their storage efficiency. Parquet is the most popular file format for column data storage and provides several of these benefits out of the box. However, Parquet does not readily support geospatial data. This paper introduces Spatial Parquet, a Parquet extension that efficiently supports geospatial data. Spatial Parquet inherits all the advantages of Parquet for non-spatial data, such as rich data types, compression, and column/row filtering. It also adds three new features to accommodate geospatial data. First, it introduces a geospatial data type that can encode all standard spatial geometries in a column format compatible with Parquet. Second, it adds a new lossless and efficient encoding method, termed FP-delta, that is customized for geospatial coordinates stored in floating-point format. Third, it adds a lightweight spatial index that allows the reader to skip non-relevant parts of the file for increased read efficiency. Experiments on large-scale real data showed that Spatial Parquet can reduce the data size by a factor of three even without compression, and compression can reduce the storage size further. Additionally, Spatial Parquet can reduce the reading time by two orders of magnitude when the lightweight index is applied. This initial prototype can open new research directions for improving geospatial data storage in column format.
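The abstract in item 5 names FP-delta but does not specify it. The sketch below illustrates the general idea of losslessly delta-encoding floating-point coordinates. It reinterprets each float64 as its signed 64-bit bit pattern and stores the difference from the previous pattern. Each difference is then packed as a zigzag varint. This generic scheme is an assumption made for illustration and is not the paper's FP-delta. The names `delta_encode` and `delta_decode` are hypothetical.

```python
import struct
from typing import List


def _float_bits(x: float) -> int:
    """Reinterpret a float64 as a signed 64-bit integer (bit-exact)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def _bits_float(b: int) -> float:
    """Inverse of _float_bits."""
    return struct.unpack("<d", struct.pack("<q", b))[0]


def _zigzag(d: int) -> int:
    # Map signed deltas to non-negative ints: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    return d << 1 if d >= 0 else ((-d) << 1) - 1


def _unzigzag(z: int) -> int:
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def delta_encode(coords: List[float]) -> bytes:
    """Hypothetical encoder: delta of float64 bit patterns, zigzag, then varint."""
    out = bytearray()
    prev = 0
    for x in coords:
        bits = _float_bits(x)
        z = _zigzag(bits - prev)
        prev = bits
        while z >= 0x80:  # LEB128-style varint: 7 payload bits per byte
            out.append((z & 0x7F) | 0x80)
            z >>= 7
        out.append(z)
    return bytes(out)


def delta_decode(data: bytes) -> List[float]:
    """Hypothetical decoder: reverses delta_encode exactly (bit-for-bit)."""
    coords: List[float] = []
    prev = 0
    z, shift = 0, 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        shift += 7
        if byte & 0x80:
            continue  # continuation bit set: more bytes in this varint
        prev += _unzigzag(z)
        coords.append(_bits_float(prev))
        z, shift = 0, 0
    return coords


if __name__ == "__main__":
    lons = [-118.1253 + i * 1e-4 for i in range(100)]
    encoded = delta_encode(lons)
    assert delta_decode(encoded) == lons  # lossless round trip
    print(len(encoded), "bytes vs", 8 * len(lons), "bytes raw")
```

Consecutive coordinates with the same sign and exponent share their high-order bits. Their bit patterns therefore differ only in the low mantissa bits, and most deltas fit in a few varint bytes instead of eight. Decoding restores the exact bit patterns, so the round trip is lossless.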
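Item 5 also mentions a lightweight spatial index that lets a reader skip non-relevant parts of the file. A common way to get this effect is to record a bounding box for each chunk of rows and compare it against the query box before reading the chunk. This approach is assumed here for illustration and is not taken from the paper. The names `Chunk`, `build_chunks`, and `range_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Chunk:
    points: List[Point]
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def build_chunks(points: List[Point], chunk_size: int) -> List[Chunk]:
    """Split points into fixed-size chunks and record each chunk's bounding box."""
    chunks: List[Chunk] = []
    for i in range(0, len(points), chunk_size):
        part = points[i:i + chunk_size]
        xs = [p[0] for p in part]
        ys = [p[1] for p in part]
        chunks.append(Chunk(part, min(xs), min(ys), max(xs), max(ys)))
    return chunks


def range_query(chunks: List[Chunk], qmin_x: float, qmin_y: float,
                qmax_x: float, qmax_y: float) -> Tuple[List[Point], int]:
    """Return points inside the query box and the number of chunks actually scanned."""
    hits: List[Point] = []
    scanned = 0
    for c in chunks:
        # Skip the chunk entirely if its bounding box cannot overlap the query box.
        if c.max_x < qmin_x or c.min_x > qmax_x or c.max_y < qmin_y or c.min_y > qmax_y:
            continue
        scanned += 1
        hits.extend(p for p in c.points
                    if qmin_x <= p[0] <= qmax_x and qmin_y <= p[1] <= qmax_y)
    return hits, scanned


if __name__ == "__main__":
    pts = [(float(i), float(i % 10)) for i in range(1000)]  # already sorted by x
    found, n_scanned = range_query(build_chunks(pts, 100), 250, 0, 260, 9)
    print(len(found), "points found,", n_scanned, "of 10 chunks scanned")
```

Skipping pays off only when spatially nearby points land in the same chunk, for example after sorting rows along a space-filling curve. With randomly ordered points, most chunk boxes overlap most queries and little is skipped.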