Changes the get_author_records function definition to make the token optional and switches to using the full date. Adds author affiliation automation for CaltechAUTHORS. Adds a new function for updating DOIs in CaltechAUTHORS. Adds a new function to get series from CaltechAUTHORS. Re-enables DOI validation on the DataCite harvester. Adds backend functions for automated crosslinking for CaltechDATA and CaltechAUTHORS.
caltechdata_api – Addition of test mode and initial metadata validation v1.8.0
This release adds a -test flag to the CLI and some new metadata validation checks.
- Award ID(s):
- 2322420
- PAR ID:
- 10617401
- Publisher / Repository:
- CaltechDATA
- Date Published:
- Subject(s) / Keyword(s):
- GitHub IGA InvenioRDM metadata Python software
- Format(s):
- Medium: X
- Right(s):
- BSD 3 Clause
- Sponsoring Org:
- National Science Foundation
More Like this
Modern data analytics applications prefer to use column-storage formats due to their improved storage efficiency through encoding and compression. Parquet is the most popular file format for column data storage that provides several of these benefits out of the box. However, geospatial data is not readily supported by Parquet. This paper introduces Spatial Parquet, a Parquet extension that efficiently supports geospatial data. Spatial Parquet inherits all the advantages of Parquet for non-spatial data, such as rich data types, compression, and column/row filtering. Additionally, it adds three new features to accommodate geospatial data. First, it introduces a geospatial data type that can encode all standard spatial geometries in a column format compatible with Parquet. Second, it adds a new lossless and efficient encoding method, termed FP-delta, that is customized to efficiently store geospatial coordinates stored in floating-point format. Third, it adds a light-weight spatial index that allows the reader to skip non-relevant parts of the file for increased read efficiency. Experiments on large-scale real data showed that Spatial Parquet can reduce the data size by a factor of three even without compression. Compression can further reduce the storage size. Additionally, Spatial Parquet can reduce the reading time by two orders of magnitude when the light-weight index is applied. This initial prototype can open new research directions to further improve geospatial data storage in column format.
