DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup (DOI: 10.1109/ICDE60146.2024.00008)

Zhou, L; Candan, K; Zou, J

Citation Details

How to store tabular data so that storage cost and query efficiency are balanced is a long-standing research question in the database community. In this work, we argue and show that a novel DeepMapping abstraction, which relies on the impressive memorization capabilities of deep neural networks, can provide better storage cost, better latency, and better run-time memory footprint, all at the same time. Such unique properties may benefit a broad class of use cases in capacity-limited devices. Our proposed DeepMapping abstraction transforms a dataset into multiple key-value mappings and constructs a multi-tasking neural network model that outputs the corresponding values for a given input key. To deal with memorization errors, DeepMapping couples the learned neural network with a lightweight auxiliary data structure capable of correcting mistakes. The auxiliary structure design further enables DeepMapping to efficiently deal with insertions, deletions, and updates even without retraining the mapping. We propose a multi-task search strategy for selecting the hybrid DeepMapping structures (including model architecture and auxiliary structure) with a desirable trade-off among memorization capacity, size, and efficiency. Extensive experiments with a real-world dataset as well as synthetic and benchmark datasets, including TPC-H and TPC-DS, demonstrated that the DeepMapping approach balances retrieval speed and compression ratio better than several cutting-edge competitors.
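The hybrid idea in the abstract (a lossy learned predictor made lossless by a small correction structure) can be illustrated with a minimal sketch. This is not the authors' implementation: the "model" here is a deliberately trivial stand-in (the most common value per key bucket) rather than a neural network, and the class name `HybridMapping`, the `num_buckets` parameter, the integer-key assumption, and the use of a Python set to track existing keys are all assumptions made for illustration.

```python
from collections import Counter


class HybridMapping:
    """Lossy predictor plus exact correction table, giving lossless lookups.

    Hypothetical sketch: the predictor is a per-bucket majority vote,
    standing in for the neural network described in the abstract.
    """

    def __init__(self, data, num_buckets=4):
        self.num_buckets = num_buckets
        # "Train" the stand-in model: most common value in each key bucket.
        counts = {}
        for key, value in data.items():
            counts.setdefault(key % num_buckets, Counter())[value] += 1
        self.model = {b: c.most_common(1)[0][0] for b, c in counts.items()}
        # Track which keys exist (assumption: a plain set is good enough here).
        self.keys = set(data)
        # Auxiliary structure: store only the entries the model gets wrong.
        self.aux = {k: v for k, v in data.items() if self.predict(k) != v}

    def predict(self, key):
        """Model output for a key; may be wrong or None for unseen buckets."""
        return self.model.get(key % self.num_buckets)

    def lookup(self, key):
        """Exact lookup: the auxiliary table overrides the model."""
        if key not in self.keys:
            return None
        if key in self.aux:
            return self.aux[key]
        return self.predict(key)

    def upsert(self, key, value):
        """Insert or update without retraining the model."""
        self.keys.add(key)
        if self.predict(key) == value:
            self.aux.pop(key, None)  # the model already answers correctly
        else:
            self.aux[key] = value    # record a correction

    def delete(self, key):
        """Delete without retraining the model."""
        self.keys.discard(key)
        self.aux.pop(key, None)
```

In this sketch, storage savings come from keeping only the mispredicted entries in `aux`; the better the predictor memorizes the data, the smaller the correction table. Modifications touch only the key set and the correction table, which is one way to read the abstract's claim that insertions, deletions, and updates need no retraining.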

Award ID(s): 2144923

PAR ID: 10514890

Author(s) / Creator(s): Zhou, L; Candan, K; Zou, J

Publisher / Repository: IEEE

Date Published: 2024-05-15

Journal Name: Proceedings of 2024 IEEE 39th International Conference on Data Engineering (ICDE 2024)

Page Range / eLocation ID: 1-14

Format(s): Medium: X

Location: Utrecht, Netherlands

Sponsoring Org: National Science Foundation

