Title: An Architecture for Cell-Centric Indexing of Datasets
Increasingly, large collections of datasets are made available to the public via the Web, ranging from government-curated datasets like those of data.gov to communally-sourced datasets such as Wikipedia tables. It has become clear that traditional search techniques are insufficient for such sources, especially when the user is unfamiliar with the terminology used by the creators of the relevant datasets. We propose to address this problem by elevating the datum to a first-class object that is indexed, thereby making it less dependent on how a dataset is structured. In a data table, a cell contains a value for a particular row as described by a particular column. In our cell-centric indexing approach, we index the metadata of each cell, so that information about its dataset and column simply becomes metadata rather than a constraining concept. In this paper we define cell-centric indexing and present a system architecture that supports its use in exploring datasets. We describe how cell-centric indexing can be implemented using traditional information retrieval technology and evaluate the scalability of the architecture.
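To make this concrete, here is a minimal illustrative sketch (ours, not the paper's implementation; the document layout and field names are assumptions) of decomposing a table into per-cell documents that a conventional inverted-index engine could ingest:

# Illustrative sketch of cell-centric indexing (not the authors' code).
# Each cell of a table becomes its own indexable document; the dataset
# name and column header travel with it as ordinary metadata fields.

def cells_to_documents(dataset_name, columns, rows):
    """Yield one indexable document per cell of a data table."""
    for row_id, row in enumerate(rows):
        for col, value in zip(columns, row):
            yield {
                "value": value,          # the cell content: the unit of search
                "column": col,           # column header, demoted to metadata
                "dataset": dataset_name, # source dataset, also metadata
                "row": row_id,           # position, for reassembling context
            }

# Example: a tiny table indexed cell by cell.
docs = list(cells_to_documents(
    "city_stats",
    ["city", "population"],
    [["Springfield", 167000], ["Shelbyville", 74000]],
))

Because the dataset and column travel with each cell as ordinary metadata fields, a hit on a cell value can later be faceted by column or dataset, independent of how any one table is structured.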
Award ID(s):
1816325
NSF-PAR ID:
10254054
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
CEUR workshop proceedings
Volume:
2722
ISSN:
1613-0073
Page Range / eLocation ID:
82-96
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Large collections of datasets are being published on the Web at an increasing rate. This poses a problem for researchers and data journalists who must sift through these large quantities of data to find datasets that meet their needs. Our solution to this problem is cell-centric indexing, a novel approach that treats the individual cell of a dataset as the fundamental unit of search, indexing the metadata corresponding to each individual cell. This facilitates a new style of user interface that allows users to explore the collection via histograms showing the distributions of various terms, organized by how they are used in the dataset.
  2. Alonso, Omar ; Marchesin, Stefano ; Najork, Mark ; Silvello, Gianmaria (Ed.)
    We present a novel approach to dataset search and exploration. Cell-centric indexing is a unique indexing strategy that enables a powerful new interface. The strategy treats individual cells of a table as the indexed unit; combining this with a number of structure-specific fields enables queries that cannot be answered by a traditional indexing approach. Our interface provides users with an overview of a dataset repository and allows them to efficiently use various facets to explore the collection and identify datasets that match their interests.
  3. Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the binder.sqlite file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the events table, which has the following schema.

    CREATE TABLE events (
        version INTEGER,
        timestamp TEXT,
        provider TEXT,
        spec TEXT,
        origin TEXT,
        ref TEXT,
        guessed_ref TEXT
    );
    CREATE INDEX idx_timestamp ON events(timestamp);
    • version indicates the version of the record as assigned by Binder. The origin field became available with version 3, and the ref field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.
    • timestamp is the ISO timestamp of the launch.
    • provider gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub; other providers may differ.
    • spec gives the particular branch/release/commit being built. It consists of <github-id>/<repo>/<branch>.
    • origin indicates which backend was used. Each has its own storage, compute, etc., so this info might be important for evaluating caching and performance. Note that only recent records include this field. May be null.
    • ref specifies the git commit that was actually used, rather than the named branch referenced by spec. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.
    • For records where ref is not available, we attempted to clone the named reference given by spec rather than the specific commit (see below). The guessed_ref field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and instead will refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or non-null ref) will exclude these guessed commits. May be null.
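    As a usage illustration, here is a minimal sketch (ours, not part of the dataset; it assumes binder.sqlite is in the working directory) that resolves the effective commit for GitHub launches, preferring ref and falling back to guessed_ref:

    # Sketch: list some GitHub launches with their effective commit.
    import sqlite3

    con = sqlite3.connect("binder.sqlite")
    rows = con.execute(
        """
        SELECT timestamp, spec, COALESCE(ref, guessed_ref) AS commit_ref
        FROM events
        WHERE provider = 'GitHub'
          AND COALESCE(ref, guessed_ref) IS NOT NULL
        ORDER BY timestamp
        LIMIT 10
        """
    ).fetchall()
    for timestamp, spec, commit_ref in rows:
        print(timestamp, spec, commit_ref)
    con.close()

    Restricting the WHERE clause to version = 4 (or ref IS NOT NULL) instead would exclude the guessed commits, as noted above.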

    The Binder launch dataset identifies the source repos that were used, but doesn't give any indication of their contents. We crawled GitHub to get the actual specification files that were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted or made private at some point and were thus skipped; this is indicated by the absence of any row for the given commit (or the absence of both ref and guessed_ref in the events table). The schema is as follows.

    CREATE TABLE spec_files (
        ref TEXT NOT NULL PRIMARY KEY,
        ls TEXT,
        runtime BLOB,
        apt BLOB,
        conda BLOB,
        pip BLOB,
        pipfile BLOB,
        julia BLOB,
        r BLOB,
        nix BLOB,
        docker BLOB,
        setup BLOB,
        postbuild BLOB,
        start BLOB
    );

    Here ref corresponds to ref and/or guessed_ref from the events table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are; a lookup sketch follows the field list below). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.

    • runtime: runtime.txt
    • apt: apt.txt
    • conda: environment.yml
    • pip: requirements.txt
    • pipfile: Pipfile.lock or Pipfile
    • julia: Project.toml or REQUIRE
    • r: install.R
    • nix: default.nix
    • docker: Dockerfile
    • setup: setup.py
    • postbuild: postBuild
    • start: start
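    For example, a minimal lookup sketch (the commit hash below is a placeholder, not a real record; a missing row means the repo was inaccessible, per the note above):

    # Sketch: fetch the verbatim spec files stored for one commit.
    import sqlite3

    ref = "0123abcd"  # placeholder: a commit hash taken from events.ref
    con = sqlite3.connect("binder.sqlite")
    row = con.execute(
        "SELECT conda, pip FROM spec_files WHERE ref = ?", (ref,)
    ).fetchone()
    if row is None:
        print("repo deleted/private, or commit not crawled")
    else:
        conda_blob, pip_blob = row
        if conda_blob is not None:
            # fields hold verbatim file bytes; decode to inspect
            print(conda_blob.decode("utf-8", errors="replace"))
    con.close()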

    The ls field gives a metadata listing of the repo contents (excluding the .git directory). This field is JSON encoded with the following structure based on JSON types (a traversal sketch follows the list):

    • Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
    • String: symlink. The string value gives the link target.
    • Number: regular file. The number value gives the file size in bytes.
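    A small traversal sketch under those conventions (placeholder ref as before, assumed to exist in spec_files):

    # Sketch: decode an ls listing and tally regular files and bytes.
    import json
    import sqlite3

    ref = "0123abcd"  # placeholder commit hash
    con = sqlite3.connect("binder.sqlite")
    (ls_json,) = con.execute(
        "SELECT ls FROM spec_files WHERE ref = ?", (ref,)
    ).fetchone()
    con.close()

    def tally(directory):
        """Return (file_count, total_bytes) for one directory object."""
        files, size = 0, 0
        for entry in directory.values():
            if isinstance(entry, dict):   # object: subdirectory, recurse
                f, s = tally(entry)
                files, size = files + f, size + s
            elif isinstance(entry, str):  # string: symlink, skip
                continue
            else:                         # number: regular file size in bytes
                files, size = files + 1, size + entry
        return files, size

    print(tally(json.loads(ls_json)))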
    CREATE TABLE clean_specs (
        ref TEXT NOT NULL PRIMARY KEY,
        conda_channels TEXT,
        conda_packages TEXT,
        pip_packages TEXT,
        apt_packages TEXT
    );

    The clean_specs table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON encoded list of package requirements, or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from the pkg_resources package of setuptools. Conda packages have been parsed and normalized using the conda.models.match_spec.MatchSpec class included with the library form of Conda (distinct from the command line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.
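    For instance, a minimal sketch of re-parsing the pip requirements with that same class (assuming, per the description above, that each column holds a JSON-encoded list of requirement strings; placeholder ref as before):

    # Sketch: normalize the cleaned pip requirements for one commit.
    import json
    import sqlite3
    from pkg_resources import Requirement

    ref = "0123abcd"  # placeholder commit hash
    con = sqlite3.connect("binder.sqlite")
    row = con.execute(
        "SELECT pip_packages FROM clean_specs WHERE ref = ?", (ref,)
    ).fetchone()
    if row and row[0] is not None:
        for spec in json.loads(row[0]):
            req = Requirement.parse(spec)
            print(req.project_name, req.specs)  # e.g. numpy [('>=', '1.20')]
    con.close()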

    The missing table gives the repos that were not accessible, and event_logs records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users.

     
  4. Since single-cell sequencing was named Method of the Year in 2013, single-cell technologies have become mature enough to provide answers to complex research questions. With the growth of single-cell profiling technologies, there has also been a significant increase in the data collected from single-cell profiling, resulting in computational challenges to process these massive and complicated datasets. To address these challenges, deep learning (DL) is positioned as a competitive alternative to traditional machine learning approaches for single-cell analyses. Here, we survey a total of 25 DL algorithms and their applicability to specific steps in the single-cell RNA-seq processing pipeline. Specifically, we establish a unified mathematical representation of variational autoencoder, autoencoder, generative adversarial network, and supervised DL models; compare the training strategies and loss functions of these models; and relate the loss functions to specific objectives of the data processing step. Such a presentation will allow readers to choose suitable algorithms for their particular objective at each step in the pipeline. We envision that this survey will serve as an important information portal for learning the application of DL to scRNA-seq analysis and inspire innovative uses of DL to address a broader range of new challenges in emerging multi-omics and spatial single-cell sequencing.
  5. Budak, Ceren ; Cha, Meeyoung ; Quercia, Daniele ; Xie, Lexing (Ed.)
    Parler is an "alternative" social network promoting itself as a service that allows users to "speak freely and express yourself openly, without fear of being deplatformed for your views." Because of this promise, the platform became popular among users who were suspended on mainstream social networks for violating their terms of service, as well as those fearing censorship. In particular, the service was endorsed by several conservative public figures, who encouraged people to migrate from traditional social networks. After the storming of the US Capitol on January 6, 2021, Parler was progressively deplatformed: its app was removed from the Apple and Google Play stores, and the website was taken down by its hosting provider. This paper presents a dataset of 183M Parler posts made by 4M users between August 2018 and January 2021, as well as metadata from 13.25M user profiles. We also present a basic characterization of the dataset, which shows that the platform witnessed large influxes of new users after endorsements by popular figures, as well as in reaction to the 2020 US Presidential Election. We also show that discussion on the platform is dominated by conservative topics, President Trump, and conspiracy theories such as QAnon.