SchemaDrill: Interactive Semi-Structured Schema Design

Spoth, William; Xie, Ting; Kennedy, Oliver; Yang, Ying; Hammerschmidt, Beda; Liu, Zhen Hua; Gawlick, Dieter

doi:10.1145/3209900.3209908

Ad-hoc data models like JSON make it easy to evolve schemas and to multiplex different data-types into a single stream. This flexibility makes JSON great for generating data, but also makes it much harder to query, ingest into a database, and index. In this paper, we explore the first step of JSON data loading: schema design. Specifically, we consider the challenge of designing schemas for existing JSON datasets as an interactive problem. We present SchemaDrill, a roll-up/drill-down style interface for exploring collections of JSON records. SchemaDrill helps users to visualize the collection, identify relevant fragments, and map it down into one or more flat, relational schemas. We describe and evaluate two key components of SchemaDrill: (1) A summary schema representation that significantly reduces the complexity of JSON schemas without a meaningful reduction in information content, and (2) A collection of schema visualizations that help users to qualitatively survey variability amongst different schemas in the collection.
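
To make the idea of mapping a JSON collection onto a flat schema concrete, the following is a minimal sketch and not SchemaDrill's actual algorithm or interface. It assumes that a record's "schema" can be approximated by its set of leaf paths, that array positions are collapsed into a single `[]` step, and that each path becomes one nullable column. The helper names (`leaf_paths`, `schema_of`, `summarize`, `flatten`) are hypothetical and chosen only for this illustration.

```python
import json
from collections import Counter


def leaf_paths(value, prefix=""):
    """Yield (path, scalar) pairs for every scalar inside a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # Assumption: collapse array positions into one "[]" step so that
        # array length alone does not produce a different schema.
        for child in value:
            yield from leaf_paths(child, f"{prefix}[]")
    else:
        yield prefix, value


def schema_of(record):
    """Approximate a record's schema as the set of its leaf paths."""
    return frozenset(path for path, _ in leaf_paths(record))


def summarize(records):
    """Count distinct schemas and split paths into always-present and optional."""
    schemas = Counter(schema_of(r) for r in records)
    if not schemas:
        return schemas, set(), set()
    all_paths = set().union(*schemas)
    required = set.intersection(*(set(s) for s in schemas))
    return schemas, required, all_paths - required


def flatten(record, columns):
    """Map one record onto a flat row; missing paths stay None.

    For repeated array paths only the first value is kept (a simplification).
    """
    row = dict.fromkeys(columns)
    for path, scalar in leaf_paths(record):
        if path in row and row[path] is None:
            row[path] = scalar
    return row


records = [json.loads(s) for s in (
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2, "user": {"name": "b", "email": "b@x"}}',
    '{"id": 3, "user": {"name": "c"}}',
)]
schemas, required, optional = summarize(records)
columns = sorted(required | optional)
rows = [flatten(r, columns) for r in records]
```

In this toy collection the three records collapse to two distinct path sets, and the split into always-present and optional paths gives a rough sense of the variability that a schema designer would need to survey before committing to a relational layout.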

Award ID(s): 1640864 1750460

PAR ID: 10061133

Author(s) / Creator(s): Spoth, William; Xie, Ting; Kennedy, Oliver; Yang, Ying; Hammerschmidt, Beda; Liu, Zhen Hua; Gawlick, Dieter

Date Published: 2018-01-01

Journal Name: Proceedings of the Workshop on Human-In-the-Loop Data Analytics

Page Range / eLocation ID: 1 to 7

Format(s): Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3209900.3209908