Title: Reducing Ambiguity in JSON Schema Discovery
Ad-hoc data models like JSON simplify schema evolution and enable multiplexing various data sources into a single stream. While useful when writing data, this flexibility makes JSON harder to validate and query, forcing such tasks to rely on automated schema discovery techniques. Unfortunately, ambiguity in the schema design space forces existing schema discovery systems to make simplifying, data-independent assumptions about schema structure. When these assumptions are violated, most notably by APIs, the generated schemas are imprecise, creating numerous opportunities for false positives during validation. In this paper, we propose Jxplain, a JSON schema discovery algorithm with heuristics that mitigate common forms of ambiguity. Although Jxplain is slightly slower than state-of-the-art schema extractors, we show that it produces significantly more precise schemas.
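
To make the notion of ambiguity concrete, the sketch below (illustrative data and schemas of our own, not drawn from the paper) shows one common form: whether a nested object should be modeled as a record with a fixed set of optional fields or as a map from arbitrary keys to values. A data-independent rule must commit to one reading up front; the paper's heuristics aim to make such choices in a data-dependent way.

# Two JSON records whose nested "ratings" object admits more than one
# plausible schema (illustrative data, not from the paper's evaluation).
records = [
    {"item": "A", "ratings": {"alice": 5, "bob": 3}},
    {"item": "B", "ratings": {"carol": 4}},
]

# Reading 1: "ratings" is a record with fixed, optional integer fields:
#     {item: string, ratings: {alice?: int, bob?: int, carol?: int}}
# Reading 2: "ratings" is a map from arbitrary user names to integers:
#     {item: string, ratings: map<string, int>}
#
# A data-independent assumption (e.g. "nested objects are always records")
# picks Reading 1 even when the key set keeps growing without bound; a
# data-dependent heuristic can notice the disjoint, non-repeating keys
# and prefer Reading 2.
keys_per_record = [set(r["ratings"]) for r in records]
print(keys_per_record)  # disjoint key sets hint at a map rather than a record
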
Award ID(s):
1750460 1640864
NSF-PAR ID:
10274665
Date Published:
Journal Name:
SIGMOD '21: International Conference on Management of Data
Page Range / eLocation ID:
1732 to 1744
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ad-hoc data models like JSON make it easy to evolve schemas and to multiplex different data types into a single stream. This flexibility makes JSON great for generating data, but also makes it much harder to query, ingest into a database, and index. In this paper, we explore the first step of JSON data loading: schema design. Specifically, we consider the challenge of designing schemas for existing JSON datasets as an interactive problem. We present SchemaDrill, a roll-up/drill-down style interface for exploring collections of JSON records. SchemaDrill helps users to visualize the collection, identify relevant fragments, and map them down into one or more flat, relational schemas. We describe and evaluate two key components of SchemaDrill: (1) A summary schema representation that significantly reduces the complexity of JSON schemas without a meaningful reduction in information content, and (2) A collection of schema visualizations that help users to qualitatively survey variability amongst different schemas in the collection.
  2. The rigid schemas of classical relational databases help users in specifying queries and inform the storage organization of data. However, the advantages of schemas come at a high upfront cost through schema and ETL process design. In this work, we propose a new paradigm where the database system takes a more active role in schema development and data integration. We refer to this approach as adaptive schema databases (ASDs). An ASD ingests semi-structured or unstructured data directly using a pluggable combination of extraction and data integration techniques. Over time it discovers and adapts schemas for the ingested data using information provided by data integration and information extraction techniques, as well as from queries and user feedback. In contrast to relational databases, ASDs maintain multiple schema workspaces that represent individualized views over the data, which are fine-tuned to the needs of a particular user or group of users. A novel aspect of ASDs is that probabilistic database techniques are used to encode ambiguity in automatically generated data extraction workflows and in generated schemas. ASDs can provide users with context-dependent feedback on the quality of a schema, both in terms of its ability to satisfy a user's queries, and the quality of the resulting answers. We outline our vision for ASDs, and present a proof-of-concept implementation as part of the Mimir probabilistic data curation system.
  3. Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A. (Eds.)
    The cabinetry library provides a Python-based solution for building and steering binned template fits. It tightly integrates with the pythonic High Energy Physics ecosystem, and in particular with pyhf for statistical inference. cabinetry uses a declarative approach for building statistical models, with a JSON schema describing possible configuration choices. Model building instructions can additionally be provided via custom code, which is automatically executed when applicable at key steps of the workflow. The library implements interfaces for performing maximum likelihood fitting, upper parameter limit determination, and discovery significance calculation. cabinetry also provides a range of utilities to study and disseminate fit results. These include visualizations of the fit model and data, visualizations of template histograms and fit results, ranking of nuisance parameters by their impact, a goodness-of-fit calculation, and likelihood scans. The library takes a modular approach, allowing users to include some or all of its functionality in their workflow. 
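
    As a rough, hedged sketch of the kind of binned template fit and inference that cabinetry steers, the snippet below uses pyhf directly (not the cabinetry API); the templates and observed counts are invented for illustration.

    # Minimal pyhf model: one channel, two bins, a signal template, a background
    # template, and an uncorrelated per-bin background uncertainty.
    import pyhf

    model = pyhf.simplemodels.uncorrelated_background(
        signal=[5.0, 10.0], bkg=[50.0, 52.0], bkg_uncertainty=[7.0, 7.2]
    )
    observations = [53.0, 65.0]
    data = observations + model.config.auxdata

    # Maximum likelihood fit of the signal strength and nuisance parameters.
    best_fit = pyhf.infer.mle.fit(data, model)

    # CLs hypothesis test at the nominal signal strength (mu = 1).
    cls = pyhf.infer.hypotest(1.0, data, model, test_stat="qtilde")
    print(best_fit, cls)
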
  4. Binder is a publicly accessible online service for executing interactive notebooks based on Git repositories. Binder dynamically builds and deploys containers following a recipe stored in the repository, then gives the user a browser-based notebook interface. The Binder group periodically releases a log of container launches from the public Binder service. Archives of launch records are available here. These records do not include identifiable information like IP addresses, but do give the source repo being launched along with some other metadata. The main content of this dataset is in the binder.sqlite file. This SQLite database includes launch records from 2018-11-03 to 2021-06-06 in the events table, which has the following schema; a short query example follows the field descriptions below.

    CREATE TABLE events(
        version     INTEGER,
        timestamp   TEXT,
        provider    TEXT,
        spec        TEXT,
        origin      TEXT,
        ref         TEXT,
        guessed_ref TEXT
    );
    CREATE INDEX idx_timestamp ON events(timestamp);
    • version indicates the version of the record as assigned by Binder. The origin field became available with version 3, and the ref field with version 4. Older records where this information was not recorded will have the corresponding fields set to null.
    • timestamp is the ISO timestamp of the launch.
    • provider gives the type of source repo being launched ("GitHub" is by far the most common). The rest of the explanations assume GitHub; other providers may differ.
    • spec gives the particular branch/release/commit being built. It consists of <github-id>/<repo>/<branch>.
    • origin indicates which backend was used. Each has its own storage, compute, etc., so this information might be important for evaluating caching and performance. Note that only recent records include this field. May be null.
    • ref specifies the git commit that was actually used, rather than the named branch referenced by spec. Note that this was not recorded from the beginning, so only the more recent entries include it. May be null.
    • For records where ref is not available, we attempted to clone the named reference given by spec rather than the specific commit (see below). The guessed_ref field records the commit found at the time of cloning. If the branch was updated since the container was launched, this will not be the exact version that was used, and will instead refer to whatever was available at the time (early 2021). Depending on the application, this might still be useful information. Selecting only records with version 4 (or a non-null ref) will exclude these guessed commits. May be null.
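
    A minimal sketch of reading the events table with Python's sqlite3 module, assuming the binder.sqlite file and schema described above:

    import sqlite3
    from collections import Counter

    conn = sqlite3.connect("binder.sqlite")

    # Launch counts per provider ("GitHub" should dominate, as noted above).
    for provider, n in conn.execute(
        "SELECT provider, COUNT(*) AS n FROM events GROUP BY provider ORDER BY n DESC"
    ):
        print(provider, n)

    # Most-launched GitHub repos: spec is <github-id>/<repo>/<branch>, so drop
    # the trailing branch component before counting.
    repos = Counter(
        "/".join(spec.split("/", 2)[:2])
        for (spec,) in conn.execute("SELECT spec FROM events WHERE provider = 'GitHub'")
    )
    print(repos.most_common(10))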

    The Binder launch dataset identifies the source repos that were used, but gives no indication of their contents. We crawled GitHub to get the actual specification files that were fed into repo2docker when preparing the notebook environments, as well as filesystem metadata of the repos. Some repos were deleted or made private at some point and were thus skipped. This is indicated by the absence of any row for the given commit (or the absence of both ref and guessed_ref in the events table). The schema is as follows.

    CREATE TABLE spec_files (
        ref       TEXT NOT NULL PRIMARY KEY,
        ls        TEXT,
        runtime   BLOB,
        apt       BLOB,
        conda     BLOB,
        pip       BLOB,
        pipfile   BLOB,
        julia     BLOB,
        r         BLOB,
        nix       BLOB,
        docker    BLOB,
        setup     BLOB,
        postbuild BLOB,
        start     BLOB
    );

    Here ref corresponds to ref and/or guessed_ref from the events table. For each repo, we collected spec files into the following fields (see the repo2docker docs for details on what these are). The records in the database are simply the verbatim file contents, with no parsing or further processing performed.

    • runtime: runtime.txt
    • apt: apt.txt
    • conda: environment.yml
    • pip: requirements.txt
    • pipfile: Pipfile.lock or Pipfile
    • julia: Project.toml or REQUIRE
    • r: install.R
    • nix: default.nix
    • docker: Dockerfile
    • setup: setup.py
    • postbuild: postBuild
    • start: start

    The ls field gives a metadata listing of the repo contents (excluding the .git directory). This field is JSON encoded with the following structure based on JSON types (a small traversal example follows the list):

    • Object: filesystem directory. Keys are file names within it. Values are the contents, which can be regular files, symlinks, or subdirectories.
    • String: symlink. The string value gives the link target.
    • Number: regular file. The number value gives the file size in bytes.
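
    A small sketch of walking one repo's ls listing, assuming the encoding just described (objects are directories, strings are symlink targets, numbers are file sizes in bytes):

    import json
    import sqlite3

    def total_size(node):
        """Sum the sizes of regular files in a decoded ls tree."""
        if isinstance(node, dict):           # directory: recurse into its entries
            return sum(total_size(child) for child in node.values())
        if isinstance(node, (int, float)):   # regular file: value is its size
            return node
        return 0                             # string: symlink, contributes no size

    conn = sqlite3.connect("binder.sqlite")
    row = conn.execute(
        "SELECT ref, ls FROM spec_files WHERE ls IS NOT NULL LIMIT 1"
    ).fetchone()
    if row is not None:
        ref, ls_json = row
        print(ref, total_size(json.loads(ls_json)), "bytes of regular files")
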
    CREATE TABLE clean_specs (
        ref            TEXT NOT NULL PRIMARY KEY,
        conda_channels TEXT,
        conda_packages TEXT,
        pip_packages   TEXT,
        apt_packages   TEXT
    );

    The clean_specs table provides parsed and validated specifications for some of the specification files (currently Pip, Conda, and APT packages). Each column gives either a JSON-encoded list of package requirements or null. APT packages have been validated using a regex adapted from the repo2docker source. Pip packages have been parsed and normalized using the Requirement class from setuptools' pkg_resources package. Conda packages have been parsed and normalized using the conda.models.match_spec.MatchSpec class included with the Conda Python library (distinct from the command-line tool). Users might want to use these parsers when working with the package data, as the specifications can become fairly complex.
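
    A short sketch of re-parsing the pip requirements stored in clean_specs with setuptools' pkg_resources, assuming each column holds a JSON-encoded list of requirement strings (or null) as described above:

    import json
    import sqlite3
    from pkg_resources import Requirement

    conn = sqlite3.connect("binder.sqlite")
    rows = conn.execute(
        "SELECT ref, pip_packages FROM clean_specs WHERE pip_packages IS NOT NULL LIMIT 5"
    )
    for ref, pip_json in rows:
        for spec in json.loads(pip_json):
            try:
                req = Requirement.parse(spec)
                print(ref, req.project_name, req.specs)
            except Exception:
                # Some entries (e.g. URLs or local paths) may not parse as plain
                # requirements; keep them verbatim.
                print(ref, "unparsed:", spec)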

    The missing table gives the repos that were not accessible, and event_logs records which log files have already been added. These tables are used for updating the dataset and should not be of interest to users.

     
  5. Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). This new paradigm greatly simplifies the schema induction process and allows us to handle both hierarchical relations and temporal relations between events in a straightforward way. Since event schemas have complex graph structures, we design an incremental prompting and verification method, INCPROMPT, to break down the construction of a complex event graph into three stages: event skeleton construction, event expansion, and event-event relation verification. Compared to directly using LLMs to generate a linearized graph, INCPROMPT can generate large and complex schemas with a 7.2% F1 improvement in temporal relations and a 31.0% F1 improvement in hierarchical relations. In addition, compared to the previous state-of-the-art closed-domain schema induction model, human assessors were able to cover ∼10% more events when translating the schemas into coherent stories and rated our schemas 1.3 points higher (on a 5-point scale) in terms of readability.
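
    A highly schematic sketch of the three-stage loop described above; call_llm, the prompt wording, and the helper parser are placeholders of our own, not the paper's actual prompts or code:

    from itertools import combinations

    def parse_event_list(text):
        # Placeholder parser: one event per line, optionally bulleted.
        return [line.strip("-* ").strip() for line in text.splitlines() if line.strip()]

    def induce_schema(scenario, call_llm):
        # Stage 1: event skeleton construction -- ask for the major events.
        events = parse_event_list(
            call_llm(f"List the major events in a typical {scenario} scenario.")
        )

        # Stage 2: event expansion -- expand each skeleton event into subevents.
        hierarchy = []
        for event in list(events):
            subevents = parse_event_list(
                call_llm(f"List subevents of '{event}' in a {scenario} scenario.")
            )
            hierarchy.extend((event, sub) for sub in subevents)
            events.extend(subevents)

        # Stage 3: event-event relation verification -- confirm candidate
        # temporal edges one at a time before adding them to the graph.
        temporal = []
        for a, b in combinations(events, 2):
            answer = call_llm(f"Does '{a}' usually happen before '{b}'? Answer yes or no.")
            if answer.strip().lower().startswith("yes"):
                temporal.append((a, b))
        return events, hierarchy, temporal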