To Not Miss the Forest for the Trees - A Holistic Approach for Explaining Missing Answers over Nested Data

Diestelkämper, Ralf; Lee, Seokki; Herschel, Melanie; Glavic, Boris

doi:10.1145/3448016.3457249

Citation Details

Query-based explanations for missing answers identify which operators of a query are responsible for the failure to return a missing answer of interest. This type of explanation has proven useful, e.g., to debug complex analytical queries. Such queries are frequent in big data systems such as Apache Spark. We present a novel approach to produce query-based explanations. It is the first to support nested data and to consider operators that modify the schema and structure of the data (e.g., nesting, projections) as potential causes of missing answers. To efficiently compute explanations, we propose a heuristic algorithm that applies two novel techniques: (i) reasoning about multiple schema alternatives for a query and (ii) re-validating at each step whether an intermediate result can contribute to the missing answer. Using an implementation on Spark, we demonstrate that our approach is the first to scale to large datasets while often finding explanations that existing techniques fail to identify.
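To make the idea of a query-based explanation concrete, the following is a minimal sketch in plain Python. It is not the heuristic algorithm from the paper: it does not reason about schema alternatives and does not run on Spark. It only illustrates the simpler underlying notion of tracing a missing answer through a pipeline and blaming the first operator after which no compatible intermediate tuple survives. The data, the operator names, and the helpers `is_compatible` and `blame_operator` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
Operator = Tuple[str, Callable[[List[Row]], List[Row]]]

# Hypothetical nested input: each person carries a nested list of addresses.
people: List[Row] = [
    {"name": "Sue", "addresses": [{"city": "NY", "year": 2010},
                                  {"city": "LA", "year": 2019}]},
    {"name": "Peter", "addresses": [{"city": "NY", "year": 2019}]},
]

def flatten_addresses(rows: List[Row]) -> List[Row]:
    # Unnest the addresses so that each output row holds one address.
    return [{"name": r["name"], "city": a["city"], "year": a["year"]}
            for r in rows for a in r["addresses"]]

def filter_recent(rows: List[Row]) -> List[Row]:
    # Keep only addresses from 2019 onwards.
    return [r for r in rows if r["year"] >= 2019]

def project_name_city(rows: List[Row]) -> List[Row]:
    # Drop the year attribute.
    return [{"name": r["name"], "city": r["city"]} for r in rows]

pipeline: List[Operator] = [
    ("flatten_addresses", flatten_addresses),
    ("filter_recent", filter_recent),
    ("project_name_city", project_name_city),
]

def is_compatible(row: Row, missing: Row) -> bool:
    # A row is compatible if it agrees with the missing answer on every
    # attribute of the missing answer that is present at the top level.
    return all(row[k] == v for k, v in missing.items() if k in row)

def blame_operator(data: List[Row], ops: List[Operator],
                   missing: Row) -> Optional[str]:
    # Return the first operator whose input has a compatible row but whose
    # output has none; None if compatible rows survive every operator.
    current = data
    for name, op in ops:
        had = any(is_compatible(r, missing) for r in current)
        current = op(current)
        has = any(is_compatible(r, missing) for r in current)
        if had and not has:
            return name
    return None

# Why is (Sue, NY) missing from the result?
culprit = blame_operator(people, pipeline, {"name": "Sue", "city": "NY"})
```

In this toy setting the filter is blamed, because Sue's NY address dates from 2010. A tracing scheme like this only considers operators that drop tuples; the abstract's point is that operators changing schema and structure (nesting, projections) can also be responsible, which this sketch does not capture.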

Award ID(s): 1640864; 1956123

PAR ID: 10278464

Author(s) / Creator(s): Diestelkämper, Ralf; Lee, Seokki; Herschel, Melanie; Glavic, Boris

Date Published: 2021-07-01

Journal Name: Proceedings of the 46th International Conference on Management of Data

Page Range / eLocation ID: 405–417

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3448016.3457249
