Sample Debiasing in the Themis Open World Database System

Orr, Laurel; Balazinska, Magdalena; Suciu, Dan

doi:10.1145/3318464.3380606

Citation Details

Open world database management systems assume that tuples not present in the database may still exist in the underlying population, and they are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two different approaches for automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that Themis achieves higher query accuracy than the default approximate query processing (AQP) approach, an alternative sample reweighting technique, and a variety of Bayesian network models while maintaining interactive query response times. We also show that Themis is robust to differences in the support between the sample and the population, a key use case when working with social media samples.
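To make "sample reweighting using a priori population aggregates" concrete, the sketch below uses iterative proportional fitting (raking), a standard survey-weighting technique: each sample tuple gets a weight, and the weights are rescaled one attribute at a time until weighted counts match known population marginals. This is an illustrative assumption, not Themis's own reweighting algorithm, and it omits the Bayesian network component entirely. The function names `rake` and `weighted_count`, the attributes `age` and `state`, and all numbers are hypothetical.

```python
from collections import defaultdict


def rake(sample, marginals, iterations=200, tol=1e-9):
    """Iterative proportional fitting ("raking"): return one weight per
    sample tuple so that weighted counts match each known population
    marginal, assuming such positive weights exist.

    sample:    list of dicts, e.g. {"age": "young", "state": "WA"}
    marginals: {attribute: {value: population_count}}
    """
    weights = [1.0] * len(sample)
    for _ in range(iterations):
        max_change = 0.0
        for attr, targets in marginals.items():
            # Current weighted count of each value of this attribute.
            current = defaultdict(float)
            for w, row in zip(weights, sample):
                current[row[attr]] += w
            # Rescale every tuple so this one marginal is matched exactly.
            # Values absent from the sample (a support mismatch) cannot be
            # matched: reweighting never creates new tuples.
            for i, row in enumerate(sample):
                value = row[attr]
                if value in targets and current[value] > 0:
                    factor = targets[value] / current[value]
                    max_change = max(max_change, abs(factor - 1.0))
                    weights[i] *= factor
        if max_change < tol:  # every marginal is already matched
            break
    return weights


def weighted_count(sample, weights, predicate):
    """Approximate COUNT(*) over the population for rows matching predicate."""
    return sum(w for w, row in zip(weights, sample) if predicate(row))


# Hypothetical biased sample: young people are over-represented
# (3 of 4 tuples, versus 60% of the population below).
sample = [
    {"age": "young", "state": "WA"},
    {"age": "young", "state": "WA"},
    {"age": "young", "state": "OR"},
    {"age": "old", "state": "OR"},
]
# Hypothetical a priori population aggregates (e.g. from a census).
marginals = {
    "age": {"young": 60, "old": 40},
    "state": {"WA": 50, "OR": 50},
}

weights = rake(sample, marginals)
young_or = weighted_count(
    sample, weights, lambda r: r["age"] == "young" and r["state"] == "OR"
)
print(round(young_or, 3))  # → 10.0
```

Naively scaling the sample up to a population of 100 would estimate 25 young Oregon residents; the reweighted estimate is 10, the only value consistent with both marginals here. Note that no weight can ever represent "old" residents of WA, because the sample contains none. This is the kind of support mismatch between sample and population that reweighting alone cannot fix, and it is one reason a probabilistic model such as a Bayesian network is useful alongside it.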

Award ID(s): 1907997

PAR ID: 10164636

Author(s) / Creator(s): Orr, Laurel; Balazinska, Magdalena; Suciu, Dan

Date Published: 2020-05-31

Journal Name: SIGMOD

Page Range / eLocation ID: 257 to 268

Sponsoring Org: National Science Foundation

Full text (conference paper, accepted manuscript):
https://doi.org/10.1145/3318464.3380606