Search for: All records

Award ID contains: 2107107


Total Resources: 13

Resource Type
Conference Paper: 8
Conference Proceeding: 0
Dataset: 0
Journal Article: 5
Workshop Report: 0

Availability
Full Text / Resource Available: 11
Citation Only: 2

Overlay Spreadsheets

https://doi.org/10.1145/3597465.3605220

Kennedy, Oliver; Glavic, Boris; Brachmann, Michael (June 2023, HILDA '23: Proceedings of the Workshop on Human-In-the-Loop Data Analytics)

Free, publicly-accessible full text available June 18, 2024
Hybrid Query and Instance Explanations and Repairs

https://doi.org/10.1145/3543873.3587565

Lee, Seokki; Glavic, Boris; Chapman, Adriane; Ludäscher, Bertram (April 2023, TaPP workshop - WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023)

Free, publicly-accessible full text available April 30, 2024
Efficient Approximation of Certain and Possible Answers for Ranking and Window Queries over Uncertain Data

https://doi.org/10.14778/3583140.3583151

Feng, Su; Glavic, Boris; Kennedy, Oliver (February 2023, Proceedings of the VLDB Endowment)

Uncertainty arises naturally in many application domains due to, e.g., data entry errors and ambiguity in data cleaning. Prior work in incomplete and probabilistic databases has investigated the semantics and efficient evaluation of ranking and top-k queries over uncertain data. However, most approaches deal with top-k and ranking in isolation and represent uncertain input data and query results using separate, incompatible data models. We present an efficient approach for under- and over-approximating results of ranking, top-k, and window queries over uncertain data. Our approach integrates well with existing techniques for querying uncertain data, is efficient, and is, to the best of our knowledge, the first to support windowed aggregation. We design algorithms for physical operators for uncertain sorting and windowed aggregation, and implement them in PostgreSQL. We evaluated our approach on synthetic and real-world datasets, demonstrating that it outperforms all competitors and often produces more accurate results.
Full Text Available
The Right Tool for the Job: Data-Centric Workflows in Vizier

Kennedy, Oliver; Glavic, Boris (September 2022, Bulletin of the Technical Committee on Data Engineering)
Sudeepa Roy and Jun Yang (Eds.)
Data scientists use a wide variety of systems with a wide variety of user interfaces, such as spreadsheets and notebooks, for their data exploration, discovery, preprocessing, and analysis tasks. While this wide selection of tools offers data scientists the freedom to pick the right tool for each task, each of these tools has limitations (e.g., the lack of reproducibility of notebooks), data needs to be translated between tool-specific formats, and common functionality such as versioning, provenance, and dealing with data errors often has to be implemented for each system. We argue that rather than alternating between task-specific tools, a superior approach is to build multiple user interfaces on top of a single incremental workflow / dataflow platform with built-in support for versioning, provenance, error tracking, and data cleaning. We discuss Vizier, a notebook system that implements this approach, introduce the challenges that arose in building such a system, and highlight how our work on Vizier led to novel research in uncertain data management and incremental execution of workflows.
Full Text Available
CaJaDE: explaining query results by augmenting provenance with context

https://doi.org/10.14778/3554821.3554852

Li, Chenjie; Lee, Juseung; Miao, Zhengjie; Glavic, Boris; Roy, Sudeepa (August 2022, Proceedings of the VLDB Endowment)

In this work, we demonstrate CaJaDE (Context-Aware Join-Augmented Deep Explanations), a system that explains query results by augmenting provenance with contextual information from other related tables in the database. Given two query results whose difference the user wants to understand, we enumerate possible ways of joining the provenance (i.e., contributing input tuples) of these two query results with tuples from other relevant tables in the database that were not used in the query. We use patterns to concisely explain the difference between the augmented provenance of the two query results. CaJaDE, through a comprehensive UI, enables the user to formulate questions and explore explanations interactively.
Full Text Available
Runtime provenance refinement for notebooks

https://doi.org/10.1145/3530800.3534535

Deo, Nachiket; Glavic, Boris; Kennedy, Oliver (June 2022, Proceedings of the 14th International Workshop on the Theory and Practice of Provenance)

Full Text Available
Generating Interpretable Data-Based Explanations for Fairness Debugging using Gopher

https://doi.org/10.1145/3514221.3520170

Zhu, Jiongli; Pradhan, Romila; Glavic, Boris; Salimi, Babak (June 2022, ACM SIGMOD)

Full Text Available
Efficient Answering of Historical What-if Queries

https://doi.org/10.1145/3514221.3526138

Campbell, Felix S.; Arab, Bahareh Sadat; Glavic, Boris (June 2022, ACM SIGMOD)

Full Text Available
Interpretable Data-Based Explanations for Fairness Debugging

https://doi.org/10.1145/3514221.3517886

Pradhan, Romila; Zhu, Jiongli; Glavic, Boris; Salimi, Babak (June 2022, ACM SIGMOD)

Full Text Available
Provenance-based data skipping

https://doi.org/10.14778/3494124.3494130

Niu, Xing; Glavic, Boris; Liu, Ziyu; Li, Pengyuan; Gawlick, Dieter; Krishnaswamy, Vasudha; Liu, Zhen Hua; Porobic, Danica (November 2021, Proceedings of the VLDB Endowment)

Database systems use static analysis to determine upfront which data is needed for answering a query and use indexes and other physical design techniques to speed up access to that data. However, for important classes of queries, e.g., HAVING and top-k queries, it is impossible to determine upfront what data is relevant. To overcome this limitation, we develop provenance-based data skipping (PBDS), a novel approach that generates provenance sketches to concisely encode what data is relevant for a query. Once a provenance sketch has been captured, it is used to speed up subsequent queries. PBDS can exploit physical design artifacts such as indexes and zone maps.
Full Text Available
