Search for: All records

Award ID contains: 1956096

Total Resources: 14

Resource Type
Conference Paper: 6
Conference Proceeding: 0
Dataset: 0
Journal Article: 8
Workshop Report: 0

Availability
Full Text / Resource Available: 10
Citation Only: 4

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Efficient Computation of Quantiles over Joins

https://doi.org/10.1145/3584372.3588670

Tziavelis, Nikolaos; Carmeli, Nofar; Gatterbauer, Wolfgang; Kimelfeld, Benny; Riedewald, Mirek (June 2023, PODS)

Free, publicly accessible full text available June 18, 2024

DIALITE: Discover, Align and Integrate Open Data Tables

https://doi.org/10.1145/3555041.3589732

Khatiwada, Aamod; Shraga, Roee; Miller, Renée J. (June 2023, ACM SIGMOD)

Free, publicly accessible full text available June 4, 2024

SANTOS: Relationship-based Semantic Table Union Search

https://doi.org/10.1145/3588689

Khatiwada, Aamod; Fan, Grace; Shraga, Roee; Chen, Zixuan; Gatterbauer, Wolfgang; Miller, Renée J.; Riedewald, Mirek (May 2023, Proceedings of the ACM on Management of Data)

Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a table should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns in a table to improve the accuracy of the union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way. To do so, we present two new methods to discover the semantic relationships between pairs of columns. The first uses an existing knowledge base (KB), and the second (which we call a "synthesized KB") uses knowledge from the data lake itself. We adopt an existing Table Union Search benchmark and present new (open) benchmarks that represent small and large real data lakes. We show that our new unionability search algorithm, called SANTOS, outperforms a state-of-the-art union search that uses a wide variety of column-based semantics, including word embeddings and regular expressions. We show empirically that our synthesized KB improves the accuracy of union search by representing relationship semantics that may not be contained in an available KB. This result hints at a promising future of creating synthesized KBs from data lakes with limited KB coverage and using them for union search.
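To make the contrast between column-based and relationship-based unionability concrete, here is a toy Python sketch. It is not the SANTOS algorithm: it assumes the query and candidate tables are already aligned column by column, and the helpers `jaccard`, `column_overlap`, and `pair_overlap` are hypothetical. It scores a candidate once by per-column value overlap and once by overlap of value pairs drawn from each pair of columns, a crude data-derived stand-in for a relationship between columns.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def column_overlap(query, candidate):
    """Mean Jaccard similarity of aligned columns (column i vs. column i)."""
    scores = [jaccard({row[i] for row in query}, {row[i] for row in candidate})
              for i in range(len(query[0]))]
    return sum(scores) / len(scores)

def pair_overlap(query, candidate):
    """Mean Jaccard similarity of (value_i, value_j) pairs for every column pair."""
    scores = [jaccard({(row[i], row[j]) for row in query},
                      {(row[i], row[j]) for row in candidate})
              for i, j in combinations(range(len(query[0])), 2)]
    return sum(scores) / len(scores) if scores else 0.0

query = [("Paris", "France"), ("Berlin", "Germany")]
same_relation = [("Paris", "France"), ("Berlin", "Germany"), ("Rome", "Italy")]
shuffled = [("Paris", "Germany"), ("Berlin", "France")]  # same values, wrong pairing

print(column_overlap(query, shuffled), pair_overlap(query, shuffled))  # 1.0 0.0
```

The shuffled table, which pairs each city with the wrong country, scores perfectly on per-column overlap yet zero on pair overlap; that is the kind of case a relationship-aware notion of unionability is meant to catch.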
Free, publicly accessible full text available May 26, 2024

Why Not Yet: Fixing a Top-k Ranking that is Not Fair to Individuals

https://doi.org/10.14778/3598581.3598606

Chen, Zixuan; Manolios, Panagiotis; Riedewald, Mirek (May 2023, Proceedings of the VLDB Endowment)

This work considers why-not questions in the context of top-k queries and score-based ranking functions. Following the popular linear scalarization approach for multi-objective optimization, we study rankings based on the weighted sum of multiple scores. A given weight choice may be controversial or perceived as unfair to certain individuals or organizations, triggering the question why some entity of interest has not yet shown up in the top-k. We introduce various notions of such why-not-yet queries and formally define them as satisfiability or optimization problems, whose goal is to propose alternative ranking functions that address the placement of the entities of interest. While some why-not-yet problems have linear constraints, others require quantifiers, disjunction, and negation. We propose several optimizations, ranging from a monotonic-core construction that approximates the complex constraints with a conjunction of linear ones, to various techniques that let the user control the tradeoff between running time and approximation quality. Experiments with real and synthetic data demonstrate the practicality and scalability of our technique, showing its superiority compared to the state of the art (SOA).
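The weighted-sum ranking and the why-not-yet question described above can be illustrated with a brute-force sketch for two scores. This is a naive grid search, not the paper's satisfiability or optimization formulation or its monotonic-core construction; the helpers `top_k` and `closest_fixing_weight` and the example data are hypothetical. It looks for the weight vector (w, 1 - w) closest to the current one that places a chosen entity in the top-k.

```python
def top_k(entities, weights, k):
    """Names of the k entities with the highest weighted-sum score."""
    def score(item):
        return sum(w * s for w, s in zip(weights, item[1]))
    ranked = sorted(entities.items(), key=score, reverse=True)
    return [name for name, _ in ranked[:k]]

def closest_fixing_weight(entities, target, k, current_w, steps=100):
    """Grid-search w in [0, 1] for the weights (w, 1 - w) closest to
    (current_w, 1 - current_w) that put `target` in the top-k.
    Returns None if no grid point works."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        if target in top_k(entities, (w, 1 - w), k):
            if best is None or abs(w - current_w) < abs(best - current_w):
                best = w
    return best

# Each entity has two scores; the current ranking uses weights (0.8, 0.2).
entities = {"A": (9, 1), "B": (8, 2), "C": (2, 9)}
print(top_k(entities, (0.8, 0.2), 1))                # ['A']
print(closest_fixing_weight(entities, "C", 1, 0.8))  # 0.53
```

The grid makes the cost grow with the resolution and, for more than two scores, exponentially with the number of weights, which is one reason a constraint-based formulation is attractive.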
Free, publicly accessible full text available May 1, 2024

Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

https://doi.org/10.1145/3578517

Carmeli, Nofar; Tziavelis, Nikolaos; Gatterbauer, Wolfgang; Kimelfeld, Benny; Riedewald, Mirek (March 2023, ACM Transactions on Database Systems)

We study the question of when we can provide direct access to the k-th answer to a Conjunctive Query (CQ) according to a specified order over the answers in time logarithmic in the size of the database, following a preprocessing step that constructs a data structure in time quasilinear in database size. Specifically, we embark on the challenge of identifying the tractable answer orderings, that is, those orders that allow for such complexity guarantees. To better understand the computational challenge at hand, we also investigate the more modest task of providing access to only a single answer (i.e., finding the answer at a given position), a task that we refer to as the selection problem, and ask when it can be performed in quasilinear time. We also explore the question of when selection is indeed easier than ranked direct access. We begin with lexicographic orders. For each of the two problems, we give a decidable characterization (under conventional complexity assumptions) of the class of tractable lexicographic orders for every CQ without self-joins. We then continue to the more general orders by the sum of attribute weights and establish the corresponding decidable characterizations, for each of the two problems, of the tractable CQs without self-joins. Finally, we explore the question of when the satisfaction of Functional Dependencies (FDs) can be utilized for tractability and establish the corresponding generalizations of our characterizations for every set of unary FDs.
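The simplest instance of ranked direct access can be sketched for the two-atom join Q(a, b, c) :- R(a, b), S(b, c) under the lexicographic order (a, b, c). This is a minimal sketch, not the paper's general construction: the class name `LexDirectAccess` is hypothetical, set semantics is assumed, and the order is chosen so that answers group neatly by R tuple. Preprocessing sorts and counts in O(n log n) time; each `access(k)` call is a binary search over prefix counts plus an index lookup, so O(log n). For other queries and orders such a simple grouping may not exist, which is what the characterization above addresses.

```python
from bisect import bisect_right
from collections import defaultdict

class LexDirectAccess:
    """Direct access to the k-th answer (0-based) of Q(a, b, c) :- R(a, b), S(b, c)
    in lexicographic order (a, b, c). Hypothetical illustration only."""

    def __init__(self, r_tuples, s_tuples):
        # Group S's c-values by their b-value and sort each group.
        self.c_by_b = defaultdict(list)
        for b, c in set(s_tuples):
            self.c_by_b[b].append(c)
        for cs in self.c_by_b.values():
            cs.sort()
        # Keep R tuples that join, sorted by (a, b), with prefix counts of answers.
        self.r_sorted = sorted(t for t in set(r_tuples) if t[1] in self.c_by_b)
        self.prefix = []
        total = 0
        for _, b in self.r_sorted:
            total += len(self.c_by_b[b])
            self.prefix.append(total)
        self.count = total

    def access(self, k):
        if not 0 <= k < self.count:
            raise IndexError(k)
        # Index of the first R tuple whose cumulative answer count exceeds k.
        i = bisect_right(self.prefix, k)
        a, b = self.r_sorted[i]
        before = self.prefix[i - 1] if i > 0 else 0
        return (a, b, self.c_by_b[b][k - before])

R = [(1, "x"), (2, "y"), (1, "y")]
S = [("x", 10), ("y", 5), ("y", 7), ("z", 1)]
da = LexDirectAccess(R, S)
print(da.count)      # 5
print(da.access(3))  # (2, 'y', 5)
```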
Full Text Available

Integrating Data Lake Tables

https://doi.org/10.14778/3574245.3574274

Khatiwada, Aamod; Shraga, Roee; Gatterbauer, Wolfgang; Miller, Renée J. (December 2022, Proceedings of the VLDB Endowment)

We have made tremendous strides in providing tools for data scientists to discover new tables useful for their analyses. But despite these advances, the proper integration of discovered tables has been under-explored. An interesting semantics for integration, called Full Disjunction, was proposed in the 1980s, but there has been little progress in using it for data science to integrate tables culled from data lakes. We provide ALITE, the first proposal for scalable integration of tables that may have been discovered using join, union or related table search. We empirically show that ALITE can outperform previous algorithms for computing the Full Disjunction. ALITE relaxes previous assumptions that tables share common attribute names (which completely determine the join columns), are complete (without null values), and have acyclic join patterns. To evaluate ALITE, we develop and share three new benchmarks for integration that use real data lake tables.
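For exactly two tables that share at least one attribute and contain no nulls, the Full Disjunction coincides with the natural full outer join: joined tuples, plus each table's dangling tuples padded with nulls. The naive nested-loop sketch below shows only that two-table case; for more tables the Full Disjunction is not, in general, expressible as a sequence of pairwise outer joins, which is why dedicated algorithms exist. The helper `full_outer_join` is hypothetical, and it makes exactly the assumptions the abstract says ALITE relaxes (shared attribute names determine the join columns; no nulls in the input).

```python
def full_outer_join(r, s, r_attrs, s_attrs):
    """Natural full outer join of two tables given as lists of tuples.
    Join columns are the shared attribute names; missing values become None."""
    shared = [a for a in r_attrs if a in s_attrs]
    out_attrs = list(r_attrs) + [a for a in s_attrs if a not in r_attrs]
    result, matched_s = [], set()
    for rt in r:
        rd = dict(zip(r_attrs, rt))
        joined = False
        for j, st in enumerate(s):
            sd = dict(zip(s_attrs, st))
            if all(rd[a] == sd[a] for a in shared):
                merged = {**sd, **rd}
                result.append(tuple(merged.get(a) for a in out_attrs))
                matched_s.add(j)
                joined = True
        if not joined:  # dangling R tuple, padded with None
            result.append(tuple(rd.get(a) for a in out_attrs))
    for j, st in enumerate(s):
        if j not in matched_s:  # dangling S tuple, padded with None
            sd = dict(zip(s_attrs, st))
            result.append(tuple(sd.get(a) for a in out_attrs))
    return out_attrs, result

attrs, rows = full_outer_join(
    [("Paris", "France"), ("Lyon", "France"), ("Oslo", "Norway")],
    [("France", "EUR"), ("Japan", "JPY")],
    ("city", "country"), ("country", "currency"))
print(attrs)  # ['city', 'country', 'currency']
for row in rows:
    print(row)
```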
Full Text Available

Toward Responsive DBMS: Optimal Join Algorithms, Enumeration, Factorization, Ranking, and Dynamic Programming

https://doi.org/10.1109/ICDE53745.2022.00299

Tziavelis, Nikolaos; Gatterbauer, Wolfgang; Riedewald, Mirek (May 2022, ICDE tutorials)

Full Text Available

Fair Top-k Ranking with multiple protected groups

https://doi.org/10.1016/j.ipm.2021.102707

Zehlike, Meike; Sühr, Tom; Baeza-Yates, Ricardo; Bonchi, Francesco; Castillo, Carlos; Hajian, Sara (January 2022, Information Processing & Management)

Full Text Available

STRATISFIMAL LAYOUT: A modular optimization model for laying out layered node-link network visualizations

https://doi.org/10.1109/TVCG.2021.3114756

di Bartolomeo, Sara; Riedewald, Mirek; Gatterbauer, Wolfgang; Dunne, Cody (January 2022, IEEE Transactions on Visualization and Computer Graphics)

Full Text Available

Beyond equi-joins: ranking, enumeration and factorization

https://doi.org/10.14778/3476249.3476306

Tziavelis, Nikolaos; Gatterbauer, Wolfgang; Riedewald, Mirek (July 2021, Proceedings of the VLDB Endowment)

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with $n$ denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of $k$, the $k$ top-ranked answers are returned in $O(n \operatorname{polylog} n + k \log k)$ time. This is within a polylogarithmic factor of $O(n + k \log k)$, i.e., the best known complexity for equi-joins, and even of $O(n + k)$, i.e., the time it takes to look at the input and return $k$ answers in any order. Our guarantees extend to join queries with selections and many types of projections (namely those called "free-connex" queries and those that use bag semantics). Remarkably, they hold even when the number of join results is $n^\ell$ for a join of $\ell$ relations. The key ingredient is a novel $O(n \operatorname{polylog} n)$-size factorized representation of the query output, which is constructed on-the-fly for a given query and database. In addition to providing the first nontrivial theoretical guarantees beyond equi-joins, we show in an experimental study that our ranked-enumeration approach is also memory-efficient and fast in practice, beating the running time of state-of-the-art database systems by orders of magnitude.
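What "ranked enumeration" means for an inequality join can be shown with a deliberately naive baseline: for R(a, w) and S(b, w) with predicate R.a < S.b, answers are yielded one at a time in ascending order of the summed weights. This is not the paper's factorized representation and does not meet its $O(n \operatorname{polylog} n + k \log k)$ bound; it is a heap-based k-way merge over one weight-sorted stream of S per R tuple, and finding the next match scans linearly. The function name `ranked_inequality_join` and the tuple layout are hypothetical.

```python
import heapq
from itertools import islice

def ranked_inequality_join(r, s):
    """Yield (r_tuple, s_tuple) pairs with r.a < s.b in ascending order of
    r.w + s.w. Tuples are (key, weight). Naive baseline for illustration."""
    s_by_w = sorted(s, key=lambda t: t[1])  # S in ascending weight order

    def next_match(r_tuple, start):
        # Next position >= start in s_by_w that satisfies the inequality.
        for j in range(start, len(s_by_w)):
            if r_tuple[0] < s_by_w[j][0]:
                return j
        return None

    # One stream per R tuple; the heap holds each stream's current best answer.
    heap = []
    for i, rt in enumerate(r):
        j = next_match(rt, 0)
        if j is not None:
            heapq.heappush(heap, (rt[1] + s_by_w[j][1], i, j))
    while heap:
        _, i, j = heapq.heappop(heap)
        yield r[i], s_by_w[j]
        nj = next_match(r[i], j + 1)
        if nj is not None:
            heapq.heappush(heap, (r[i][1] + s_by_w[nj][1], i, nj))

r = [(1, 5.0), (3, 1.0)]
s = [(2, 2.0), (4, 0.5), (0, 0.1)]
print(list(islice(ranked_inequality_join(r, s), 2)))
# [((3, 1.0), (4, 0.5)), ((1, 5.0), (4, 0.5))]
```

Because the generator is lazy, taking the top k answers with `islice` does only the work needed to produce those k, which is the any-k behavior ranked enumeration targets.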
Full Text Available
