NSF PAR Search | NSF Public Access Repository


Space-Time Tradeoffs for Conjunctive Queries with Access Patterns

https://doi.org/10.1145/3584372.3588675

Zhao, Hangdong; Deep, Shaleen; Koutris, Paraschos (June 2023, Symposium on Principles of Database Systems (PODS))

Full Text Available
General Space-Time Tradeoffs via Relational Queries

Deep, Shaleen; Hu, Xiao; Koutris, Paraschos (January 2023, Lecture Notes in Computer Science (WADS))

Full Text Available
Ranked enumeration of join queries with projections

https://doi.org/10.14778/3510397.3510401

Deep, Shaleen; Hu, Xiao; Koutris, Paraschos (January 2022, Proceedings of the VLDB Endowment)

Join query evaluation with ordering is a fundamental data processing task in relational database management systems. SQL and custom graph query languages such as Cypher offer this functionality by allowing users to specify the order via the ORDER BY clause. In many scenarios, users also want to see the first k results quickly (expressed by the LIMIT clause), but the value of k is not predetermined, as user queries arrive in an online fashion. Recent work has made considerable progress in identifying optimal algorithms for ranked enumeration of join queries that do not contain any projections. In this paper, we initiate the study of the problem of enumerating results in ranked order for queries with projections. Our main result shows that for any acyclic query, it is possible to obtain a near-linear (in the size of the database) delay algorithm after only a linear time preprocessing step for two important ranking functions: sum and lexicographic ordering. For a practical subset of acyclic queries known as star queries, we show an even stronger result that gives the user a smooth tradeoff: faster answering time guarantees in exchange for more preprocessing time. Our results are also extensible to queries containing cycles and unions. We also perform a comprehensive experimental evaluation to demonstrate that our algorithms, which are simple to implement, improve running time by up to three orders of magnitude over state-of-the-art algorithms implemented within open-source RDBMS and specialized graph databases.
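To make the problem statement concrete, the following is a minimal sketch of ranked enumeration with projection under lexicographic ordering. It is not the paper's algorithm: the two-relation query Q(x, z) = R(x, y), S(y, z), the function name `ranked_enum_lex`, and the toy data are all assumptions chosen for illustration. The generator yields distinct projected answers in order, so a caller can stop after any k without fixing k in advance.

```python
from collections import defaultdict
from itertools import islice


def ranked_enum_lex(R, S):
    """Lazily yield distinct (x, z) such that some y has (x, y) in R and
    (y, z) in S, in lexicographic order of (x, z).

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Index S by the join attribute y.
    s_by_y = defaultdict(set)
    for y, z in S:
        s_by_y[y].add(z)

    # Index R by x, keeping only tuples whose y also appears in S
    # (a semi-join), so every remaining x produces at least one answer.
    r_by_x = defaultdict(set)
    for x, y in R:
        if y in s_by_y:
            r_by_x[x].add(y)

    # Walk x in sorted order; for each x, collect its distinct z values
    # (this is where the projection removes duplicates), then emit them sorted.
    for x in sorted(r_by_x):
        zs = set()
        for y in r_by_x[x]:
            zs |= s_by_y[y]
        for z in sorted(zs):
            yield (x, z)


# Toy data (assumed for this example only).
R = {(1, "a"), (1, "b"), (2, "b"), (3, "c")}
S = {("a", 10), ("b", 10), ("b", 20), ("d", 30)}

top3 = list(islice(ranked_enum_lex(R, S), 3))
print(top3)
```

Because the semi-join removes dangling tuples up front, the work done before each new x value produces an answer is bounded by a scan over the matching part of S plus a sort. That is only loosely in the spirit of the delay guarantee described above and does not reproduce the paper's bounds.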
Full Text Available
Algorithms for a Topology-aware Massively Parallel Computation Model

https://doi.org/10.1145/3452021.3458318

Hu, Xiao; Koutris, Paraschos; Blanas, Spyros (June 2021, PODS)
Full Text Available
Ranked Enumeration of Conjunctive Query Results

https://doi.org/10.4230/LIPIcs.ICDT.2021.5

Deep, Shaleen; Koutris, Paraschos (January 2021, ICDT)
Full Text Available
Enumeration Algorithms for Conjunctive Queries with Projection

https://doi.org/10.4230/LIPIcs.ICDT.2021.14

Deep, Shaleen; Hu, Xiao; Koutris, Paraschos (January 2021, ICDT)
Full Text Available
Locality-Aware Distribution Schemes

https://doi.org/10.4230/LIPIcs.ICDT.2021.22

Sundarmurthy, Bruhathi; Koutris, Paraschos; Naughton, Jeffrey (January 2021, ICDT)
Full Text Available
Topology-aware Parallel Data Processing: Models, Algorithms and Systems at Scale

Blanas, Spyros; Koutris, Paraschos; Sidiropoulos, Anastasios (January 2020, 10th Annual Conference on Innovative Data Systems Research (CIDR ’20))

The analysis of massive datasets requires a large number of processors. Prior research has largely assumed that tracking the actual data distribution and the underlying network structure of a cluster, which we collectively refer to as the topology, comes with a high cost and has little practical benefit. As a result, theoretical models, algorithms and systems often assume a uniform topology; however, this assumption rarely holds in practice. This necessitates an end-to-end investigation of how one can model, design and deploy topology-aware algorithms for fundamental data processing tasks at large scale. To achieve this goal, we first develop a theoretical parallel model that can jointly capture the cost of computation and communication. Using this model, we explore algorithms with theoretical guarantees for three basic tasks: aggregation, join, and sorting. Finally, we consider the practical aspects of implementing topology-aware algorithms at scale, and show that they have the potential to be orders of magnitude faster than their topology-oblivious counterparts.
Full Text Available
Fast Join Project Query Evaluation using Matrix Multiplication

https://doi.org/10.1145/3318464.3380607

Deep, Shaleen; Hu, Xiao; Koutris, Paraschos (January 2020, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data)

In the last few years, much effort has been devoted to developing join algorithms to achieve worst-case optimality for join queries over relational databases. Towards this end, the database community has had considerable success in developing efficient algorithms that achieve worst-case optimal runtime for full join queries, i.e., joins without projections. However, not much is known about join evaluation with projections beyond some simple techniques of pushing down the projection operator in the query execution plan. Such queries have a large number of applications in entity matching, graph analytics and searching over compressed graphs. In this paper, we study how a class of join queries with projections can be evaluated faster using worst-case optimal algorithms together with matrix multiplication. Crucially, our algorithms are parameterized by the output size of the final result, allowing for choosing the best execution strategy. We implement our algorithms as a subroutine and compare the performance with state-of-the-art techniques to show they can be improved upon by as much as 50x. More importantly, our experiments indicate that matrix multiplication is a useful operation that can help speed up join processing owing to highly optimized open-source libraries that are also highly parallelizable.
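The core idea of evaluating a join with projection through matrix multiplication can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm (which, per the abstract, combines matrix multiplication with worst-case optimal joins and is parameterized by output size). The query Q(x, z) = R(x, y), S(y, z), the function name `join_project_matmul`, and the toy data are hypothetical.

```python
def join_project_matmul(R, S):
    """Return {(x, z) : some y has (x, y) in R and (y, z) in S},
    computed as a product of two 0/1 matrices.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Fix an ordering of the values of each variable.
    xs = sorted({x for x, _ in R})
    ys = sorted({y for _, y in R} & {y for y, _ in S})  # only joining y values
    zs = sorted({z for _, z in S})
    x_idx = {x: i for i, x in enumerate(xs)}
    y_idx = {y: j for j, y in enumerate(ys)}
    z_idx = {z: k for k, z in enumerate(zs)}

    # A[i][j] = 1 iff (xs[i], ys[j]) is in R.
    A = [[0] * len(ys) for _ in xs]
    for x, y in R:
        if y in y_idx:
            A[x_idx[x]][y_idx[y]] = 1

    # B[j][k] = 1 iff (ys[j], zs[k]) is in S.
    B = [[0] * len(zs) for _ in ys]
    for y, z in S:
        if y in y_idx:
            B[y_idx[y]][z_idx[z]] = 1

    # C = A x B; C[i][k] counts the y values that witness the pair (xs[i], zs[k]).
    C = [
        [sum(A[i][j] * B[j][k] for j in range(len(ys))) for k in range(len(zs))]
        for i in range(len(xs))
    ]

    # A nonzero entry means the projected pair is in the output.
    return {
        (xs[i], zs[k])
        for i in range(len(xs))
        for k in range(len(zs))
        if C[i][k] > 0
    }


# Toy data (assumed for this example only).
R = {(1, "a"), (1, "b"), (2, "b"), (3, "c")}
S = {("a", 10), ("b", 10), ("b", 20), ("d", 30)}

result = join_project_matmul(R, S)
print(sorted(result))
```

The product is written in pure Python here only to keep the sketch self-contained. The speedups the abstract refers to would come from handing the matrices to an optimized, parallel linear algebra library rather than from this triple loop.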
Full Text Available
