NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


HoneyComb: A Parallel Worst-Case Optimal Join on Multicores

https://doi.org/10.1145/3725307

Wu, Jiacheng; Suciu, Dan (June 2025, Proceedings of the ACM on Management of Data)

To achieve true scalability on massive datasets, a modern query engine needs to be able to take advantage of large, shared-memory, multicore systems. Binary joins are conceptually easy to parallelize on a multicore system; however, several applications require a different approach to query evaluation, using a Worst-Case Optimal Join (WCOJ) algorithm. WCOJ is known to outperform traditional query plans for cyclic queries. However, there is no obvious adaptation of WCOJ to parallel architectures. The few existing systems that parallelize WCOJ do this by partitioning only the top variable of the WCOJ algorithm. This leads to work skew (since some relations end up being read entirely by every thread), possible contention between threads (when the hierarchical trie index is built lazily, which is the case on most recent WCOJ systems), and exacerbates the redundant computations already existing in WCOJ.
Free, publicly accessible full text available June 17, 2026

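The abstract describes the baseline that HoneyComb improves on: parallelizing a WCOJ by partitioning only the top variable. As a rough illustration (not HoneyComb's algorithm, which the abstract does not describe), the Python sketch below runs a Generic Join style WCOJ for the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(a,c) with variable order a, b, c, and then splits the work on a alone. The dictionary-of-sets index stands in for a hierarchical trie, workers are simulated sequentially rather than run as threads, and all function names are hypothetical. The sketch shows why a relation that lacks the top variable (here S) must be read in full by every worker.

```python
from collections import defaultdict


def build_index(pairs):
    """Two-level index (a stand-in for a trie): first attribute -> set of second values."""
    idx = defaultdict(set)
    for x, y in pairs:
        idx[x].add(y)
    return idx


def generic_join_triangles(R, S, T):
    """Generic Join sketch for Q(a,b,c) :- R(a,b), S(b,c), T(a,c), variable order a, b, c."""
    r_idx, s_idx, t_idx = build_index(R), build_index(S), build_index(T)
    s_keys = set(s_idx)
    out = []
    # a must occur as the first attribute of both R and T.
    for a in sorted(set(r_idx) & set(t_idx)):
        # b must extend a in R and occur as the first attribute of S.
        for b in sorted(r_idx[a] & s_keys):
            # c must extend b in S and extend a in T.
            for c in sorted(s_idx[b] & t_idx[a]):
                out.append((a, b, c))
    return out


def partition_top_variable(R, S, T, num_workers):
    """Hypothetical baseline: hash-partition on the top variable a only."""
    shares = []
    for w in range(num_workers):
        R_w = [(a, b) for (a, b) in R if hash(a) % num_workers == w]
        T_w = [(a, c) for (a, c) in T if hash(a) % num_workers == w]
        # S(b, c) does not mention a, so it cannot be split on a:
        # every worker receives (and must read) a full copy of S.
        shares.append((R_w, list(S), T_w))
    return shares


if __name__ == "__main__":
    E = [(1, 2), (2, 3), (1, 3)]  # one triangle 1-2-3, used as R, S and T
    print(generic_join_triangles(E, E, E))  # [(1, 2, 3)]
    for w, (R_w, S_w, T_w) in enumerate(partition_top_variable(E, E, E, 2)):
        # worker 0 prints "0 1 3 1", worker 1 prints "1 2 3 2"
        print(w, len(R_w), len(S_w), len(T_w))
```

Hash-partitioning on a splits R and T across workers, but each worker still receives all of S, so S is read once per worker.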
Templating Shuffles

Zhang, Qizhen; Wu, Jiacheng; Chen, Ang; Liu, Vincent; Loo, Boon Thau (January 2023, Conference on Innovative Data Systems Research)

Cloud data centers are evolving fast. At the same time, today’s large-scale data analytics applications require non-trivial performance tuning that is often specific to the applications, workloads, and data center infrastructure. We propose TeShu, which makes network shuffling an extensible unified service layer common to all data analytics. Since an optimal shuffle depends on a myriad of factors, TeShu introduces parameterized shuffle templates, instantiated by accurate and efficient sampling that enables TeShu to dynamically adapt to different application workloads and data center layouts. Our preliminary experimental results show that TeShu efficiently enables shuffling optimizations that improve performance and adapt to a variety of data center network scenarios.
Full Text Available

Templating Shuffles

Zhang, Qizhen; Wu, Jiacheng; Chen, Ang; Liu, Vincent; Loo, Boon Thau (January 2023, CIDR)

Full Text Available
