NSF PAR Search | NSF Public Access Repository

A new window Clause for SQL++

https://doi.org/10.1007/s00778-023-00830-z

Fang, James; Lychagin, Dmitry; Carey, Michael J.; Tsotras, Vassilis J. (December 2023, The VLDB Journal)

Window queries are important analytical tools for ordered data and have been researched both in streaming and stored data environments. By incorporating ideas for window queries from existing streaming and stored data systems, we propose a new window syntax that makes a wide range of window queries easier to write and optimize. We have implemented this new window syntax in SQL++, an SQL extension that supports querying semistructured data, on top of AsterixDB, a Big Data Management System, thus allowing us to process window queries over large datasets in a parallel and efficient manner.
Full Text Available
Memory Management in Complex Join Queries: A Re-evaluation Study

https://doi.org/10.1145/3698038.3698565

Jahangiri, Shiva; Carey, Michael J; Freytag, Johann-Christoph (November 2024, ACM)

Efficient multi-join query processing is crucial but remains a complex, ongoing challenge for high-performance data management systems (DBMSs). This paper studies the impact of different memory distribution techniques among join operators on different classes of multi-join query plans under different assumptions regarding memory availability and storage devices such as HDD and SSD on Amazon Web Services (AWS). We re-evaluate the results of one of the early impactful studies from the 1990s that was originally done using a simulator for the Gamma database system. The main goal of our study is to scientifically re-evaluate and build upon previous studies whose results have become the basis for the design of past and modern database systems, and to provide a solid foundation for understanding basic "join physics", which is essential for eventually designing a resource-based scheduler for concurrent complex workloads.
Full Text Available
Memory Management in Complex Join Queries: A Re-evaluation Study

Jahangiri, S; Carey, M; Freytag, C (November 2024, 2024 ACM Symposium on Cloud Computing (SoCC'24))

Efficient multi-join query processing is crucial but remains a complex, ongoing challenge for high-performance data management systems (DBMSs). This paper studies the impact of different memory distribution techniques among join operators on different classes of multi-join query plans under different assumptions regarding memory availability and storage devices such as HDD and SSD on Amazon Web Services (AWS). We re-evaluate the results of one of the early impactful studies from the 1990s that was originally done using a simulator for the Gamma database system. The main goal of our study is to scientifically re-evaluate and build upon previous studies whose results have become the basis for the design of past and modern database systems, and to provide a solid foundation for understanding basic “join physics”, which is essential for eventually designing a resource-based scheduler for concurrent complex workloads.
Full Text Available
FUDJ: Flexible User-Defined Distributed Joins

https://doi.org/10.1109/ICDE60146.2024.00320

Sevim, Akil; Eldawy, Ahmed; Carman, E Preston; Carey, Michael J; Tsotras, Vassilis J (May 2024, IEEE)

Join operations are crucial in data analysis, but can suffer inefficiency with large datasets and complex non-equality-based conditions. Optimized join algorithms have gained traction in database research to address these challenges. One popular choice for implementing join algorithms is distributed data processing frameworks, e.g., Hadoop and Spark, but each implementation is highly tailored for specific query types. As a result, they do not address join queries that involve diverse and complex conditions since they are not integrated into a holistic query optimization engine like in DBMSs. On the other hand, implementing new join algorithms on a DBMS from scratch requires substantial effort and expertise. This paper introduces FUDJ, Flexible User-defined Distributed Joins, a framework for complex distributed join algorithms. The key idea of FUDJ is to allow developers to realize new distributed join algorithms into the database without delving into the database internals. As shown, an algorithm implemented in FUDJ is up to an order of magnitude faster than existing user-defined implementations with an order of magnitude fewer lines of code.
Full Text Available
Graphix: “One User's JSON is Another User's Graph”

https://doi.org/10.1109/ICDE60146.2024.00238

Galvizo, Glenn; Carey, Michael J (May 2024, 2024 IEEE 40th International Conference on Data Engineering (ICDE))

The increasing prevalence of large graph data has produced a variety of research and applications tailored toward graph data management. Users aiming to perform graph analytics will typically start by importing existing data into a separate graph-purposed storage engine. The cost of maintaining a separate system (e.g., the data copy, the associated queries, etc.) just for graph analytics may be prohibitive for users with Big Data. In this paper, we introduce Graphix and show how it enables property graph views of existing document data in AsterixDB, a Big Data management system boasting a partitioned-parallel query execution engine. We explain a) the graph view user model of Graphix, b) gSQL++, a novel query language extension for synergistic document-based navigational pattern matching, and c) how edge hops are evaluated in a parallel fashion. We then compare queries authored in gSQL++ against versions in other leading query languages. Finally, we evaluate our approach against a leading native graph database, Neo4j, and show that Graphix is appropriate for operational and analytical workloads, especially at scale.
Full Text Available
SQL++: We Can Finally Relax!

https://doi.org/10.1109/ICDE60146.2024.00438

Carey, Michael; Chamberlin, Don; Goo, Almann; Ong, Kian Win; Papakonstantinou, Yannis; Suver, Chris; Vemulapalli, Sitaram; Westmann, Till (May 2024, 2024 IEEE 40th International Conference on Data Engineering (ICDE))

SQL is five decades old and has outlasted many programming and query languages that have come and gone during its lifetime. It was born shortly after the introduction of the relational model, and was designed for querying a flat and typed tabular world. Support for modern, flexible data in the SQL standard and in relational database systems has largely been approached via the addition of new column types (e.g. XML or JSON) together with functions to operate on them. It is time for a cleaner solution that retains the benefits that have allowed SQL to be so successful for so long. We describe SQL++, a SQL extension that relaxes SQL's strictness in terms of both object structure (flat → nested) and schema (mandatory → optional), along with a multi-party effort to agree on a core definition and syntax supportable by multiple vendors. SQL++ sees relational data as a subset of a more flexible object model and it sees collections of document data (e.g., JSON) as a natural and supportable relaxation as opposed to a “bolt on” addition via a SQL column type. We describe the core features of SQL++ and explain how its definition can accommodate flexible data, while staying true to SQL in situations where the target data is tabular and strongly typed.
Index Terms: semistructured data, query, JSON, SQL, NoSQL
Full Text Available
Towards a Memory-Adaptive Hybrid Hash Join Design

https://doi.org/10.1109/BigData59044.2023.10386098

Siviero, Giulliano Silva Zanotti; Jahangiri, Shiva (December 2023, 2023 IEEE International Conference on Big Data (BigData))

In database management systems (DBMSs) that handle multiple concurrent queries, adapting to fluctuating workloads is crucial. This flexibility allows the DBMS to revise decisions based on current workload and available resources. As memory availability changes with the arrival or completion of queries, having memory-intensive operators like the Hybrid Hash Join that dynamically adapt is vital. This paper introduces a new memory-adaptive Hash-Based join algorithm design implemented in Apache AsterixDB and evaluates its responsiveness to memory variability.
Full Text Available
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

https://doi.org/10.1145/3604437.3604460

Pavlopoulou, Christina; Carey, Michael J.; Tsotras, Vassilis J. (June 2023, ACM SIGMOD Record)

Effective query optimization remains an open problem for Big Data Management Systems. In this work, we revisit an old idea, runtime dynamic optimization, and adapt it to a big data management system, AsterixDB. The approach runs in stages (re-optimization points), starting by first executing all predicates local to a single dataset. The intermediate result created by a stage is then used to re-optimize the remaining query. This re-optimization approach avoids inaccurate intermediate result cardinality estimates, thus leading to much better execution plans. While it introduces overhead for materializing intermediate results, experiments show that this overhead is relatively small and is an acceptable price to pay given the optimization benefits.
Full Text Available
CH3: A Mixed Workload Benchmark for Scalable NoSQL

https://doi.org/10.1109/BigData55660.2022.10021092

Mahin, Mehnaz Tabassum; Wang, Bo-Chun; Jagtiani, Kamini; Carey, Michael; Murthy, Keshav (December 2022, IEEE)

In Proc. of the IEEE Int’l. Workshop on Benchmarking, Performance Tuning, and Optimization for Big Data Applications (BPOD 2022)
Full Text Available
JEDI: These aren't the JSON documents you're looking for...

https://doi.org/10.1145/3514221.3517850

Hütter, Thomas; Augsten, Nikolaus; Kirsch, Christoph M.; Carey, Michael J.; Li, Chen (June 2022, Proc. ACM SIGMOD Conf.)

Full Text Available
