NSF PAR Search | NSF Public Access Repository

A new window clause for SQL++

https://doi.org/10.1007/s00778-023-00830-z

Fang, James; Lychagin, Dmitry; Carey, Michael J.; Tsotras, Vassilis J. (December 2023, The VLDB Journal)

Abstract Window queries are important analytical tools for ordered data and have been researched both in streaming and stored data environments. By incorporating ideas for window queries from existing streaming and stored data systems, we propose a new window syntax that makes a wide range of window queries easier to write and optimize. We have implemented this new window syntax in SQL++, an SQL extension that supports querying semistructured data, on top of AsterixDB, a Big Data Management System, thus allowing us to process window queries over large datasets in a parallel and efficient manner.
Subscribing to big data at scale

https://doi.org/10.1007/s10619-022-07406-w

Wang, Xikui; Carey, Michael J.; Tsotras, Vassilis J. (April 2022, Distributed and Parallel Databases)

Abstract Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, application developers need either to heavily customize an existing passive Big Data system or to glue one together with systems like Streaming Engines and Pub-sub services. Either choice requires significant effort and incurs additional overhead. In this paper, we present the BAD (Big Active Data) system as an end-to-end, out-of-the-box solution for this challenge. It is designed to preserve the merits of passive Big Data systems and introduces new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system’s performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a “glued” system.
Robust and efficient memory management in Apache AsterixDB

https://doi.org/10.1002/spe.2799

Kim, Taewoo; Behm, Alexander; Blow, Michael; Borkar, Vinayak; Bu, Yingyi; Carey, Michael J.; Hubail, Murtadha; Jahangiri, Shiva; Jia, Jianfeng; Li, Chen; et al (February 2020, Software: Practice and Experience)

Summary Traditional relational database systems handle data by dividing their memory into sections such as a buffer cache and working memory, assigning a memory budget to each section to efficiently manage a limited amount of overall memory. They also assign memory budgets to memory‐intensive operators such as sorts and joins and control the allocation of memory to these operators; each memory‐intensive operator attempts to maximize its memory usage to reduce disk I/O cost. Implementing such memory‐intensive operators requires a careful design and application of appropriate algorithms that properly utilize memory. Today's Big Data management systems need the ability to handle large amounts of data similarly, as it is unrealistic to assume that truly big data will fit into memory. In this article, we share our memory management experiences in Apache AsterixDB, an open‐source Big Data management software platform that scales out horizontally on shared‐nothing commodity computing clusters. We describe the implementation of AsterixDB's memory‐intensive operators and their designs related to memory management. We also discuss memory management at the global (cluster) level. We conducted an experimental study using several synthetic and real datasets to explore the impact of this work. We believe that future Big Data management system builders can benefit from these experiences.
Multi-valued indexing in Apache AsterixDB (SI DOLAP 2022)

https://doi.org/10.1016/j.is.2022.102144

Galvizo, Glenn; Carey, Michael J. (January 2023, Information Systems)

Full Text Available
JEDI: These aren't the JSON documents you're looking for...

https://doi.org/10.1145/3514221.3517850

Hütter, Thomas; Augsten, Nikolaus; Kirsch, Christoph M.; Carey, Michael J.; Li, Chen (June 2022, Proc. ACM SIGMOD Conf.)

Full Text Available
DynaHash: Efficient Data Rebalancing in Apache AsterixDB

https://doi.org/10.1109/icde53745.2022.00041

Luo, Chen; Carey, Michael J. (May 2022, Proc. ICDE Conf.)

Full Text Available
Revisiting Runtime Dynamic Optimization for Join Queries in Big Data Management Systems

https://doi.org/10.5441/002/edbt.2022.01

Pavlopoulou, C.; Carey, M.; Tsotras, V. (March 2022, Proc. EDBT Conf.)

Full Text Available
Multi-Valued Indexing in AsterixDB

Galvizo, G.; Carey, M. (March 2022, Proc. Int’l. Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP))

Full Text Available
CH2: A Hybrid Operational/Analytical Processing Benchmark for NoSQL

https://doi.org/10.1007/978-3-030-94437-7_5

Carey, M.; Lychagin, D.; Muralikrishna, M.; Sarathy, V.; Westmann, T. (January 2022, Proc. 13th TPC Technology Conf. on Performance Evaluation & Benchmarking (TPC TC))
Nambiar, R.; Poess, M. (Eds.)
Database systems with hybrid data management support, referred to as HTAP or HOAP architectures, are gaining popularity. These first appeared in the relational world, and the CH-benCHmark (CH) was proposed in 2011 to evaluate such relational systems. Today, one finds NoSQL database systems gaining adoption for new applications. In this paper we present CH2, a new benchmark – created with CH as its starting point – aimed at evaluating hybrid data platforms in the document data management world. Like CH, CH2 borrows from and extends both TPC-C and TPC-H. Differences from CH include a document-oriented schema, a data generation scheme that creates a TPC-H-like history, and a “do over” of the CH queries that is more in line with TPC-H. This paper details shortcomings that we uncovered in CH, the design of CH2, and preliminary results from running CH2 against Couchbase Server 7.0 (whose Query and Analytics services provide HOAP support for NoSQL data). The results provide insight into the performance isolation and horizontal scalability properties of Couchbase Server 7.0 as well as demonstrating the efficacy of CH2 for evaluating such platforms.
Full Text Available
Design Trade-offs for a Robust Dynamic Hybrid Hash Join

https://doi.org/10.14778/3547305.3547327

Jahangiri, S.; Carey, M.; Freytag, C. (January 2022, Proceedings of the VLDB Endowment)

Full Text Available
