NSF PAR Search | NSF Public Access Repository


Optimizing Big Active Data Management Systems

Shirazi, Shahrzad; Wang, Xikui; Carey, Michael; Tsotras, Vassilis (March 2025, Proceedings of the 27th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2025) co-located with the 28th International Conference on Extending Database Technology and the 28th International Conference on Database Theory (EDBT/ICDT 2025), Barcelona, Spain, March 25, 2025.)

Within the dynamic world of Big Data, traditional systems typically operate in a passive mode, processing and responding to user queries by returning the requested data. However, this methodology falls short of meeting the evolving demands of users who not only wish to analyze data but also to receive proactive updates on topics of interest. To bridge this gap, Big Active Data (BAD) frameworks have been proposed to support extensive data subscriptions and analytics for millions of subscribers. As data volumes and the number of interested users continue to increase, it is imperative to optimize BAD systems for enhanced scalability, performance, and efficiency. To this end, this paper introduces three main optimizations, namely: strategic aggregation, intelligent modifications to the query plan, and early result filtering, all aimed at reinforcing a BAD platform’s capability to actively manage and efficiently process soaring rates of incoming data and distribute notifications to larger numbers of subscribers.
Free, publicly-accessible full text available March 25, 2026
Subscribing to big data at scale

https://doi.org/10.1007/s10619-022-07406-w

Wang, Xikui; Carey, Michael J.; Tsotras, Vassilis J. (April 2022, Distributed and Parallel Databases)

Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus on passively answering queries from users, rather than actively collecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, application developers need either to heavily customize an existing passive Big Data system or to glue one together with systems like Streaming Engines and Pub-sub services. Either choice requires significant effort and incurs additional overhead. In this paper, we present the BAD (Big Active Data) system as an end-to-end, out-of-the-box solution for this challenge. It is designed to preserve the merits of passive Big Data systems and introduces new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system’s performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a “glued” system.
Bridging BAD Islands: Declarative Data Sharing at Scale

https://doi.org/10.1109/BigData50022.2020.9378342

Wang, Xikui; Carey, Michael J.; Tsotras, Vassilis J. (December 2020, IEEE International Conference on Big Data (Big Data))
In many Big Data applications today, information needs to be actively shared between systems managed by different organizations. To enable sharing Big Data at scale, developers would have to create dedicated server programs and glue together multiple Big Data systems for scalability. Developing and managing such glued data sharing services requires a significant amount of work from developers. In our prior work, we developed a Big Active Data (BAD) system for enabling Big Data subscriptions and analytics with millions of subscribers. Based on that, we introduce a new mechanism for enabling the sharing of Big Data at scale declaratively so that developers can easily create and provide data sharing services using declarative statements and can benefit from an underlying scalable infrastructure. We show our implementation on top of the BAD system, explain the data sharing data flow among multiple systems, and present a prototype system with experimental results.
Full Text Available
Bridging BAD Islands: Declarative Data Sharing at Scale

Wang, Xikui; Carey, Michael J.; Tsotras, Vassilis J. (January 2020, IEEE BigData)
Full Text Available
An IDEA: an ingestion framework for data enrichment in AsterixDB

https://doi.org/10.14778/3342263.3342628

Wang, Xikui; Carey, Michael J. (July 2019, Proceedings of the VLDB Endowment)

Full Text Available
BAD to the bone: Big Active Data at its core

https://doi.org/10.1007/s00778-020-00616-7

Jacobs, Steven; Wang, Xikui; Carey, Michael J.; Tsotras, Vassilis J.; Uddin, Md Yusuf (January 2020, The VLDB Journal)

Full Text Available
