Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract Today, data is being actively generated by a variety of devices, services, and applications. Such data is important not only for the information that it contains, but also for its relationships to other data and to interested users. Most existing Big Data systems focus onpassivelyanswering queries from users, rather thanactivelycollecting data, processing it, and serving it to users. To satisfy both passive and active requests at scale, application developers need either to heavily customize an existing passive Big Data system or to glue one together with systems likeStreaming EnginesandPub-sub services. Either choice requires significant effort and incurs additional overhead. In this paper, we present the BAD (Big Active Data) system as an end-to-end, out-of-the-box solution for this challenge. It is designed to preserve the merits of passive Big Data systems and introduces new features for actively serving Big Data to users at scale. We show the design and implementation of the BAD system, demonstrate how BAD facilitates providing both passive and active data services, investigate the BAD system’s performance at scale, and illustrate the complexities that would result from instead providing BAD-like services with a “glued” system.more » « less
-
Free, publicly-accessible full text available May 1, 2026
-
Free, publicly-accessible full text available May 27, 2025
-
Free, publicly-accessible full text available May 24, 2025
-
This paper studies the spatial group-by query over complex polygons. Given a set of spatial points and a set of polygons, the spatial group-by query returns the number of points that lie within the boundaries of each polygon. Groups are selected from a set of non-overlapping complex polygons, typically in the order of thousands, while the input is a large-scale dataset that contains hundreds of millions or even billions of spatial points. This problem is challenging because real polygons (like counties, cities, postal codes, voting regions, etc.) are described by very complex boundaries. We propose a highly-parallelized query processing framework to efficiently compute the spatial group-by query on highly skewed spatial data. We also propose an effective query optimizer that adaptively assigns the appropriate processing scheme based on the query polygons. Our experimental evaluation with real data and queries has shown significant superiority over all existing techniques.more » « less
-
Query optimization is the process of finding an efficient query execution plan for a given SQL query. The runtime difference between a good and a bad plan can be tremendous. For example, in the case of TPC-H query 5, a query with 5 joins, the difference between the best and the worst plan is more than 10,000×. Therefore, it is vital to avoid bad plans. The dominating factor which differentiates a good from a bad plan is their join order and whether this join order avoids large intermediate results.more » « less